While generating the tcr_seqs.json file, I ran into the following error:
Traceback (most recent call last):
  File "run_tcrmodel2.py", line 334, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "run_tcrmodel2.py", line 299, in main
    cdr3, seq=parse_tcr_seq.parse_anarci(anarci_out)
  File "/tcrmodel2/scripts/parse_tcr_seq.py", line 23, in parse_anarci
    num=int(fields[1])
ValueError: invalid literal for int() with base 10: 'Unknown'
This error was also mentioned in #2. It seems to be caused by one of the templates containing an X amino acid, which leads to ANARCI raising
Error: Unknown amino acid letter found in sequence: X
in which case parse_anarci encounters the string Unknown where it expects an integer (the second whitespace-separated field of that error message).
In my case this was the 5xot_D template, but many more templates in data/databases/pdb_seqres.txt contain X.
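To get a sense of how many templates are affected, something like the following could list them. This is only a sketch: it assumes pdb_seqres.txt is FASTA-style (a `>` header line whose first token is the entry ID, followed by sequence lines), and `find_entries_with_x` is a hypothetical helper name, not part of TCRmodel2. The sequences in the usage note are made up for illustration.

```python
def find_entries_with_x(lines):
    """Return the IDs of FASTA-style records whose sequence contains 'X'.

    Assumption: each record starts with a '>' header whose first
    whitespace-separated token is the entry ID (e.g. '5xot_D').
    """
    flagged = []
    current_id = None
    has_x = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None and has_x:
                flagged.append(current_id)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            has_x = False
        elif line and current_id is not None:
            if "X" in line.upper():
                has_x = True
    # Close the final record.
    if current_id is not None and has_x:
        flagged.append(current_id)
    return flagged
```

With a real file this could be called as `find_entries_with_x(open("data/databases/pdb_seqres.txt"))`. On an in-memory list such as `[">a_A mol:protein", "ACDXE", ">b_B mol:protein", "ACDE"]` it would flag only `a_A`.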
I'm not sure what to do about this. The error does not seem critical as the structures were already generated at this point. This could probably be handled by adding a special case in parse_anarci so that it returns an empty list in this case.
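A minimal sketch of what such a special case might look like. I have not checked how parse_anarci is actually structured, so `parse_position` and `numbered_positions` are hypothetical stand-ins. They assume the ANARCI output is processed line by line, with the position in the second whitespace-separated field as in the traceback, and that lines starting with `#` or `//` are not residue lines.

```python
def parse_position(fields):
    """Return the integer position in fields[1], or None if it is not numeric.

    An ANARCI error line such as
    'Error: Unknown amino acid letter found in sequence: X'
    splits so that fields[1] == 'Unknown', which int() cannot convert.
    """
    try:
        return int(fields[1])
    except (IndexError, ValueError):
        return None


def numbered_positions(anarci_lines):
    """Collect residue positions; return an empty list on an ANARCI error line."""
    positions = []
    for line in anarci_lines:
        fields = line.split()
        # Skip blank lines, comment lines, and the record terminator (assumed).
        if not fields or line.startswith("#") or line.startswith("//"):
            continue
        num = parse_position(fields)
        if num is None:
            # Special case: the sequence could not be numbered.
            return []
        positions.append(num)
    return positions
```

The caller would then have to treat an empty result as "no numbering available for this template" and skip it when writing tcr_seqs.json.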