uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction
MIT License

Assertion Error #57

Open akashbahai opened 1 year ago

akashbahai commented 1 year ago

Hi, I am trying to predict a few RNA structures. This tool is running fine for some targets, but it's running into some kind of error for other targets. The error is as follows:

Traceback (most recent call last):
  File "/home/project/12003580/RoseTTAFold2NA/network/predict.py", line 374, in <module>
    pred.predict(inputs=args.inputs, out_prefix=args.prefix, ffdb=ffdb)
  File "/home/project/12003580/RoseTTAFold2NA/network/predict.py", line 160, in predict
    msa_i, ins_i = parse_fasta(a3m_i, rna_alphabet=is_rna, dna_alphabet=is_dna)
  File "/data/projects/12003580/RoseTTAFold2NA/network/parsers.py", line 119, in parse_fasta
    assert (np.all(msa<=31))
AssertionError

The RNA sequence:

8D9L_3|Chain C|Lys-tRNA|Homo sapiens (9606)
GCCCGGAUAGCUCAGUCGGUAGAGCAUCAGACUUUUAAUCUGAGGGUCCAGGGUUCAAGUCCCUGUUCGGGCG

Can you please provide me with some pointers on how to resolve this assertion error?

Best, Akash

akashbahai commented 1 year ago

I checked the code for the previous version, and it seems that the previous parsers.py didn't have this assert condition. The new version (parsers.py, line 119) raises an assertion error unless every value in the parsed MSA is <=31. Is that intentional?

bifxcore commented 12 months ago

@akashbahai Have you checked the sequences in your MSA? I got that error because my MSA contained RNA sequences with non-standard bases, see https://github.com/uw-ipd/RoseTTAFold2NA/issues/27
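For anyone who wants to run that check on their own alignment, here is a minimal sketch that lists which sequences contain unexpected symbols. It does not come from the RoseTTAFold2NA code base: the function names `read_fasta` and `find_nonstandard` are hypothetical, and the allowed set `ACGUN-` is an assumption about what the parser accepts for RNA, so adjust it to match your copy of parsers.py.

```python
from collections import Counter

# Assumption: symbols accepted for RNA; adjust to match your parsers.py.
ALLOWED_RNA = set("ACGUN-")


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-style lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_nonstandard(lines, allowed=ALLOWED_RNA):
    """Return {header: Counter of disallowed symbols} for offending sequences."""
    report = {}
    for header, seq in read_fasta(lines):
        # Case is ignored here, so lowercase letters are checked as uppercase.
        bad = Counter(ch for ch in seq.upper() if ch not in allowed)
        if bad:
            report[header] = bad
    return report
```

To use it on a file, pass the open file handle, for example `find_nonstandard(open("my_rna_msa.afa"))`, where the file name is only a placeholder. Any header that shows up in the result points at a sequence worth inspecting.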

akashbahai commented 12 months ago

@bifxcore Thanks for your comment. You were right about the presence of non-standard bases. Removing them from the alignment does seem to fix the issue. Interestingly, the previous version used to run just fine for the same targets.

akashbahai commented 7 months ago

If someone is still facing the same issue, they can look at this pull request. I have added a script to clean up the MSA if it contains non-standard bases.
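For readers who cannot find the pull request, the general idea can be sketched as follows. This is not the script from the pull request, only an illustration under stated assumptions: the function name `clean_msa` is hypothetical, the allowed set `ACGUN-` is a guess at what the parser accepts for RNA, and the sketch drops whole sequences rather than editing individual bases so that the alignment columns stay intact.

```python
# Assumption: symbols accepted for RNA; adjust to match your parsers.py.
ALLOWED_RNA = set("ACGUN-")


def clean_msa(lines, allowed=ALLOWED_RNA):
    """Return FASTA lines, dropping sequences that contain disallowed symbols."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Join wrapped sequence lines into one string per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    cleaned = []
    for name, seq in records:
        # Keep a record only if every symbol (ignoring case) is allowed.
        if all(ch in allowed for ch in seq.upper()):
            cleaned.append(">" + name)
            cleaned.append(seq)
    return cleaned
```

Writing the result back out is then a matter of `"\n".join(clean_msa(open("in.afa")))`, with `in.afa` standing in for your own file. Note that if the query sequence itself contains a non-standard base it is dropped too, so check the first record of the output before running the prediction.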