uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction
MIT License

A bug in make_rna_msa.sh #19

Open HeJonghong opened 1 year ago

HeJonghong commented 1 year ago

Each blastdbcmd -entry_batch line should begin with the sequence ID; using an accession as input returns the wrong sequence. For example:

$ printf "%s %s %s \n" 1M5O_B -4-98 minus 30 | blastdbcmd -db $db -entry_batch -
$ 6B6H_1 Chain 1, SYNTHETIC NONTEMPLATE STRAND DNA (88-MER)
CGCCGCGTCAGACTGCACACATTATAGCATACGTGAGGTGGGATGTCAAGGCCTTTTTTGCCTAAAATGTGATCTAGATC
ACATTTTN
Error: [blastdbcmd] Skipped 30

The input was 1M5O_B, but 6B6H_1 was returned. I recommend changing the blastn -outfmt to '6 sgi sstart send saccver evalue bitscore nident staxids'.
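A side note on the "Skipped 30" line in the output above: it is probably produced by the test command rather than by the database lookup. The format string has three %s specifiers but receives four arguments, and printf reuses the format for the leftover argument, so "30" ends up on a line of its own, which blastdbcmd would then read as a second entry. The sketch below only demonstrates the printf behavior; how blastdbcmd reacts to the extra line is an assumption based on the error shown.

```shell
# Same printf call as in the report: three %s specifiers, four arguments.
out=$(printf "%s %s %s \n" 1M5O_B -4-98 minus 30)

# printf reuses the format string for the leftover "30",
# so the output has two lines instead of one.
line_count=$(( $(printf '%s\n' "$out" | wc -l) ))
first=$(printf '%s\n' "$out" | sed -n 1p)
second=$(printf '%s\n' "$out" | sed -n 2p)

printf 'lines: %d\n' "$line_count"
printf 'first:  [%s]\n' "$first"
printf 'second: [%s]\n' "$second"
```

Dropping the trailing 30, or adding a fourth specifier to the format, keeps everything on a single batch line.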

fdimaio commented 1 year ago

Hello, I do not see this locally (with blast 2.12.0+):

$ printf "%s %s %s \n" 1M5O_B -4-98 minus 30 | blastdbcmd -db /home/dimaio/RoseTTAFold2NA/RNA/nt -entry_batch -
>1M5O_B Chain B, RNA HAIRPIN RIBOZYME >1M5O_E Chain E, RNA HAIRPIN RIBOZYME >1M5P_B Chain B, RNA HAIRPIN RIBOZYME >1M5P_E Chain E, RNA HAIRPIN RIBOZYME >1M5V_B Chain B, RNA HAIRPIN RIBOZYME >1M5V_E Chain E, RNA HAIRPIN RIBOZYME
CGGCCACCACGAAGTTTCCCCCGTACCAGGTAATATACCACCAACCCGGAGTGCAATGGGTTGGTGTGTTTCTCTGGTTG
ACTTCTCTCTCC

Changing -outfmt to '6 sgi sstart send saccver evalue bitscore nident staxids' gives me identical results. I can make this change, but I'm wondering what causes the difference.
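For context, the choice of ID column matters because each -entry_batch line starts with the identifier taken from the tabular blastn output, optionally followed by a from-to range and a strand. The following is a minimal sketch of that conversion, not the actual code in make_rna_msa.sh; the hit IDs and coordinates are made up for illustration, and the three-column layout (subject ID, start, end) is an assumption.

```shell
# Hypothetical tab-separated hits: subject ID, subject start, subject end.
hits=$(printf 'SEQ_A\t10\t98\nSEQ_B\t120\t35\n')

# Build one "ID from-to strand" line per hit.
# A hit whose start exceeds its end lies on the minus strand,
# so the coordinates are swapped to keep the range ascending.
batch=$(printf '%s\n' "$hits" | awk -F'\t' '{
  if ($2 <= $3) printf "%s %d-%d plus\n",  $1, $2, $3;
  else          printf "%s %d-%d minus\n", $1, $3, $2;
}')

printf '%s\n' "$batch"
```

Lines in this form could then be piped to blastdbcmd -db $db -entry_batch -, as in the commands above. Whichever identifier type ends up in the first column has to be one the database can resolve.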

HeJonghong commented 1 year ago

The reason may be the BLAST version (I am using 2.13.0+).

HeJonghong commented 1 year ago

There is a new situation: sometimes the blastn results contain only an accession and no GI, and blastdbcmd cannot retrieve the sequence using the returned accession. For example:

$ blastdbcmd -db $db1 -entry URS0000D77956_562
> Error: [blastdbcmd] DB contains no accession info.

URS0000D77956_562 was the accession returned by blastn.

Could you please help me out? Thanks a lot.