uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction
MIT License

Unaccepted characters after Rfam database check #116

Open Tim15-tech opened 1 month ago

Tim15-tech commented 1 month ago

I sometimes encounter the error "assert (np.all(msa<=31))" when running the program. It seems that after the Rfam database search, some entries contain characters such as R, which are not part of the RoseTTAFold2NA RNA alphabet. I worked around this problem by converting those characters to N. The code is in my forked repository in case someone encounters the same problem.
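The workaround above can be sketched roughly as follows. This is not the code from the fork, only a minimal illustration. It assumes the accepted RNA characters are A, C, G, U, N and the gap "-", and that lowercase letters are a3m insertions. `sanitize_sequence` and `sanitize_msa_lines` are hypothetical names. Check the alphabet against the RF2NA parser before relying on this. For example, if your MSAs can legitimately contain T, mapping it to N would throw information away.

```python
# Assumption: these are the characters the downstream RNA parser accepts.
# Uppercase = aligned columns, lowercase = a3m insertions.
ALLOWED = set("ACGUN-") | set("acgun")


def sanitize_sequence(seq: str) -> str:
    """Replace characters outside the assumed RNA alphabet with N (n for insertions)."""
    out = []
    for ch in seq:
        if ch in ALLOWED:
            out.append(ch)
        elif ch.islower():
            out.append("n")  # keep insertion (lowercase) status
        else:
            out.append("N")  # e.g. IUPAC ambiguity codes such as R or Y
    return "".join(out)


def sanitize_msa_lines(lines):
    """Yield MSA lines with headers untouched and sequence lines sanitized."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # headers may contain any text; leave them alone
        else:
            yield sanitize_sequence(line)
```

Headers are passed through unchanged, so only the residues that end up in the encoded MSA are rewritten.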

Thanks for the model!

anar-rzayev commented 1 month ago

First of all, thank you for the commits in your forked repo. They were quite helpful, at least for getting rid of the bad characters. I was wondering whether there is any way to resolve the "FASTA-Reader: Ignoring invalid residues at position(s)" warnings, as I still get them when running your scripts.

Meanwhile, have you by any chance had problems downloading with update_blastdb.pl --decompress n? There seem to be many disconnections from NCBI. Even with a timeout of 3600, passive FTP, and verbose output, I am having a hard time downloading this 151 GB dataset. Do you know of any approaches to solve this?

Tim15-tech commented 1 month ago

Sadly, I don't know. I'm not familiar with this kind of issue, but it might simply be caused by the FASTA files themselves. If the invalid residues are already present in the source data, I assume it's fine for running the program, but I don't know for sure. If I remember correctly, I saw this message too and simply ignored it.
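One way to test the idea that the warnings come from the FASTA files themselves is to scan the input for residues outside the IUPAC nucleotide codes and see where they sit. The sketch below is a hypothetical helper, not part of RF2NA or BLAST. The set of "valid" characters is an assumption, and the characters BLAST's FASTA reader actually rejects may differ.

```python
# Assumption: IUPAC nucleotide codes (including ambiguity codes) plus gap are valid.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN-")


def find_invalid_residues(fasta_text: str):
    """Return {record_id: [(1-based position, char), ...]} for unexpected residues."""
    report = {}
    record_id = None
    position = 0
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID = first whitespace-separated token of the header.
            record_id = (line[1:].split() or [""])[0]
            position = 0
            continue
        for ch in line:
            position += 1  # positions count across wrapped sequence lines
            if ch.upper() not in IUPAC_NUCLEOTIDES:
                report.setdefault(record_id, []).append((position, ch))
    return report
```

An empty result would suggest the warnings are triggered by something other than the sequence characters this set covers.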

Regarding the second question: unfortunately, I think the best solution is a fast internet connection. For example, with poor internet even git clone can be a horror. If the disconnections are on NCBI's side, maybe wait a few days or a week. In the end, though, a stable and fast connection is required. I downloaded it at my university on a server over several days using screen, which kept the command running after I detached the terminal, so I could shut down my local PC.
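Interruptions seem hard to avoid on an unstable link, so one option is to rerun the download automatically whenever it fails, inside screen or tmux as described above. The following is a minimal Python sketch. `run_with_retries` is a hypothetical helper, and it assumes update_blastdb.pl exits with a non-zero status when a download fails.

```python
import subprocess
import time


def run_with_retries(cmd, max_attempts=10, base_delay=60.0, max_delay=1800.0):
    """Run cmd until it exits with status 0, sleeping with exponential backoff between failures.

    Returns the number of attempts used; raises RuntimeError if every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)  # output goes straight to the terminal/log
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay)  # back off before retrying
            delay = min(delay * 2, max_delay)
    raise RuntimeError(f"{cmd[0]} failed after {max_attempts} attempts")
```

For example: `run_with_retries(["update_blastdb.pl", "--decompress", "--passive", "--timeout", "3600", "nt"])`. Leaving out `--force` matters here. According to the script's help text, `--force` downloads archives even if they already exist locally, so with it every retry would start from scratch. Whether a rerun without it really skips completed volumes is worth confirming with `update_blastdb.pl --help`.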

anar-rzayev commented 1 month ago

Actually, you are right. As long as the bad characters are gone, the FASTA invalid-residue warnings shouldn't be a big deal. About the second point, I am also running it in a tmux session, since running it with nohup in the background didn't help initially:

/home/intern/protein/.conda/envs/RF2NA/bin/perl /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl --decompress --passive --timeout 3600 --force --verbose --verbose nt > nt_download_log.txt 2>&1

It is strange that even with a timeout of 1 hour, the download in the attached session disconnects after 1-2 hours. The output keeps showing:

Downloading nt.000.tar.gz...Net::FTP=GLOB(0x55b3acd21cf8)>>> PASV
Net::FTP=GLOB(0x55b3acd21cf8)<<< 227 Entering Passive Mode (130,14,250,12,196,235).
Net::FTP=GLOB(0x55b3acd21cf8)>>> RETR nt.000.tar.gz
Net::FTP=GLOB(0x55b3acd21cf8)<<< 150 Opening BINARY mode data connection for nt.000.tar.gz (4721879742 bytes)
Net::FTP: Net::Cmd::getline(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/lib/perl5/core_perl/Net/FTP/dataconn.pm line 82.
Unable to close datastream at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 202.
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 203.
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 203.
Failed to download nt.000.tar.gz.md5!
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 101.
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 101.
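The log shows the transfer of nt.000.tar.gz being cut off and the companion nt.000.tar.gz.md5 failing to download. When a volume and its .md5 file do arrive, checking them against each other before decompressing catches truncated archives. This is a minimal sketch. It assumes the .md5 file follows the md5sum layout (hash first, then the file name). `md5_of_file` and `verify_archive` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive, md5_file):
    """Return True if the archive's MD5 matches the first token of its .md5 file."""
    expected = Path(md5_file).read_text().split()[0].lower()
    return md5_of_file(archive) == expected
```

Reading in 1 MiB chunks keeps memory flat even for multi-gigabyte volumes like the 4.7 GB archive in the log.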

Did you need any specific installations, by any chance, or was it enough to install MoreUtils (or another library) and then run this Perl script?

anar-rzayev commented 1 month ago

Also, regarding installation more generally: did you have any issues downloading all the packages needed to run RF2NA? For example, were specific BLAST versions or other particular requirements necessary?