Open TheBil99 opened 1 year ago
@TheBil99 have you found what the error was? I have the same issue. Example works fine but with my own input files, I get a CD-HIT error and the run aborts. The script make_rna_msa.sh contains this:
overwrite=true
if [ -f $out_dir/$out_tag.afa -a $overwrite = false] # line 9
then
exit 0
fi
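Between two expressions inside [ ], '-a' is a logical AND (only as a unary operator does it test whether a file exists), so this line is meant to exit early when the output already exists and overwrite is false. The actual problem is the missing space before the closing bracket: 'false]' is read as a single word, so '[' never finds its closing ']'. I'm assuming 'overwrite' was used for testing and that by default the script overwrites previous output when re-run (?!).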
So I changed line 9 to:
if [[ -f $out_dir/$out_tag.afa && $overwrite = false ]]
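A minimal sketch of the difference between the two forms, runnable in bash. The temporary directory, the tag "demo" and the variable values are made up for illustration; only the two test expressions come from the script and the change above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the script's variables (not the real paths).
out_dir=$(mktemp -d)
out_tag=demo
overwrite=false
touch "$out_dir/$out_tag.afa"

# Original form: 'false]' is a single word, so '[' cannot find its closing ']'.
# It reports an error and returns non-zero, so the early exit never triggers.
if [ -f $out_dir/$out_tag.afa -a $overwrite = false] 2>/dev/null
then
    original="skip"
else
    original="continue"
fi

# Changed form: '[[ ... ]]' with '&&' and a space before the closing brackets.
if [[ -f $out_dir/$out_tag.afa && $overwrite = false ]]
then
    fixed="skip"
else
    fixed="continue"
fi

echo "original test: $original"
echo "fixed test:    $fixed"
rm -r "$out_dir"
```

With the output file present and overwrite=false, the original test falls through to "continue" while the changed one takes the "skip" branch. In plain POSIX sh the same check can be written as two separate [ ] tests joined with &&.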
This change gets rid of the "line 9: [: missing `]'" error, but I still get the error with CD-HIT not finding the databases:
make_rna_msa.sh: line 101: 205786 Aborted (core dumped) cd-hit-est-2d -T $CPU -i $in_fasta -i2 trim.db -c $cut -o cdhitest2d.db -l $throw_away_sequences -M 0 &>/dev/null
grep: db: No such file or directory
rm: cannot remove ‘cdhitest2d.db’: No such file or directory
rm: cannot remove ‘cdhitest2d.db.clstr’: No such file or directory
rm: cannot remove ‘db.clstr’: No such file or directory
Error: Failed to open target sequence database db for reading
Alignment input open failed. couldn't open nhmmer.a2m for reading
etc...
The bit that fails is this line:
cd-hit-est -T $CPU -i cdhitest2d.db -c $cut -o db -l $throw_away_sequences -M 0 &> /dev/null
nhits=`grep '^>' db | wc -l` # db output is missing!
It looks like cd-hit-est-2d fails to write cdhitest2d.db, and cd-hit-est then fails to write db. My RNA sequence is 76 bases long, which means $throw_away_sequences=30. Maybe the 46 remaining bases somehow don't align with anything in trim.db? Hard to believe: my RNA sequence is a tRNA from Pseudomonas, so it should get plenty of hits in GenBank?!
After days of installing and downloading > 2.5TB of data, I was really hoping to get this to work. Grateful for any hints.
The CD-HIT error turns out to be a known issue: https://github.com/weizhongli/cdhit/issues/26
This worked for me: uninstall the cd-hit conda package (and wait...wait some more... takes forever to complete), build from source with MAX_SEQ=10000000 and add the bin dir to my PATH: cd-hit-est-2d now runs fine.
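For reference, those steps might look roughly like the sketch below. It is not a tested recipe: the conda package name (cd-hit), the clone location ($HOME/src/cdhit) and the assumption that make leaves the binaries in the source directory are mine, while MAX_SEQ=10000000 is the value from the comment above. The commands are wrapped in a function, here called rebuild_cdhit, so that nothing runs until it is called.

```shell
#!/usr/bin/env bash
# Sketch only: package name and paths are assumptions, adjust to your setup.
rebuild_cdhit() {
    local src_dir="${1:-$HOME/src/cdhit}"   # hypothetical clone location

    # Remove the conda build so it no longer shadows the new binaries.
    conda remove -y cd-hit || return 1

    # Fetch the sources and rebuild with a larger maximum sequence length.
    git clone https://github.com/weizhongli/cdhit.git "$src_dir" || return 1
    make -C "$src_dir" MAX_SEQ=10000000 || return 1

    # Put the freshly built binaries first on PATH for this shell session.
    export PATH="$src_dir:$PATH"
    command -v cd-hit-est-2d
}
```

Calling rebuild_cdhit (optionally with a different target directory as its first argument) ends by printing the path of the cd-hit-est-2d that will now be picked up, which is a quick way to confirm that the conda copy is no longer in use.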
@bifxcore Thanks for sharing! I was wondering how many sequences your RNA MSAs generally contain. With my data, hardly any sequences are retrieved for the RNA MSA, a few at most.
@huangtinglin my MSA contained over 6K RNA sequences. Note that I filtered out all those that contained non-standard bases, to avoid the msa parser AssertionError (see https://github.com/uw-ipd/RoseTTAFold2NA/issues/27).
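A filter of this kind could be sketched as follows. The file names, the toy records and the set of characters treated as standard (A, C, G, U, T in either case, plus the gap characters '-' and '.') are assumptions for illustration, not necessarily the exact filter used above.

```shell
#!/usr/bin/env bash
# Toy aligned FASTA input (hypothetical records).
work=$(mktemp -d)
cat > "$work/in.afa" <<'EOF'
>query
GGGCUAUAGC
>hit1
GGGCU-UAGC
>hit2
GGGCNAUAGC
>hit3
GGGCUAU
RGC
EOF

# Keep a record only if every one of its sequence lines uses allowed characters.
awk '
    function emit() { if (header != "" && ok) printf "%s\n%s", header, body }
    /^>/ { emit(); header = $0; body = ""; ok = 1; next }
    {
        if ($0 !~ /^[ACGUTacgut.-]*$/) ok = 0   # non-standard symbol found
        body = body $0 "\n"
    }
    END { emit() }
' "$work/in.afa" > "$work/filtered.afa"

kept=$(grep -c '^>' "$work/filtered.afa")
echo "kept $kept of $(grep -c '^>' "$work/in.afa") records"
```

In this toy input, hit2 (contains N) and hit3 (contains R on its second line) are dropped, while the query and hit1 are kept. Records are judged as a whole, so a sequence wrapped over several lines is removed if any of its lines contains a disallowed character.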
Interesting... I have no idea why my RNA sequences get only a few hits in the MSA. Could you provide an example of the RNA sequence you use? Thanks!
Hi RoseTTAFold2NA team, I have a similar question, could you please point to the code module where the protein-RNA binding/docking is implemented and what parameters, both output and input if any, determine the binding strength? Thank you.
Hi, I ran into the same problem and do not know how to fix it. Have you managed to resolve it? Should I re-download the dataset or reset the environment?
Hi, thanks for the amazing work!!
I am having some issues when trying to run the model for protein-RNA docking. I first tried the protein and RNA sequences you provided for the example, and everything worked very well. But when I try other sequences, I get some errors.
The stderr files for hhsearch and the protein MSA do not contain errors. This is the content of the stderr file for the RNA MSA:
/aplic/noarch/software/RoseTTAFold2NA/0.2-Miniconda3-4.9.2/input_prep/make_rna_msa.sh: line 9: [: missing `]'
rm: cannot remove ‘rfam1.list.split.’: No such file or directory
rm: cannot remove ‘rfam2.list.split.’: No such file or directory
rm: cannot remove ‘blastn1.list.split.*’: No such file or directory
/aplic/noarch/software/RoseTTAFold2NA/0.2-Miniconda3-4.9.2/input_prep/make_rna_msa.sh: line 101: 221255 Aborted (core dumped) cd-hit-est-2d -T $CPU -i $in_fasta -i2 trim.db -c $cut -o cdhitest2d.db -l $throw_away_sequences -M 0 &>/dev/null
grep: db: No such file or directory
rm: cannot remove ‘cdhitest2d.db’: No such file or directory
rm: cannot remove ‘cdhitest2d.db.clstr’: No such file or directory
rm: cannot remove ‘db.clstr’: No such file or directory
Error: Failed to open target sequence database db for reading
Alignment input open failed. couldn't open nhmmer.a2m for reading
18:08:17.021 INFO: Input file = guide_RNA.wquery.unfilt.afa
18:08:17.022 INFO: Output file = guide_RNA.afa
18:08:17.022 ERROR: In /opt/conda/conda-bld/hhsuite_1659427602200/work/src/hhalignment.cpp:502: Read:
18:08:17.022 ERROR: No sequences found in file guide_RNA.wquery.unfilt.afa
grep: guide_RNA.afa: No such file or directory
Error: Failed to open target sequence database db for reading
Alignment input open failed. couldn't open nhmmer.a2m for reading
18:08:17.230 INFO: Input file = guide_RNA.wquery.unfilt.afa
18:08:17.230 INFO: Output file = guide_RNA.afa
18:08:17.230 ERROR: In /opt/conda/conda-bld/hhsuite_1659427602200/work/src/hhalignment.cpp:502: Read:
18:08:17.230 ERROR: No sequences found in file guide_RNA.wquery.unfilt.afa
Error: Failed to open target sequence database db for reading
Alignment input open failed. couldn't open nhmmer.a2m for reading
18:08:17.433 INFO: Input file = guide_RNA.wquery.unfilt.afa
18:08:17.433 INFO: Output file = guide_RNA.afa
18:08:17.434 ERROR: In /opt/conda/conda-bld/hhsuite_1659427602200/work/src/hhalignment.cpp:502: Read:
18:08:17.434 ERROR: No sequences found in file guide_RNA.wquery.unfilt.afa
Error: Failed to open target sequence database db for reading
Alignment input open failed. couldn't open nhmmer.a2m for reading
18:08:17.637 INFO: Input file = guide_RNA.wquery.unfilt.afa
18:08:17.637 INFO: Output file = guide_RNA.afa
18:08:17.638 ERROR: In /opt/conda/conda-bld/hhsuite_1659427602200/work/src/hhalignment.cpp:502: Read:
18:08:17.638 ERROR: No sequences found in file guide_RNA.wquery.unfilt.afa
Error: Failed to open target sequence database db for reading
Alignment input open failed. couldn't open nhmmer.a2m for reading
18:08:17.840 INFO: Input file = guide_RNA.wquery.unfilt.afa
18:08:17.840 INFO: Output file = guide_RNA.afa
18:08:17.841 ERROR: In /opt/conda/conda-bld/hhsuite_1659427602200/work/src/hhalignment.cpp:502: Read:
18:08:17.841 ERROR: No sequences found in file guide_RNA.wquery.unfilt.afa
Error: Failed to open target sequence database db for reading
Alignment input open failed. couldn't open nhmmer.a2m for reading
18:08:18.044 INFO: Input file = guide_RNA.wquery.unfilt.afa
18:08:18.044 INFO: Output file = guide_RNA.afa
18:08:18.045 ERROR: In /opt/conda/conda-bld/hhsuite_1659427602200/work/src/hhalignment.cpp:502: Read:
18:08:18.045 ERROR: No sequences found in file guide_RNA.wquery.unfilt.afa
rm: cannot remove ‘nhmmer.a2m’: No such file or directory
P.S.: I get the same issue also when trying to use this RNA sequence with the protein you provided in the example, but I really do not understand what is wrong with the RNA sequence I am using.
Thanks!!