linsalrob / PhiSpy

Prediction of prophages from bacterial genomes
MIT License

ValueError: missing molecule_type in annotations #41

Closed tauqeer9 closed 4 years ago

tauqeer9 commented 4 years ago

Hi, I get the following error with --output_choice 4 or --output_choice 8. I don't get bacteria.fasta, bacteria.gbk and phage.gbk, but I do get phage.fasta. What could be the reason? All other options work fine.

PhiSpy.py Streptococcus_pyogenes_M1_GAS.gbk -o M.phages -p M1 --threads 4 --log M1.log --output_choice 4

ValueError: missing molecule_type in annotations

Thank you so much.

pdec commented 4 years ago

Hi,

thanks for using PhiSpy! Can you tell us with which version of PhiSpy you got this error?

Simply run PhiSpy.py --version

Also, can you provide the whole error message or a log file? Is Streptococcus_pyogenes_M1_GAS.gbk the file we provide in the tests directory? Do you get this error with any other GenBank file?

In the case of --output_choice 8, you should only get the prophage_information.tsv file, according to the code table. While checking that, I found a typo in the code specific to --output_choice 8; it is fixed in version 4.2.4 on the master branch.

Thanks, Przemek

tauqeer9 commented 4 years ago

Thank you so much.

$ PhiSpy.py --version
4.1.22

$ PhiSpy.py Streptococcus_pyogenes_M1_GAS.gbk -o M1.phages -p M1 --threads 4 --log M1_choice4.log --output_choice 4

Processing 1 contigs
Making Testing Set...
Start Classification Algorithm...
Using the following metric(s): {'at_skew', 'gc_skew', 'orf_length_med', 'shannon_slope', 'max_direction'}.
Running the random forest classifier with 500 trees and 4 threads
As the training flag is zero, down-weighting unknown functions
Evaluating...
Checking prophages we might have found
Potential prophages (sorted highest to lowest)
Contig     Start    Stop     Number of potential genes  Status
NC_002737  778642   820599   54                         Kept
NC_002737  1191309  1222549  47                         Kept
NC_002737  529631   569288   45                         Kept
NC_002737  1770150  1785658  22                         Kept
NC_002737  892723   893805   4                          Dropped. Not enough genes
NC_002737  176004   177861   2                          Dropped. Not enough genes
NC_002737  1808123  1810396  1                          Dropped. Not enough genes
NC_002737  1665186  1665887  1                          Dropped. Not enough genes
NC_002737  1544661  1545035  1                          Dropped. Not enough genes
NC_002737  980732   981802   1                          Dropped. Not enough genes
NC_002737  711922   712041   1                          Dropped. Not enough genes
NC_002737  449392   449844   1                          Dropped. Not enough genes
NC_002737  361432   362130   1                          Dropped. Not enough genes
NC_002737  315251   315997   1                          Dropped. Not enough genes
NC_002737  190526   191230   1                          Dropped. Not enough genes
NC_002737  49621    51264    1                          Dropped. Not enough genes
PROPHAGE: 1 Contig: NC_002737 Start: 529631 Stop: 569288
PROPHAGE: 2 Contig: NC_002737 Start: 778642 Stop: 820599
PROPHAGE: 3 Contig: NC_002737 Start: 1191309 Stop: 1222549
There were 3 repeats with the same length as the best. One chosen somewhat randomly!
PROPHAGE: 4 Contig: NC_002737 Start: 1770150 Stop: 1785658
Creating output files
Writing bacterial and phage DNA as fasta
Traceback (most recent call last):
  File "/opt/anaconda3/bin/PhiSpy.py", line 125, in <module>
    main(sys.argv)
  File "/opt/anaconda3/bin/PhiSpy.py", line 117, in main
    PhiSpyModules.write_all_outputs(**vars(args_parser))
  File "/opt/anaconda3/lib/python3.8/site-packages/PhiSpyModules/writers.py", line 361, in write_all_outputs
    write_phage_and_bact(self)
  File "/opt/anaconda3/lib/python3.8/site-packages/PhiSpyModules/writers.py", line 151, in write_phage_and_bact
    SeqIO.write(pp_gbk, phage_genbank, "genbank")
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/SeqIO/__init__.py", line 530, in write
    count = writer_class(handle).write_file(sequences)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 244, in write_file
    count = self.write_records(records, maxcount)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 218, in write_records
    self.write_record(record)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/SeqIO/InsdcIO.py", line 981, in write_record
    self._write_the_first_line(record)
  File "/opt/anaconda3/lib/python3.8/site-packages/Bio/SeqIO/InsdcIO.py", line 744, in _write_the_first_line
    raise ValueError("missing molecule_type in annotations")
ValueError: missing molecule_type in annotations

The following two examples also give the same error:

PhiSpy.py CP015626.gbk -o CP015626.phages -p CP015626 --threads 4 --log CP015626_choice4.log --output_choice 4
PhiSpy.py CP016072.gbk -o CP016072.phages -p CP016072 --threads 4 --log CP016072_choice4.log --output_choice 4

--output_choice 4: does not work as described; it only generates M1_prophage.fasta, and the other 3 files are empty
--output_choice 8: does not work as described; it only generates M1_prophage.fasta, and the other 3 files are empty
--output_choice 11: interestingly, it works and generates M1_prophage_coordinates.tsv, M1_prophage_information.tsv and M1_Streptococcus_pyogenes_M1_GAS.gbk

pdec commented 4 years ago

Hey,

thanks for the note!

I could reproduce your error after updating Biopython to v1.78. Starting with this version, Bio.Alphabet has been removed, which requires some changes in the PhiSpy code. More about that here.

To avoid the error you can either switch to the previous Biopython version (e.g. conda install biopython=1.77) or use the newest PhiSpy version, v4.2.5.
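For anyone who hits the same ValueError in their own scripts: the traceback shows the GenBank writer refusing a record whose annotations lack a molecule_type entry. A minimal sketch of the usual workaround follows. The helper name ensure_molecule_type and the "DNA" default are assumptions for illustration, not necessarily how PhiSpy fixed it, and a plain stand-in object is used so the snippet runs without Biopython installed.

```python
from types import SimpleNamespace


def ensure_molecule_type(record, default="DNA"):
    """Add annotations['molecule_type'] if it is missing, then return the record.

    Hypothetical helper: works on any object that has an `annotations` dict,
    such as a Bio.SeqRecord.SeqRecord.
    """
    record.annotations.setdefault("molecule_type", default)
    return record


# Stand-in for a SeqRecord so the sketch is runnable on its own.
record = SimpleNamespace(id="NC_002737", annotations={})
ensure_molecule_type(record)
print(record.annotations)  # {'molecule_type': 'DNA'}

# With Biopython installed, the helper would be applied just before writing:
#   from Bio import SeqIO
#   SeqIO.write(ensure_molecule_type(record), handle, "genbank")
```

An existing molecule_type value is left untouched, so records that were parsed from a GenBank file with this field already set are not changed.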

I recommend the second option, as we also fixed --output_choice 8. In version 4.1.22, code 8 behaves like code 7 because of a typo: ">" was used instead of ">=". Note that if you want several different output files you must add the code numbers together, e.g. code 11 will provide the output of codes 8, 2 and 1.
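To illustrate how summed codes split apart, and how a ">" in place of ">=" can turn code 8 into code 7, here is a small sketch. It is a hypothetical reconstruction for illustration only, not PhiSpy's actual code: the function name decode_output_choice, the strict flag, and the restriction to codes 8, 4, 2 and 1 are all assumptions.

```python
def decode_output_choice(choice, codes=(8, 4, 2, 1), strict=False):
    """Split a summed output code into individual codes, largest first.

    strict=True swaps the '>=' comparison for '>' to mimic the kind of
    typo described above (illustration only).
    """
    selected = []
    remaining = choice
    for code in codes:
        matches = remaining > code if strict else remaining >= code
        if matches:
            selected.append(code)
            remaining -= code
    return selected


print(decode_output_choice(11))              # [8, 2, 1]
print(decode_output_choice(8))               # [8]
print(decode_output_choice(8, strict=True))  # [4, 2, 1], which sums to 7
```

With the strict comparison, 8 is never matched against itself, so the remainder falls through to 4, 2 and 1, which is the same set of outputs as code 7.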

Let us know whether it fixes your error.

Przemek

tauqeer9 commented 4 years ago

Thank you very much. It is working perfectly fine now. I installed the latest version using git and will eventually install through bioconda when the latest version is available there.

Thanks again for fixing those errors.

Tauqeer

pdec commented 4 years ago

That's great! Thanks for letting us know about the error.

Przemek