linsalrob / PhiSpy

Prediction of prophages from bacterial genomes
MIT License

ValueError: Need a Nucleotide or Protein alphabet #63

Open qianxin-kxy opened 1 year ago

qianxin-kxy commented 1 year ago

Below are the command I ran and the error it produced. Has anyone else encountered this issue? My Biopython version is 1.77 and my PhiSpy version is 4.2.21.

(PhiSpy) [kxy@zju out]$ PhiSpy.py my_output.gbk -o output_directory
Processing 34 contigs
Making Testing Set...
Start Classification Algorithm...
Using the following metric(s): {'gc_skew', 'at_skew', 'shannon_slope', 'orf_length_med', 'max_direction'}.
Running the random forest classifier with 500 trees and 2 threads
/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of n_init will change from 10 to 'auto' in 1.4. Set the value of n_init explicitly to suppress the warning
  warnings.warn(
As the training flag is zero, down-weighting unknown functions
Evaluating...
Checking prophages we might have found
Potential prophages (sorted highest to lowest)
Contig          Start   Stop    Number of potential genes  Status
NODE_25_length  4943    22015   29                         Dropped. No genes were identified as phage genes
NODE_17_length  67767   91585   24                         Kept
NODE_31_length  48      3971    8                          Dropped. No genes were identified as phage genes
NODE_3_length   36724   41861   5                          Dropped. No genes were identified as phage genes
NODE_7_length   40892   42026   2                          Dropped. Region too small (Not enough genes)
NODE_9_length   147704  149860  1                          Dropped. Region too small (Not enough genes)
NODE_9_length   128083  130272  1                          Dropped. Region too small (Not enough genes)
NODE_8_length   31922   32392   1                          Dropped. Region too small (Not enough genes)
NODE_2_length   90215   91141   1                          Dropped. Region too small (Not enough genes)
NODE_27_length  8285    8959    1                          Dropped. Region too small (Not enough genes)
NODE_20_length  9731    10882   1                          Dropped. Region too small (Not enough genes)
NODE_18_length  55595   56515   1                          Dropped. Region too small (Not enough genes)
NODE_17_length  38600   40714   1                          Dropped. Region too small (Not enough genes)
NODE_16_length  41140   41724   1                          Dropped. Region too small (Not enough genes)
NODE_15_length  34985   35749   1                          Dropped. Region too small (Not enough genes)
NODE_11_length  62124   64214   1                          Dropped. Region too small (Not enough genes)
PROPHAGE: 1  Contig: NODE_17_length  Start: 67767  Stop: 91585
Creating output files
Writing GenBank output file
Traceback (most recent call last):
  File "/data/users/kxy/miniconda3/envs/PhiSpy/bin/PhiSpy.py", line 10, in <module>
    sys.exit(run())
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/PhiSpyModules/main.py", line 122, in run
    main(sys.argv)
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/PhiSpyModules/main.py", line 114, in main
    PhiSpyModules.write_all_outputs(**vars(args_parser))
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/PhiSpyModules/writers.py", line 401, in write_all_outputs
    write_genbank(self)
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/PhiSpyModules/writers.py", line 98, in write_genbank
    SeqIO.write(self.record, handle, 'genbank')
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/Bio/SeqIO/__init__.py", line 531, in write
    count = writer_class(handle).write_file(sequences)
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py", line 235, in write_file
    count = self.write_records(records, maxcount)
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py", line 209, in write_records
    self.write_record(record)
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/Bio/SeqIO/InsdcIO.py", line 1005, in write_record
    self._write_the_first_line(record)
  File "/data/users/kxy/miniconda3/envs/PhiSpy/lib/python3.10/site-packages/Bio/SeqIO/InsdcIO.py", line 757, in _write_the_first_line
    raise ValueError("Need a Nucleotide or Protein alphabet")
ValueError: Need a Nucleotide or Protein alphabet
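The traceback ends in Biopython's `InsdcIO._write_the_first_line`, the step that writes each record's LOCUS line. As far as I can tell, Biopython 1.77 gives a parsed GenBank record a nucleotide alphabet only when its LOCUS line names a molecule type such as `DNA`, so one thing worth checking is whether every LOCUS line in the input still carries one. A minimal sketch in plain Python, assuming nucleotide records only; `locus_lines_missing_moltype` is a hypothetical helper, not part of PhiSpy or Biopython:

```python
from typing import Iterable, List


def locus_lines_missing_moltype(lines: Iterable[str]) -> List[str]:
    """Return LOCUS lines that name no DNA/RNA molecule type after the locus name.

    Simplified check for nucleotide GenBank records only: it looks for any
    token after the name that contains "DNA" or "RNA" (e.g. "DNA", "ds-DNA").
    """
    missing = []
    for line in lines:
        if not line.startswith("LOCUS"):
            continue  # only LOCUS lines are inspected
        tokens = line.split()[2:]  # skip "LOCUS" and the locus name itself
        if not any("DNA" in tok.upper() or "RNA" in tok.upper() for tok in tokens):
            missing.append(line.rstrip("\r\n"))
    return missing
```

For example, calling `locus_lines_missing_moltype(open("my_output.gbk"))` lists the LOCUS lines of the file passed to PhiSpy that lack a molecule type; an empty list means every record names one.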

Additionally, because the contig IDs in the GenBank (.gbk) file produced by the Prokka annotation were too long, I shortened all LOCUS IDs in the file with the following command:

Original LOCUS line: `LOCUS NODE_2_length_354722_cov_51.4144354722 bp DNA linear`
Command: `sed -re 's/(_length)[^=]$/\1/' 4751.gbk > my_output.gbk`
Resulting LOCUS line: `LOCUS NODE_2_length`
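In the resulting line `LOCUS NODE_2_length` shown above, the sequence length (`354722 bp`), the molecule type (`DNA`) and the topology (`linear`) that followed the name are gone, not just the long suffix of the name. If the goal is only a shorter name, a sketch that rewrites the LOCUS line while keeping those fields could look like the following. It assumes SPAdes-style contig names (`NODE_<n>_length_<len>_cov_<cov>`) whose name has run into the length field as in the example above; `shorten_locus_line` and `shorten_locus_names` are hypothetical helpers, not part of PhiSpy or Prokka.

```python
import re

# Hypothetical helper, not part of PhiSpy or Prokka. Assumes SPAdes-style names,
# NODE_<n>_length_<len>_cov_<cov>, possibly fused with the length field, as in
# "LOCUS NODE_2_length_354722_cov_51.4144354722 bp DNA linear".
FUSED_LOCUS = re.compile(
    r"^LOCUS\s+(?P<short>NODE_\d+_length)_(?P<len>\d+)_cov_[\d.]+?\s*(?P=len)"
    r"\s+bp\s+(?P<rest>.*)$"
)


def shorten_locus_line(line: str) -> str:
    """Shorten a SPAdes-style LOCUS name, keeping length, molecule type and topology."""
    m = FUSED_LOCUS.match(line.rstrip("\r\n"))
    if m is None:
        return line  # not a matching LOCUS line: leave it unchanged
    # Rebuild in the usual GenBank column layout: name in columns 13-28,
    # length right-justified to column 40, then "bp" and the remaining fields
    # (molecule type, topology, division, date) exactly as they were.
    # Short names longer than 16 characters would shift the later columns right.
    return f"LOCUS       {m['short']:<16} {m['len']:>11} bp    {m['rest']}\n"


def shorten_locus_names(src: str, dst: str) -> int:
    """Copy src to dst, shortening matching LOCUS lines; return how many changed."""
    changed = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            new_line = shorten_locus_line(line)
            changed += new_line != line
            fout.write(new_line)
    return changed
```

For example, `shorten_locus_names("4751.gbk", "my_output.gbk")` writes a copy in which the LOCUS line above starts with `LOCUS       NODE_2_length` and still ends with `354722 bp    DNA linear`. Rebuilding the line in fixed columns, rather than deleting text with a regular expression, keeps the fields that a GenBank parser reads after the name.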

linsalrob commented 1 year ago

Can you share the original or modified GenBank file?