linsalrob / PhiSpy

Prediction of prophages from bacterial genomes
MIT License

Error during processing, no result output - TypeError: expected str, bytes or os.PathLike object, not StringIO #67

Open naturepoker opened 6 months ago

naturepoker commented 6 months ago

Hello,

PhiSpy errors out and doesn't produce a results file. I confirmed the same behavior with both a separate venv (pip) installation and a conda environment installation.

Attaching the error message (this one is from the venv installation):

Running HMM profiles against /home/vog/VOGs.hmms
hmmsearch: writing the amino acids to temporary file /home/tmpvsaq1s52
/home/venv/phispy/lib/python3.10/site-packages/Bio/SeqFeature.py:230: BiopythonDeprecationWarning: Please use .location.strand rather than .strand
  warnings.warn(
Searching 4210 proteins with hmmsearch.
Traceback (most recent call last):
  File "/home/venv/phispy/lib/python3.10/site-packages/Bio/File.py", line 72, in as_handle
    with open(handleish, mode, **kwargs) as fp:
TypeError: expected str, bytes or os.PathLike object, not StringIO

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/venv/phispy/bin/phispy", line 33, in <module>
    sys.exit(load_entry_point('PhiSpy==4.2.19', 'console_scripts', 'phispy')())
  File "/home/venv/phispy/lib/python3.10/site-packages/PhiSpyModules/main.py", line 122, in run
    main(sys.argv)
  File "/home/venv/phispy/lib/python3.10/site-packages/PhiSpyModules/main.py", line 63, in main
    args_parser.record = PhiSpyModules.search_phmms(**vars(args_parser))
  File "/home/venv/phispy/lib/python3.10/site-packages/PhiSpyModules/search_phmms.py", line 70, in search_phmms
    for res in results:
  File "/home/venv/phispy/lib/python3.10/site-packages/Bio/SearchIO/__init__.py", line 302, in parse
    yield from generator
  File "/home/venv/phispy/lib/python3.10/site-packages/Bio/SearchIO/HmmerIO/hmmer3_text.py", line 46, in __iter__
    yield from self._parse_qresult()
  File "/home/venv/phispy/lib/python3.10/site-packages/Bio/SearchIO/HmmerIO/hmmer3_text.py", line 143, in _parse_qresult
    qresult = QueryResult(id=qid, hits=hit_list)
  File "/home/venv/phispy/lib/python3.10/site-packages/Bio/SearchIO/_model/query.py", line 205, in __init__
    self.append(hit)
  File "/home/venv/phispy/lib/python3.10/site-packages/Bio/SearchIO/_model/query.py", line 468, in append
    raise ValueError(
ValueError: The ID or alternative IDs of Hit 'GenomeName' exists in this QueryResult.

My old lab note shows the same version of PhiSpy (4.2.21) working without issues around 2022. What could have changed since then?

Please feel free to let me know if you need additional data/assistance (alas I'm no good with python). Thank you!

naturepoker commented 6 months ago

Quick follow up:

I tested fresh PhiSpy installations on two separate machines running the same OS (Ubuntu 22.04 LTS), using the same commands in conda-installed PhiSpy environments.

Curiously, the command worked perfectly fine on one of the machines, and didn't on the other. Here's the PhiSpy command used:

phispy  -o GCF_AccessionNumber_Species_Taxon --color --phmms ~/phrogs.hmm --threads 2 GCF_AccessionNumber_Species_Taxon.gbk

The base (OS) Python version is the same on both machines: 3.10.12. The Python version inside the PhiSpy conda environment is also the same on both: 3.10.13.

However, the conda base Python version differs between the two machines: 3.11.5 on the machine with the successful run, and 3.12.1 on the machine with the failed run.

Another note - even on the machine that completed a successful run, trying to run a looped command over a directory of genbank files like:

for f in *.gbk; do n=$(basename $f .gbk); phispy -o $n --color --phmms ~/phrogs.hmm --threads 2 $f; done

results in failure, even for the same file. Overall, whether PhiSpy works on a given file on a given machine has been pretty unpredictable...
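For anyone who wants to rule out shell quoting or word splitting as a factor in the loop above, here is a minimal Python sketch of the same batch run. It only reuses the flags from the command quoted in this thread; the function names `build_phispy_command` and `run_all` are made up for illustration, and this is not expected to fix the failure itself, just to make each invocation explicit.

```python
import subprocess
from pathlib import Path


def build_phispy_command(genbank_path, phmms, threads=2):
    """Build the argument list for one phispy run (hypothetical helper).

    Mirrors the shell loop: the output name is the file name without
    its directory and without the .gbk suffix.
    """
    genbank_path = Path(genbank_path)
    return [
        "phispy",
        "-o", genbank_path.stem,
        "--color",
        "--phmms", str(phmms),
        "--threads", str(threads),
        str(genbank_path),
    ]


def run_all(directory=".", phmms=Path.home() / "phrogs.hmm"):
    """Run phispy once per .gbk file and return the names that failed."""
    failed = []
    for gbk in sorted(Path(directory).glob("*.gbk")):
        # Passing a list (no shell) avoids quoting issues with odd file names.
        result = subprocess.run(build_phispy_command(gbk, phmms))
        if result.returncode != 0:
            failed.append(gbk.name)
    return failed
```

Calling `run_all()` from the directory holding the GenBank files returns the list of files whose run exited with a non-zero status, which makes it easier to see whether failures track specific files.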

Editing to add that the issue is definitely with the result-writing portion of the PhiSpy process. After trimming the input file names, the terminal output indicates the pipeline has run normally, but the process terminates at the writing step with the error below:

Creating output files
Traceback (most recent call last):
  File "/home/miniconda3/envs/phispy/bin/phispy", line 10, in <module>
    sys.exit(run())
  File "/home/miniconda3/envs/phispy/lib/python3.10/site-packages/PhiSpyModules/main.py", line 122, in run
    main(sys.argv)
  File "/home/miniconda3/envs/phispy/lib/python3.10/site-packages/PhiSpyModules/main.py", line 114, in main
    PhiSpyModules.write_all_outputs(**vars(args_parser))
  File "/home/miniconda3/envs/phispy/lib/python3.10/site-packages/PhiSpyModules/writers.py", line 325, in write_all_outputs
    self.record.get_entry(self.pp[i]['contig']).append_feature(SeqFeature(
TypeError: SeqFeature.__init__() got an unexpected keyword argument 'strand'

At the end the tmp directory with data is deleted (in some cases, not all), leaving only the log file.

jvfe commented 5 months ago

Hey there, I was experiencing the same issue you had on your last run (TypeError: SeqFeature.__init__() got an unexpected keyword argument 'strand') in my conda environment with PhiSpy. It looks like an issue with Biopython version 1.82 (https://github.com/biopython/biopython/issues/4563). After downgrading Biopython to 1.81, it ran smoothly.
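A quick way to see which Biopython version an environment actually has is sketched below, using only the standard library. The threshold is an assumption drawn from this thread: 1.82 was reported as failing and 1.81 as working, and the sketch assumes later releases behave like 1.82. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(version_string):
    """Return (major, minor) as integers from a string such as '1.82'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def rejects_strand_keyword(version_string, first_bad=(1, 82)):
    """Assumption from this thread: 1.82 and newer reject SeqFeature(strand=...)."""
    return parse_major_minor(version_string) >= first_bad


def installed_biopython_version():
    """Return the installed Biopython version string, or None if it is absent."""
    try:
        return version("biopython")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_biopython_version()
    if found is None:
        print("Biopython is not installed in this environment")
    elif rejects_strand_keyword(found):
        print(f"Biopython {found}: likely to hit the 'strand' TypeError")
    else:
        print(f"Biopython {found}: older than 1.82")
```

Run it with the Python interpreter of the PhiSpy environment (not the conda base one), since the two can have different packages installed. If it reports 1.82 or newer, pinning with `pip install biopython==1.81` inside that environment matches the downgrade described above.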