raw-lab / MetaCerberus

Python code for versatile Functional Ontology Assignments for Metagenomes searching via Hidden Markov Model (HMM) with environmental focus of shotgun metaomics data
BSD 3-Clause "New" or "Revised" License

Bug: reformatting fastq's to fasta's fails #5

Closed kclambi1 closed 7 months ago

kclambi1 commented 1 year ago

Step 5b fails when attempting to index fastq files. Full traceback:

Traceback (most recent call last):
  File "/Users/xxxx/mambaforge/envs/metacerberus/bin/metacerberus.py", line 693, in <module>
    sys.exit(main())
  File "/Users/xxxx/mambaforge/envs/metacerberus/bin/metacerberus.py", line 415, in main
    key,value,func = ray.get(ready[0])
  File "/Users/xxxx/mambaforge/envs/metacerberus/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/xxxx/mambaforge/envs/metacerberus/lib/python3.10/site-packages/ray/_private/worker.py", line 2380, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::rayWorkerThread() (pid=6687, ip=127.0.0.1)
  File "/Users/xxxx/mambaforge/envs/metacerberus/bin/metacerberus.py", line 119, in rayWorkerThread
    ret = func(*params)
  File "/Users/xxxx/mambaforge/envs/metacerberus/lib/python3.10/site-packages/meta_cerberus/metacerberus_qc.py", line 15, in checkQuality
    return checkPairedRead(rawRead, config, subdir)
  File "/Users/xxxx/mambaforge/envs/metacerberus/lib/python3.10/site-packages/meta_cerberus/metacerberus_qc.py", line 39, in checkPairedRead
    command = f"{config['EXE_FASTQC']} -o {path} {pairedRead[0]} {pairedRead[1]}"
TypeError: 'PosixPath' object is not subscriptable

Seems to be looking for forward and reverse paired read files even when '--nanopore' is specified in the parameter options?
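
For reference, Python raises this TypeError whenever a single path object is indexed as if it were a pair of paths. The sketch below reproduces the error outside MetaCerberus and shows one way a caller could guard against it. The helper `fastqc_inputs` and the file name `fastq_pass/reads.fastq` are hypothetical and only for illustration; this is not the project's actual code.

```python
import os
from pathlib import PurePosixPath


def fastqc_inputs(reads):
    """Return the read files to hand to a QC tool as a list of strings.

    Hypothetical helper: accepts either one path (single-end data such as
    Nanopore) or a 2-item sequence of paths (paired-end data), and never
    indexes a bare path object.
    """
    if isinstance(reads, (str, os.PathLike)):
        return [str(reads)]
    forward, reverse = reads
    return [str(forward), str(reverse)]


single = PurePosixPath("fastq_pass/reads.fastq")  # hypothetical file name

# Indexing a path object fails the same way as in the traceback.
try:
    single[0]
except TypeError as err:
    print(f"reproduced: {err}")

# The guarded helper handles both the single-end and the paired-end case.
print(fastqc_inputs(single))
print(fastqc_inputs((PurePosixPath("Test_R1.fastq"), PurePosixPath("Test_R2.fastq"))))
```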

raw-lab commented 1 year ago

Thank you for using MetaCerberus. To help us help you, can you tell me your OS, the version of MetaCerberus you're using, the input data format (Illumina, PacBio, Oxford Nanopore), and the data input type (reads, contigs, proteins)?

We only look at forward reads for PacBio and Oxford Nanopore. There are no reverse reads.

kclambi1 commented 1 year ago

OS: macOS Ventura 13.5.1
MetaCerberus version: the latest GitHub release.
Input is Oxford Nanopore fastq reads, so it is indeed odd that, even when specifying '--nanopore', the error seems to pertain to paired reads.

Command run: metacerberus.py --fraggenescan /Users/xxxx/Downloads/fastq_pass/ --nanopore --meta --dir_out /Users/xxxx/Downloads/lambda_vir-only_dir/

pck00 commented 9 months ago

I am getting the exact same error, at step 5b with nanopore (SRR15179650) and at step 4 with illumina (SRR22410997). The command I'm running is metacerberus.py --prodigal ~/metacerberus/SRRetc --illumina (or --nanopore) --meta --dir_out test

Edit: I was lazy and downloaded the files directly from SRA, so they are not paired. I will download them again through sra-toolkit and report back.

raw937 commented 9 months ago

Are they Illumina or nanopore data? What version of MetaCerberus are you using? Can you also tell me a bit more about the sample (viral, bacterial, eukaryotic)?

pck00 commented 9 months ago

Presumably the current one; I got it via: git clone https://github.com/raw-lab/metacerberus.git
Both are randomly picked viral metagenomes, one Illumina and one Nanopore, and both give me the error.

raw937 commented 9 months ago

Please install via mamba and see if you still get the error. Also, please give me the version of MetaCerberus you are using.

raw937 commented 9 months ago

Please also provide the complete error message from the git install, but try the mamba install first.

raw937 commented 9 months ago

Are you on Linux, Windows, or Mac?

raw-lab commented 9 months ago

The direct-from-GitHub install isn't working without mamba. If you are on Mac or Linux, use the mamba install:

conda install mamba
mamba create -n metacerberus -c bioconda -c conda-forge metacerberus
conda activate metacerberus
metacerberus.py --setup

If you have an ARM-based Mac (OSX-ARM, M1/M2), use this instead:

conda create -y -n metacerberus
conda activate metacerberus
conda config --env --set subdir osx-64
conda install -y -c conda-forge mamba python=3.10 "pydantic<2"
mamba install -y -c bioconda -c conda-forge metacerberus
metacerberus.py --setup

Unfortunately, this isn't a MetaCerberus issue; on some Macs it's a conda license issue.

Give this a try. If it doesn't work, please provide the following information:

1) Your OS distribution/set-up, as reported from your terminal. For example:

Command from terminal for Linux:

lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy

Command from terminal for Mac:

sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.6
BuildVersion: yadda yadda yadda

2) The complete error message and the data you are using. I would recommend starting with the test data in the data folder.

Thank you for using MetaCerberus. I am going to close this for now. If you still have issues, please provide the information above. We have removed the git-based install option from the README for now while we debug. Please use the mamba option; it is fast and should work on Mac and Linux.

many thanks, RAW Lab

ebueren commented 9 months ago

Hello, I also appear to be having a similar issue at step 4b (decontamination) on my university's HPC, using Slurm.

LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release: 7.9
Codename: Maipo

I installed using mamba as described above. I've tried the command below with both my own paired Illumina reads and the test reads in the GitHub example data; both result in the same error.

Any ideas? Thanks!

Command: metacerberus.py --super input_test/ --illumina --meta --dir_out test.out

Log read out:


STEP 4: Decontaminating trimmed files
(rayWorkerThread pid=103308) Command '/home/ebueren/miniconda3/envs/metacerberus/bin/bbduk.sh -Xmx1g in=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_03-trim/prodigal_Test_R1/trimmed_prodigal_Test_R1.fastq out=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test_R1/decon-prodigal_Test_R1.fastq qin=30 qtrim=r minlen=50 outm=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test_R1/matched_prodigal_Test_R1 ref=default k=31 stats=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test_R1/stats.txt' returned non-zero exit status 1.
(rayWorkerThread pid=103308) ERROR: Failed to execute:
(rayWorkerThread pid=103308)  /home/ebueren/miniconda3/envs/metacerberus/bin/bbduk.sh -Xmx1g in=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_03-trim/prodigal_Test_R1/trimmed_prodigal_Test_R1.fastq out=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test_R1/decon-prodigal_Test_R1.fastq qin=30 qtrim=r minlen=50 outm=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test_R1/matched_prodigal_Test_R1 ref=default k=31 stats=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test_R1/stats.txt
======================================================
End Time   : Mon Jan 22 17:56:40 EST 2024
======================================================

And Error message:

Traceback (most recent call last):
  File "/home/ebueren/miniconda3/envs/metacerberus/bin/metacerberus.py", line 693, in <module>
    sys.exit(main())
  File "/home/ebueren/miniconda3/envs/metacerberus/bin/metacerberus.py", line 415, in main
    key,value,func = ray.get(ready[0])
  File "/home/ebueren/miniconda3/envs/metacerberus/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/ebueren/miniconda3/envs/metacerberus/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/ebueren/miniconda3/envs/metacerberus/lib/python3.10/site-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::rayWorkerThread() (pid=103308, ip=10.128.8.66)
  File "/home/ebueren/miniconda3/envs/metacerberus/bin/metacerberus.py", line 119, in rayWorkerThread
    ret = func(*params)
  File "/home/ebueren/miniconda3/envs/metacerberus/lib/python3.10/site-packages/meta_cerberus/metacerberus_qc.py", line 15, in checkQuality
    return checkPairedRead(rawRead, config, subdir)
  File "/home/ebueren/miniconda3/envs/metacerberus/lib/python3.10/site-packages/meta_cerberus/metacerberus_qc.py", line 39, in checkPairedRead
    command = f"{config['EXE_FASTQC']} -o {path} {pairedRead[0]} {pairedRead[1]}"
TypeError: 'PosixPath' object is not subscriptable

raw-lab commented 9 months ago

I have tested it. We also get the decon bug. I am reopening it. We will have a solution shortly.

raw-lab commented 9 months ago

Genome contigs should work. It's a fastq issue which we will fix shortly.

raw-lab commented 8 months ago

Good evening,

There was a bug in the fastq processing. Please update to 1.2. The bug is now fixed for fastq files. I am going to close this for now. Let us know if you still have issues.

many thanks, RAW lab

ebueren commented 8 months ago

Hi, sorry to report I am still having some bugs! The update seems to have solved the initial issue I had with step 4b, but I am now getting the issue reported with step 5b during reformatting.

Any ideas? Assembled contigs are running smoothly.

Command: metacerberus.py --super input_test/ --illumina --meta --dir_out test.out

STEP 5b: Reformating FASTQ files to FASTA format
(rayWorkerThread pid=221132, ip=10.128.8.170) Command '['/home/ebueren/miniconda3/envs/metacerberus/bin/bbduk.sh', '-Xmx1g', 'in=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_03-trim/prodigal_Test/trimmed_prodigal_Test.fastq', 'out=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test/decon-prodigal_Test.fastq', 'qin=30', 'qtrim=r', 'minlen=50', 'k=31', 'ref=default', 'hdist=1', 'stats=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test/stats.txt']' returned non-zero exit status 1.
(rayWorkerThread pid=221132, ip=10.128.8.170) ERROR: Failed to execute:
(rayWorkerThread pid=221132, ip=10.128.8.170) ['/home/ebueren/miniconda3/envs/metacerberus/bin/bbduk.sh', '-Xmx1g', 'in=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_03-trim/prodigal_Test/trimmed_prodigal_Test.fastq', 'out=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test/decon-prodigal_Test.fastq', 'qin=30', 'qtrim=r', 'minlen=50', 'k=31', 'ref=default', 'hdist=1', 'stats=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/prodigal_Test/stats.txt']
(rayWorkerThread pid=221132, ip=10.128.8.170) Command '['/home/ebueren/miniconda3/envs/metacerberus/bin/bbduk.sh', '-Xmx1g', 'in=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_03-trim/FragGeneScan_Test/trimmed_FragGeneScan_Test.fastq', 'out=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/FragGeneScan_Test/decon-FragGeneScan_Test.fastq', 'qin=30', 'qtrim=r', 'minlen=50', 'k=31', 'ref=default', 'hdist=1', 'stats=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/FragGeneScan_Test/stats.txt']' returned non-zero exit status 1.
(rayWorkerThread pid=221132, ip=10.128.8.170) ERROR: Failed to execute:
(rayWorkerThread pid=221132, ip=10.128.8.170) ['/home/ebueren/miniconda3/envs/metacerberus/bin/bbduk.sh', '-Xmx1g', 'in=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_03-trim/FragGeneScan_Test/trimmed_FragGeneScan_Test.fastq', 'out=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/FragGeneScan_Test/decon-FragGeneScan_Test.fastq', 'qin=30', 'qtrim=r', 'minlen=50', 'k=31', 'ref=default', 'hdist=1', 'stats=/projects/mcbee/eb2/bee_virome/07_func/mega/test.out/step_04-decontaminate/FragGeneScan_Test/stats.txt']
STEP 7: ORF Finder

raw-lab commented 8 months ago

We need to know some details:

Version of MetaCerberus: metacerberus.py -v (prints the version)
CPU/OS version: Mac, Linux, etc.

Please also attach a small sample of your data in fastq format here so we can check the format.
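
The details requested above can be gathered in one go. The following is a minimal sketch, assuming `metacerberus.py` is on the PATH and accepts the `-v` flag mentioned above; the function name `collect_report` and the report keys are made up for this example.

```python
import platform
import shutil
import subprocess


def collect_report(exe="metacerberus.py"):
    """Collect OS, CPU architecture, Python, and tool version for a bug report."""
    report = {
        "os": platform.system(),            # e.g. Linux or Darwin
        "os_release": platform.release(),
        "machine": platform.machine(),      # CPU architecture
        "python": platform.python_version(),
    }
    path = shutil.which(exe)
    if path is None:
        report["metacerberus"] = "not found on PATH"
    else:
        # Assumption: '-v' prints the version, as stated in the comment above.
        result = subprocess.run([path, "-v"], capture_output=True, text=True)
        report["metacerberus"] = (result.stdout or result.stderr).strip()
    return report


for key, value in collect_report().items():
    print(f"{key}: {value}")
```

Pasting the printed lines into the issue, together with the full error message, should cover the information asked for here.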

raw-lab commented 8 months ago

Odd. I am not seeing this error in version 1.2. Have you updated to 1.2? Maybe it is the file. Send us the file and we can take a look.

raw-lab commented 8 months ago

Are you still on Red Hat Linux?

ebueren commented 8 months ago

Hi, yes still in Redhat:

LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release: 7.9
Codename: Maipo

I did a fresh conda env install, and version seems to be correct: MetaCerberus: version: 1.2 September 2023

I'm using Test_R1.fastq and Test_R2.fastq from the metacerberus github data/example_data , but I have also had the same issue with my own .fq files. It's possible this is an issue with our HPC so I will also give it a try on my personal computer as soon as I get the chance.

raw-lab commented 8 months ago

Hmm, send a small sample of your fastq files and we will test them.

decrevi commented 8 months ago

Hello, I was able to replicate the bug and I believe I have tracked down the error. It has to do with giving MetaCerberus a folder with .fastq files. I am working on fixing this bug. In the meantime, try giving MetaCerberus each file individually:

metacerberus.py --super input_test/Test_R1.fastq --super input_test/Test_R2.fastq --illumina --meta --dir_out test.out
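
For folders with many files, the per-file workaround can be scripted. This is a rough sketch that expands a directory into one `--super` argument per FASTQ file, matching the command above; the helper name `per_file_args` and the accepted suffixes are assumptions, and compressed files such as `.fastq.gz` are not handled.

```python
import tempfile
from pathlib import Path


def per_file_args(input_dir, flag="--super", suffixes=(".fastq", ".fq")):
    """Return [flag, file, flag, file, ...] for every FASTQ file in input_dir."""
    files = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix in suffixes
    )
    args = []
    for fastq in files:
        args.extend([flag, str(fastq)])
    return args


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Test_R1.fastq", "Test_R2.fastq", "notes.txt"):
        (Path(tmp) / name).touch()
    command = [
        "metacerberus.py", *per_file_args(tmp),
        "--illumina", "--meta", "--dir_out", "test.out",
    ]
    print(" ".join(command))
```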

It is also easier to list input files and command line arguments in a config file.

Create a configuration file, for example test.yaml:

# metacerberus.py --super input_test/Test_R1.fastq --super input_test/Test_R2.fastq --illumina --meta --dir_out test.out
super: [input_test/Test_R1.fastq, input_test/Test_R2.fastq]
illumina: True
meta: True
dir_out: test.out

The line that starts with the hash sign (#) is ignored; it is only there to show the equivalent command-line arguments.

And then run MetaCerberus with the command:

metacerberus.py -c test.yaml
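
If the list of input files changes often, the config text can be generated rather than typed. The sketch below only renders the four keys used in the example above (`super`, `illumina`, `meta`, `dir_out`); the function `render_config` is hypothetical, and file names containing commas or other YAML special characters would need quoting that this sketch does not do.

```python
def render_config(fastq_files, dir_out):
    """Render a small YAML config with one inline list of input files."""
    files = ", ".join(str(f) for f in fastq_files)
    return (
        f"super: [{files}]\n"
        "illumina: True\n"
        "meta: True\n"
        f"dir_out: {dir_out}\n"
    )


text = render_config(
    ["input_test/Test_R1.fastq", "input_test/Test_R2.fastq"],
    "test.out",
)
print(text, end="")
```

Writing `text` to test.yaml gives a file with the same four settings as the hand-written one.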

Thank you again for your feedback, this bug will be fixed for version 1.3.

-Jose

raw-lab commented 7 months ago

This bug has been fixed in 1.2.1. Let us know if you have any issues with the update. We will close this for now.