Closed: ebueren closed this issue 5 months ago.
What version of MetaCerberus are you using? metacerberus.py -v
Also, how long are your reads?
I have MetaCerberus version 1.2.1 (February 2024).
Edit: I misread your question, my bad. My individual .fq files range from ~200MB to 800MB, with individual reads around 200bp, based on a quick check of a few samples.
At first glance, based on which samples made it to the end, size doesn't seem to be the reason some are being dropped vs. kept, but this is just an eyeball check, so I will dig a little deeper to confirm.
Did they make it to the end? Can you attach some of your reads so we can check? Are they overlapping libraries? What was the insert size?
Also, could you provide the command you used?
Is it only FragGeneScan that is dropping out? We think this is a bug that we are fixing in version 1.3. Is Prodigal still calling ORFs?
Command: metacerberus.py --super test_input/ --illumina --meta --dir_out test.out --cpus 16
Have also tried it with just metacerberus.py -c input.yaml with the same list of files/parameters
The reads are overlapping libraries with an insert size of 500bp.
Both FragGeneScan and Prodigal are dropping out at step_08.
Looking more closely at the intermediate files in step_08, it looks like the hmmer search is unsuccessful for the dropped samples.
Ex: I ran 12 samples through super mode, and at step_08 a program_sample folder was made for each run. The ones that were dropped at this stage (didn't go on to step_09) contain only one file, an empty tmp.err file (ex: "KOFam_all-prodigal_12_KettleFalls_unP_tmp.err", 0 bytes, is the only file in step_08-hmmer/prodigal_12_KettleFalls_unP).
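The dropped samples can also be spotted programmatically. A minimal sketch, assuming the step_08-hmmer/program_sample folder layout described above (the directory names and the `test.out` output path are illustrative, taken from the example in this thread):

```shell
# List step_08 hmmer sample folders whose only content is an empty
# *_tmp.err file; in this run, those correspond to the dropped samples.
# Usage: list_dropped OUTDIR, where OUTDIR is the --dir_out directory.
list_dropped() {
  for d in "$1"/step_08-hmmer/*/; do
    [ -d "$d" ] || continue
    # exactly one file in the folder, and it is a zero-byte *_tmp.err
    if [ "$(find "$d" -type f | wc -l)" -eq 1 ] && \
       [ -n "$(find "$d" -maxdepth 1 -name '*_tmp.err' -empty)" ]; then
      echo "dropped: $d"
    fi
  done
}

list_dropped test.out
```

Folders with real hmmer output have more than one file (or a non-empty results file), so they are skipped.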
The other weird thing is that none of my samples have been successful with both FragGeneScan and Prodigal. So a sample that worked with FragGeneScan did not work with Prodigal as an ORF caller, or vice versa.
I also noticed I'm having similar issues with annotating any assembled contigs, whether made from these reads or from other files that previously worked in 1.1. I set up an env of metacerberus 1.1, and the assembled contig files ran smoothly through it.
Attached reads are in a drive folder here: https://drive.google.com/file/d/1QK_8nNKdc5mBkeNcJcHUBrqJxvjb7WSc/view?usp=sharing
In v1.2, my runs have Sample 5 completing successfully with only Prodigal, Sample 8 with only FragGeneScan, and Samples 6 and 9 failing both.
Hmm, it's quite odd. I took a subsample of the reads (beautiful quality scores, by the way) and didn't have the same issues. We will take another look and get back to you as soon as possible. Thank you for your patience, and for helping us find bugs in MetaCerberus.
What type of samples are they? Are they full of eukaryotes or protists? Or a plant genome?
We will get you up and running to results asap.
many thanks, RAW lab
They are virome reads, but they haven't been filtered except for host contamination (bees), so if there is a high number of bacterial or non-viral hits, that is probably why!
Hmmm, I will do a fresh install on my end and see if that fixes it using the following:
conda create -n metacerberus -c conda-forge -c bioconda metacerberus -y
conda activate metacerberus
metacerberus.py --setup
Will let you know if anything changes on my end!
If you do the metacerberus.py --setup that should be good.
We are working on 1.3, which will have more databases and a major update.
Filtering host sequences is quite challenging. We are working on another tool for that.
It shouldn't make them drop out, though. Could it be a memory issue?
We will work with the data you gave us to see if we can fix it. Thank you for your patience.
Many thanks, RAW lab
So I reran it in a fresh environment in case my initial install had gone awry somewhere. The output still had dropped samples, but the pattern was different: this time FragGeneScan worked for Samples 5 and 6, and Samples 8 and 9 failed both FragGeneScan and Prodigal.
It seems possible this might be a memory issue. This is my first time using Ray; is it possible I'm using it incorrectly? For context, this is the sbatch script I'm submitting:
#!/usr/bin/env bash
#SBATCH --nodes=3
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
cd /projects/x/eb2/x/07_func/mega/02_metacerb_fq
source activate /home/ebueren/miniconda3/envs/metacerberusNEW
source ray-slurm-metacerberus.sh
metacerberus.py --super test_reads/ --illumina --meta --dir_out raw.out --cpus 16
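As a sanity check on the sizing, a small sketch that echoes what the Slurm allocation actually provides, so the `--cpus` value passed to metacerberus.py can be compared against it. `SLURM_JOB_NUM_NODES` and `SLURM_CPUS_PER_TASK` are standard variables Slurm sets inside a job; the fallbacks here just mirror the sbatch header above so the snippet also runs outside a job:

```shell
# Echo the allocation so the metacerberus.py --cpus value can be checked
# against the per-node CPU count and the Ray cluster total.
nodes=${SLURM_JOB_NUM_NODES:-3}
cpus_per_task=${SLURM_CPUS_PER_TASK:-16}
total=$((nodes * cpus_per_task))
echo "nodes=$nodes per-node CPUs=$cpus_per_task cluster total=$total"
```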
I tried running with a request for more CPUs per task (below), and this time it ran faster before reaching the hmmer step, but dropped all the runs at the hmmer stage, so there was no successful output.
#!/usr/bin/env bash
#SBATCH --nodes=3
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=64
cd /projects/x/eb2/x/07_func/mega/02_metacerb_fq
source activate /home/ebueren/miniconda3/envs/metacerberusNEW
source ray-slurm-metacerberus.sh
metacerberus.py --super test_reads/ --illumina --meta --dir_out raw.out --cpus 64
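To rule memory out as the cause, a hedged variant of the script above adds an explicit memory request. `#SBATCH --mem` is a standard Slurm option; the 64G value here is an assumption to tune to the cluster's per-node limits, not something MetaCerberus itself requires:

```shell
#!/usr/bin/env bash
#SBATCH --nodes=3
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=64G   # assumption: explicit per-node memory request; adjust for your cluster
cd /projects/x/eb2/x/07_func/mega/02_metacerb_fq
source activate /home/ebueren/miniconda3/envs/metacerberusNEW
source ray-slurm-metacerberus.sh
metacerberus.py --super test_reads/ --illumina --meta --dir_out raw.out --cpus 64
```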
Good afternoon,
Thank you for your patience - 1.3 is currently uploading to conda. The 1.3 version will fix this bug.
Thank you for using MetaCerberus!
many thanks, Rick
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release: 7.9
Codename: Maipo
Hello! I thought it might be best to open a new ticket since I'm not sure this is related to the same fastq-reformatting issue as before (https://github.com/raw-lab/MetaCerberus/issues/5).
Something seems to be going awry in the hmmer and parsing steps when I run fastqs through metacerberus.
I thought it was my reads at first, but I switched to the example reads (the Test_R1.fastq/Test_R2.fastq files) and am still having issues. Currently, it seems like metacerberus runs to completion but skips step 08 (hmmer search)? As a result, the final folder step_10-visualizeData/ is empty except for a combined/ folder, which contains only empty .html and .tsv data files.
There does not seem to be any error output for the test samples; it just prints out the steps with no details and moves on.
I am not sure if it is the same issue or something to do with my reads specifically, but I have a similar problem (I think) when I run metacerberus with my own reads. The majority of my samples get dropped at either step 08 or step 09.
For example, if my input is paired reads for Samples 1 through 10, and I run it in super mode (Prodigal and FragGeneScan), trimming occurs successfully for all sets. But then at around step_08 (hmmer), about half of the FragGeneScan versions of the samples are not included in the output folders, and more are dropped at step_09 (parsing). By step_10, the output includes only one or two results, ex: Fragscan_Sample11 and Prodigal_Sample3. These are both included in the combined output folder, but all other samples are missing.
For the runs with multiple samples and drop issues, the output has some detail:
I've tried inputting files both directly in the command and as an exterior config .yaml file. Any ideas?
Happy to send you some of my reads/output if you think that would help, just let me know!