jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Abundant phylum not being classified #823

Closed: Manso002 closed this issue 1 month ago

Manso002 commented 3 months ago

Hello all. I have run a SqueezeMeta project in sequential mode (my workstation can't run the other modes with my samples), and the results show that in every one of my 12 samples the top 2-3 most abundant taxa are reported as 'Unclassified bacteria'. I have tried processing the samples with several mapping and assembly programs, but I keep getting the same results. Do you know a way to assign taxonomy to them, or could my samples be contaminated? I can't think of a solution. Thanks in advance.

jtamames commented 3 months ago

Hello. Are you getting the same results with sqm_reads or sqm_longreads? These do not rely on the assembly, so they will show whether the problem lies there. I also recommend that you locate some of these unclassified ORFs and check their Diamond results (the 04 file in the intermediate directory). Then run a blastp search of the same ORFs at NCBI and compare the results. And let me know :)

Best, J
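For anyone following along, here is a minimal Python sketch of the "check the Diamond results for these ORFs" step. It assumes the 04 file is Diamond tabular output (`-f tab` / `-f 6`), with the ORF (query) ID in the first column and the 12 standard BLAST tabular columns; please verify this against your own file. The function name `hits_for_orfs` is hypothetical, not part of SqueezeMeta.

```python
import csv
from collections import defaultdict

# Assumption: the 04 file uses the 12 standard Diamond/BLAST tabular columns.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hits_for_orfs(m8_lines, orf_ids):
    """Collect tabular hits whose first column (the query/ORF ID) is in orf_ids.

    Hypothetical helper: returns {orf_id: [hit dict, ...]} in file order.
    """
    wanted = set(orf_ids)
    hits = defaultdict(list)
    for row in csv.reader(m8_lines, delimiter="\t"):
        if not row or row[0] not in wanted:
            continue
        # zip() stops at the shorter sequence, so any extra columns are ignored
        hits[row[0]].append(dict(zip(COLUMNS, row)))
    return hits
```

Usage: open the 04 file and pass the file handle plus a few ORF IDs, then compare the `sseqid`, `pident` and `evalue` values with what NCBI blastp reports for the same ORFs. Streaming the file line by line keeps memory low even when the file is very large.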

fpusan commented 3 months ago

Also is this by any chance a metatranscriptome?

Manso002 commented 3 months ago

Thanks for the fast response! I didn't try sqm_longreads. I will try both approaches you suggested and let you know.

@fpusan I actually have metagenome samples

Manso002 commented 3 months ago

Hello again. I am trying to run sqm_longreads and I get this error:

Error running /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx -q /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq -p 12 -d /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd -f tab -F 15 --quiet --range-culling -b 15.4 -e 0.001 --id 30 --top 10 -o /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq.gz.nr.blastx -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen at /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/utils/sqm_longreads.pl line 933.

So I tried sqm_reads to see whether I get the same error with blastx, and it happens there too. I can't figure out how to fix it. Also, I am not sure how to search for unclassified ORFs in the Diamond results: the files are so large that it is hard to find them. Could you help me?

Thanks in advance

jtamames commented 3 months ago

Hello. What happens if you run /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond on its own? If that works, try running the full command and check for any error message:

/home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx -q /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq -p 12 -d /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd -f tab -F 15 --quiet --range-culling -b 15.4 -e 0.001 --id 30 --top 10 -o /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq.gz.nr.blastx -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen
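If you prefer to script this check, here is a minimal Python sketch that runs a command given as an argument list and captures its exit code and stderr, so Diamond's own error message is not hidden behind the generic "Error running ..." line. The helper name `run_and_report` is hypothetical. If the executable itself does not exist, `subprocess.run` raises `FileNotFoundError`, which is the scripted equivalent of a shell "not found" message.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (exit_code, stdout, stderr).

    Hypothetical helper. exit_code is None when the executable is missing.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The program itself could not be found (like "diamond: not found")
        return None, "", str(exc)
    return result.returncode, result.stdout, result.stderr
```

Usage: pass the Diamond binary path followed by each option as a separate list element, then print the returned stderr. A non-zero exit code with a message such as "Missing parameter" means the binary works and only the arguments need fixing.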

To select the ORFs, take a look at the 06.wranks file in the results directory. Locate some unclassified ORFs, then get their amino acid sequences from the 03.faa file, also in the results directory. Best,
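A minimal Python sketch of these two steps, under stated assumptions: each line of the 06.wranks file is taken to be an ORF ID, a tab, and a semicolon-separated lineage in which the phylum entry starts with `p_`. An ORF with no such entry (for example, only `k_Bacteria`) is treated as "unclassified at phylum". The 03.faa file is assumed to be standard FASTA whose record ID is the first word of the header. Check both formats against your own files before relying on this; the function names are hypothetical.

```python
def unclassified_at_phylum(wranks_lines, phylum_prefix="p_"):
    """Return ORF IDs whose lineage has no entry starting with phylum_prefix.

    Assumed line format (hypothetical): "ORF_ID<TAB>k_...;p_...;c_..."
    """
    orfs = []
    for line in wranks_lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        lineage = fields[1].split(";") if len(fields) > 1 else []
        if not any(rank.startswith(phylum_prefix) for rank in lineage):
            orfs.append(fields[0])
    return orfs


def fasta_records(fasta_lines, wanted):
    """Yield (id, sequence) for FASTA records whose ID (first header word) is in wanted."""
    wanted = set(wanted)
    name, seq = None, []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name in wanted:
                yield name, "".join(seq)
            parts = line[1:].split()
            name, seq = (parts[0] if parts else None), []
        elif name is not None:
            seq.append(line)
    if name in wanted:  # emit the last record
        yield name, "".join(seq)
```

Usage: feed the open 06.wranks file to `unclassified_at_phylum`, take a handful of the returned IDs, and pass the open 03.faa file with those IDs to `fasta_records`. The sequences can then be written out as FASTA and pasted into NCBI blastp. Both functions stream line by line, so large files are not loaded into memory.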

Manso002 commented 3 months ago

I get this error when running the command you suggested:

Error: Syntax: diamond COMMAND [OPTIONS]. To print help message: diamond help.

When I call diamond on its own, I get an error as if it were not installed. This is odd, since diamond runs smoothly when I run squeezemeta.pl.

jtamames commented 2 months ago

But does /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond on its own return anything?

Manso002 commented 2 months ago

Hi. It returns the error I mentioned: Error: Syntax: diamond COMMAND [OPTIONS]. To print help message: diamond help.

test_install.pl reports that everything is OK.

jtamames commented 2 months ago

It does look OK; otherwise the error message would be "diamond: not found" or similar. So what is the output of /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx?

Manso002 commented 2 months ago

I get this: Error: Missing parameter: database file (--db/-d).

Thanks for your patience, Javier, much appreciated.

jtamames commented 2 months ago

Then it is working properly. Now, what happens if you run /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx -q /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq -p 12 -d /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd -f tab -F 15 --quiet --range-culling -b 15.4 -e 0.001 --id 30 --top 10 -o /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq.gz.nr.blastx?

Manso002 commented 2 months ago

I get this: No such file or directory Error: Error calling stat on file /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq

jtamames commented 2 months ago

OK... could you check your samples file to see whether the fastq files listed there are in the proper locations?

Manso002 commented 2 months ago

Yes, the files are in the correct locations. Could the error be related to a memory issue? I am running some other resource-consuming processes, and it occurred to me that this could be the problem.

jtamames commented 2 months ago

No, the error is that diamond cannot find the file /home/meridian/ANE1/ANETO_1/ANETO_1_R1.fastq. Could it be that the file is gzipped (.fastq.gz), so the name does not match?
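"Error calling stat" means the operating system reported that no file exists at that exact path, which points to a naming or location problem rather than to memory. A minimal Python sketch of that check, which also tries the name with or without a trailing `.gz`; the helper name `locate_reads` is hypothetical.

```python
import os


def locate_reads(path):
    """Return whichever of path / its .gz variant is an existing file, else None.

    Hypothetical helper: if path ends in ".gz", the uncompressed name is tried
    as the alternative; otherwise path + ".gz" is tried.
    """
    alternative = path[:-3] if path.endswith(".gz") else path + ".gz"
    for candidate in (path, alternative):
        if os.path.isfile(candidate):
            return candidate
    return None
```

If this returns the `.gz` variant for a path that a command expected without it (or the reverse), the file is present but the command is pointing at the wrong name.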

Manso002 commented 2 months ago

The fastq files are gzipped, you are right. However, there are no fastq files of any kind in /home/meridian/ANE1/ANETO_1/.

jtamames commented 2 months ago

I am asking about that path because it is apparently the one you provided when running the script. Please check everything carefully (files, filenames, directories, etc.) and try again.
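One way to do this check systematically is a minimal Python sketch that reads a samples file and reports every listed read file that does not exist in the raw-reads directory given with `-f`. It assumes the samples file is tab-separated with the sample name in the first column and the read file name in the second (for example `ANETO_1<TAB>ANETO_1_R1.fastq.gz<TAB>pair1`); confirm this against your own file. The function name `check_samples_file` is hypothetical.

```python
import os


def check_samples_file(samples_lines, raw_dir):
    """Return (sample, full_path) for each listed read file that is not a regular file.

    Hypothetical helper. Assumed line format: sample<TAB>filename[<TAB>pair1|pair2...]
    """
    missing = []
    for line in samples_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        # The file name is used exactly as written, so stray spaces or a
        # missing ".gz" will show up as a missing file.
        path = os.path.join(raw_dir, fields[1])
        if not os.path.isfile(path):
            missing.append((fields[0], path))
    return missing
```

Printing each reported path with `repr()` makes invisible problems such as trailing spaces easy to spot.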

Manso002 commented 2 months ago

I will try again double checking everything and let you know. Once again, thank you Javier.

Manso002 commented 2 months ago

Hello all. Sorry to bother you again. I have been trying to run sqm_reads and sqm_longreads following your recommendations, @jtamames, and it is not working. This is the command I used:

/home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/utils/sqm_reads.pl -p ANE1 -s /home/meridian/ANETO/sqm_reads_aneto1.txt -f /home/meridian/ANETO/ANETO_reads

But I again get the following error:

[59 seconds]: Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for taxa (GenBank nr, Clark et al 2016, Nucleic Acids Res 44, D67-D72) Error running command /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx -q /home/meridian/ANETO/ANETO_reads/ANETO_1_R1.fastq.gz -p 12 -d /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd -e 0.001 --quiet -f tab -b 15.2 -o ANE1/ANETO_1/ANETO_1_R1.fastq.gz.tax.m8 at /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/utils/sqm_reads.pl line 245.

Hope you can help me. Thanks and have a nice day!

jtamames commented 2 months ago

Have you checked if your samples are in /home/meridian/ANETO/ANETO_reads?

Manso002 commented 2 months ago

Yes, samples are in that path.

jtamames commented 2 months ago

Ok, then what is the result of running: /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx -q /home/meridian/ANETO/ANETO_reads/ANETO_1_R1.fastq.gz -p 12 -d /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd o ANE1/ANETO_1/ANETO_1_R1.fastq.gz.tax.m8

Manso002 commented 2 months ago

Error: Invalid parameter count for option '--db'

jtamames commented 2 months ago

Sorry, mistake. Run it again: /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx -q /home/meridian/ANETO/ANETO_reads/ANETO_1_R1.fastq.gz -p 12 -d /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd -o ANE1/ANETO_1/ANETO_1_R1.fastq.gz.tax.m8

Manso002 commented 2 months ago

It is running, although ANETO_1_R1.fastq.gz.tax.m8 is empty (0 bytes). I assume that should not happen, right?

diamond v2.0.15.153 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 12
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: ANE1/ANETO_1
Target sequences to report alignments for: 25
Opening the database... [0.08s]
Database: /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd (type: Diamond database, sequences: 595907626, letters: 234169316349)
Block size = 2000000000

jtamames commented 2 months ago

OK. Please stop it and run the full command: /home/meridian/miniconda3/envs/SqueezeMeta2/SqueezeMeta/bin/diamond blastx -q /home/meridian/ANETO/ANETO_reads/ANETO_1_R1.fastq.gz -p 12 -d /home/meridian/miniconda3/envs/SqueezeMeta2/db/nr.dmnd -e 0.001 --quiet -f tab -b 15.2 -o ANE1/ANETO_1/ANETO_1_R1.fastq.gz.tax.m8
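Hand-typing long commands is how a dropped dash (`o` instead of `-o`) crept into an earlier attempt in this thread. A minimal Python sketch that builds the same blastx call as an argument list, using only the options from the command above; the function name and its default values are hypothetical illustrations, with defaults copied from that command.

```python
def diamond_blastx_cmd(diamond, query, db, out, threads=12, evalue=0.001, block_size=15.2):
    """Build the blastx command from this thread as an argument list.

    Hypothetical helper; defaults mirror the command above. Each option and
    its value is a separate element, so no shell quoting is needed.
    """
    return [diamond, "blastx",
            "-q", query,
            "-p", str(threads),
            "-d", db,
            "-e", str(evalue),
            "--quiet",          # suppresses Diamond's terminal output
            "-f", "tab",
            "-b", str(block_size),
            "-o", out]
```

The resulting list can be passed directly to `subprocess.run`. Because `--quiet` is included, a silent run is expected; progress can only be judged from the process itself and the output file.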

Manso002 commented 2 months ago

It seems to be running, although no output messages are shown this time, unlike before. I will let you know if anything goes wrong.

Thanks a lot Javier

fpusan commented 1 month ago

Closing due to lack of activity; feel free to reopen.