jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

error for assembly #804

Closed laibinhuang closed 4 months ago

laibinhuang commented 4 months ago

SqueezeMeta.pl -m coassembly -p /home/lbhuang/Moore/results_2021/CA1B -s /home/lbhuang/Moore/results_2021/SqMeta1B.txt -f "/home/lbhuang/Moore/results_2021/trim" --norename -binners maxbin, metabat, concoct -c 1000

I got an error for assembly: please help--thank you!

Running assembly with megahit: perl /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/lib/SqueezeMeta/assembly_megahit.pl /home/lbhuang/Moore/results_2021/CA1B CA1B /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par1.fastq.gz /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par2.fastq.gz
Error running command: /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/bin/megahit/megahit -1 /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par1.fastq.gz -2 /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par2.fastq.gz -t 12 -o /home/lbhuang/Moore/results_2021/CA1B/data/megahit >> /home/lbhuang/Moore/results_2021/CA1B/syslog 2>&1 at /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/lib/SqueezeMeta/assembly_megahit.pl line 36.
Assembly not present in /home/lbhuang/Moore/results_2021/CA1B/results/01.CA1B.fasta, exiting
Stopping in STEP1 -> 01.run_all_assemblies.pl. Program finished abnormally

jtamames commented 4 months ago

syslog file please

laibinhuang commented 4 months ago

It stops at K=21

Run started Mon Mar 4 08:14:19 2024 in coassembly mode

SqueezeMeta v1.6.3, September 2023 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 10.3389 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started for CA1A, Mon Mar 4 08:14:19 2024
Project: CA1A
Map file: /home/lbhuang/Moore/results_2021/SqMeta1A.txt
Fastq directory: /home/lbhuang/Moore/results_2021/trim
Command: /home/lbhuang/mambaforge/envs/SqueezeMeta/bin/SqueezeMeta.pl -m coassembly -p /home/lbhuang/Moore/results_2021/CA1A -s /home/lbhuang/Moore/results_2021/SqMeta1A.txt -f /home/lbhuang/Moore/results_2021/trim --norename -binners maxbin, metabat, concoct -c 1000 -t 32
[0 seconds]: STEP0 -> SqueezeMeta.pl
COGS; KEGG; PFAM;

[0 seconds]: STEP1 -> 01.run_all_assemblies.pl (megahit)
Preparing files for pair1: cat /home/lbhuang/Moore/results_2021/trim/2P11_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11A1_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11B1_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P21_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P31_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P41_1.fastq.gz > /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par1.fastq.gz
Preparing files for pair2: cat /home/lbhuang/Moore/results_2021/trim/2P11_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11A1_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11B1_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P21_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P31_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P41_2.fastq.gz > /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par2.fastq.gz
Running assembly with megahit: perl /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/lib/SqueezeMeta/assembly_megahit.pl /home/lbhuang/Moore/results_2021/CA1A CA1A /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par1.fastq.gz /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par2.fastq.gz
2024-03-04 08:21:14 - MEGAHIT v1.2.9
2024-03-04 08:21:14 - Using megahit_core with POPCNT and BMI2 support
2024-03-04 08:21:14 - Convert reads to binary library
2024-03-04 08:31:22 - b'INFO sequence/io/sequence_lib.cpp : 77 - Lib 0 (/home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par1.fastq.gz,/home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par2.fastq.gz): pe, 634005290 reads, 151 max length'
2024-03-04 08:31:22 - b'INFO utils/utils.h : 152 - Real: 608.8190\tuser: 387.8713\tsys: 202.4206\tmaxrss: 241088'
2024-03-04 08:31:22 - k-max reset to: 141
2024-03-04 08:31:22 - Start assembly. Number of CPU threads 32
2024-03-04 08:31:22 - k list: 21,29,39,59,79,99,119,141
2024-03-04 08:31:22 - Memory used: 486732036096
2024-03-04 08:31:22 - Extract solid (k+1)-mers for k = 21

fpusan commented 4 months ago

You are most likely running out of memory: you have 634,005,290 reads in total. You'll probably need a bigger server, or to assemble each sample individually.
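
To gauge how much each sample contributes to that total before attempting a coassembly, the reads per file can be counted directly. This is only a rough sketch: the function names `count_reads` and `report_reads` are made up for illustration, and it assumes standard four-line FASTQ records compressed with gzip.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of SqueezeMeta.

# Count reads in one gzipped FASTQ file.
# Assumes unwrapped FASTQ records, i.e. exactly 4 lines per read.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Print one "file<TAB>reads" line per input file, then a grand total.
report_reads() {
    local total=0 n f
    for f in "$@"; do
        n=$(count_reads "$f")
        printf '%s\t%s\n' "$f" "$n"
        total=$((total + n))
    done
    printf 'total\t%s\n' "$total"
}
```

For example, `report_reads /home/lbhuang/Moore/results_2021/trim/*_1.fastq.gz` would list the forward reads per sample. Decompressing large files this way takes a while, so it may be worth running once and saving the output.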

laibinhuang commented 4 months ago

Hi Jtamames/Squeezemeta,

I also tried using my own contigs for each sample:

SqueezeMeta.pl -m sequential -s /home/lbhuang/Moore/results_2021/SqMeta.txt -f "/home/lbhuang/Moore/results_2021/trim" -extassembly "/home/lbhuang/Moore/results_2021/contig" -binners maxbin, metabat, concoct -c 1000 -t 32

It seems that this command didn't use my contigs, and it also didn't let me choose my own directory for the results.

Please help! Thank you very much

But I get errors like this:

--- SAMPLE 2P11 ---
Now creating directories
Reading configuration from /home/lbhuang/2P11/SqueezeMeta_conf.pl
Running trimmomatic (Bolger et al 2014, Bioinformatics 30(15):2114-20) for quality filtering
Parameters:
[1 seconds]: STEP1 -> RUNNING ASSEMBLY: 01.run_all_assemblies.pl (megahit)
External assembly provided: /home/lbhuang/Moore/results_2021/contig. Overriding assembly
cp: -r not specified; omitting directory '/home/lbhuang/Moore/results_2021/contig'
Renaming contigs in /home/lbhuang/2P11/results/01.2P11.fasta
Can't open /home/lbhuang/2P11/results/01.2P11.fasta
Stopping in STEP1 -> 01.run_all_assemblies.pl. Program finished abnormally

-- ----------------------------------------------------------- Laibin Huang, Ph.D.

Assistant Professor of Microbiology Saint Louis University, St. Louis, MO Department of Biology Macelwane Hall 301 3507 Laclede Ave. St. Louis, MO 63103


fpusan commented 4 months ago

If you have an external assembly you won't run into the memory problem you had before. You can use -m coassembly in that case (it won't actually run the assembly, but will map all your samples against your external assembly and use that for binning).

laibinhuang commented 4 months ago

I mean, can I use -extassembly in sequential mode?

SqueezeMeta.pl -m sequential -s /home/lbhuang/Moore/results_2021/SqMeta.txt -f "/home/lbhuang/Moore/results_2021/trim" -extassembly "/home/lbhuang/Moore/results_2021/contig" -binners maxbin, metabat, concoct -c 1000 -t 32

fpusan commented 4 months ago

Is /home/lbhuang/Moore/results_2021/contig a valid fasta file, or a directory?

laibinhuang commented 4 months ago

Directory, sorry

laibinhuang commented 4 months ago

A directory containing the contigs for all samples

fpusan commented 4 months ago

Then please check the ReadMe. You need to provide a fasta file with the assembly
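
A quick pre-flight check along these lines can catch the directory-versus-file mix-up before starting a run. It is a minimal sketch, not something SqueezeMeta provides: `check_fasta` is a hypothetical helper name, and it assumes an uncompressed FASTA whose first line is a `>` header.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SqueezeMeta.
# Returns 0 if the path is a regular, non-empty file whose first line
# starts with ">" (a FASTA header); otherwise prints a reason to stderr
# and returns 1. Assumes the FASTA is not compressed.
check_fasta() {
    local path=$1
    if [ -d "$path" ]; then
        echo "$path is a directory, not a FASTA file" >&2
        return 1
    fi
    if [ ! -s "$path" ]; then
        echo "$path is missing or empty" >&2
        return 1
    fi
    if ! head -n 1 "$path" | grep -q '^>'; then
        echo "$path does not start with a FASTA header" >&2
        return 1
    fi
}
```

Running `check_fasta /home/lbhuang/Moore/results_2021/contig` would report that the path is a directory, which matches the `cp: -r not specified; omitting directory` message in the log above.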

laibinhuang commented 4 months ago

OK, I will do that, thank you. Yes, the README confused me: it says [path], but you said file. So that means we cannot use a directory of contigs in sequential mode.

fpusan commented 4 months ago

Yeah maybe it's not 100% clear. I have changed it to "Path to a file containing an external assembly..." which should be better

laibinhuang commented 3 months ago

Yes, it works for the binning process, but the SQM file (6G per sample) is too big to load into R for the analysis.

Do you have any suggestions for this?

Thank you, Laibin

fpusan commented 3 months ago

Yes, it becomes a bit difficult if you need to work with lots of samples at the same time. There are ways to work while avoiding high memory usage, although they are convoluted.

If you don't need to do subsetting, you can just load the project/results/tables directory for each sample with loadSQMlite and then run combineSQMlite to merge all the samples into a single object.

If you need to do subsetting, the process would be as follows:

  1. Create an empty list: reslist = list()
  2. For each sample:
     2.1 Load it with sa = loadSQM("/path/to/sample")
     2.2 Perform the subsetting you need, for example sa.sub = subsetTax(sa, "phylum", "Pseudomonadota")
     2.3 Transform the result to a SQMlite object (which has a minimal memory footprint but cannot be subsetted further): sa.sub.sqmlite = SQMtools:::SQMtoSQMlite(sa.sub)
     2.4 Store this in the list you created previously: reslist = c(reslist, list(sa.sub.sqmlite))
  3. Once you've done this for all samples, merge everything together: all.sub.sqmlite = combineSQMlite(reslist)
  4. Explore or plot the results, e.g. plotTaxonomy(all.sub.sqmlite)

laibinhuang commented 3 months ago

Thank you very much. In my case it won't load even for one sample.

I think I may need to check only some functions using sqm_annot.pl.

My question is: how can I use sqm_annot.pl to analyze only N and C cycling?

sqm_annot.pl -m coassembly -p /home/lbhuang/Moore/results_test/P21 -s /home/lbhuang/Moore/results_test/P21.txt -f "/home/lbhuang/Moore/results_test/trim" -extassembly "/home/lbhuang/Moore/results_test/contig/P21.fasta" --norename -binners "concoct,maxbin,metabat2" -c 1000 -t 32
