Closed laibinhuang closed 4 months ago
syslog file please
It stops at K=21
Run started Mon Mar 4 08:14:19 2024 in coassembly mode
SqueezeMeta v1.6.3, September 2023 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN
Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 10.3389 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349
Run started for CA1A, Mon Mar 4 08:14:19 2024
Project: CA1A
Map file: /home/lbhuang/Moore/results_2021/SqMeta1A.txt
Fastq directory: /home/lbhuang/Moore/results_2021/trim
Command: /home/lbhuang/mambaforge/envs/SqueezeMeta/bin/SqueezeMeta.pl -m coassembly -p /home/lbhuang/Moore/results_2021/CA1A -s /home/lbhuang/Moore/results_2021/SqMeta1A.txt -f /home/lbhuang/Moore/results_2021/trim --norename -binners maxbin, metabat, concoct -c 1000 -t 32
[0 seconds]: STEP0 -> SqueezeMeta.pl COGS; KEGG; PFAM;
[0 seconds]: STEP1 -> 01.run_all_assemblies.pl (megahit)
Preparing files for pair1: cat /home/lbhuang/Moore/results_2021/trim/2P11_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11A1_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11B1_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P21_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P31_1.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P41_1.fastq.gz > /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par1.fastq.gz
Preparing files for pair2: cat /home/lbhuang/Moore/results_2021/trim/2P11_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11A1_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P11B1_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P21_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P31_2.fastq.gz /home/lbhuang/Moore/results_2021/trim/2P41_2.fastq.gz > /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par2.fastq.gz
Running assembly with megahit: perl /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/lib/SqueezeMeta/assembly_megahit.pl /home/lbhuang/Moore/results_2021/CA1A CA1A /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par1.fastq.gz /home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par2.fastq.gz
2024-03-04 08:21:14 - MEGAHIT v1.2.9
2024-03-04 08:21:14 - Using megahit_core with POPCNT and BMI2 support
2024-03-04 08:21:14 - Convert reads to binary library
2024-03-04 08:31:22 - b'INFO sequence/io/sequence_lib.cpp : 77 - Lib 0 (/home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par1.fastq.gz,/home/lbhuang/Moore/results_2021/CA1A/data/raw_fastq/par2.fastq.gz): pe, 634005290 reads, 151 max length'
2024-03-04 08:31:22 - b'INFO utils/utils.h : 152 - Real: 608.8190\tuser: 387.8713\tsys: 202.4206\tmaxrss: 241088'
2024-03-04 08:31:22 - k-max reset to: 141
2024-03-04 08:31:22 - Start assembly. Number of CPU threads 32
2024-03-04 08:31:22 - k list: 21,29,39,59,79,99,119,141
2024-03-04 08:31:22 - Memory used: 486732036096
2024-03-04 08:31:22 - Extract solid (k+1)-mers for k = 21
You are most likely running out of memory: you have 634,005,290 reads in total. You'll probably need a bigger server, or to assemble each sample individually.
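To see how those reads are spread across samples before deciding which ones to assemble individually, you can count the records in each input file. This is only a rough sketch: it assumes standard four-line FASTQ records compressed with gzip, and `count_reads` is a hypothetical helper name, not something shipped with SqueezeMeta.

```shell
# count_reads: illustrative helper (not part of SqueezeMeta).
# Prints the number of reads in a gzipped FASTQ file,
# assuming every record spans exactly 4 lines.
count_reads() {
    local lines
    lines=$(gzip -cd -- "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Print "file<TAB>reads" for every file passed as an argument.
for fq in "$@"; do
    printf '%s\t%s\n' "$fq" "$(count_reads "$fq")"
done
```

Saved as a script and run on the forward-read files (for example everything matching `*_1.fastq.gz` in the trim directory), it prints one count per sample. For paired-end data the total is twice the forward count.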
Hi Jtamames/Squeezemeta,
I also tried using my own contigs for each sample:
SqueezeMeta.pl -m sequential -s /home/lbhuang/Moore/results_2021/SqMeta.txt -f "/home/lbhuang/Moore/results_2021/trim" -extassembly "/home/lbhuang/Moore/results_2021/contig" -binners maxbin, metabat, concoct -c 1000 -t 32
It seems this command didn't use my contigs, and it also didn't let me choose my own directory for the results.
Please help! Thank you very much
But I get errors like this:
--- SAMPLE 2P11 ---
Now creating directories
Reading configuration from /home/lbhuang/2P11/SqueezeMeta_conf.pl
Running trimmomatic (Bolger et al 2014, Bioinformatics 30(15):2114-20) for quality filtering
Parameters:
[1 seconds]: STEP1 -> RUNNING ASSEMBLY: 01.run_all_assemblies.pl (megahit)
External assembly provided: /home/lbhuang/Moore/results_2021/contig. Overriding assembly
cp: -r not specified; omitting directory '/home/lbhuang/Moore/results_2021/contig'
Renaming contigs in /home/lbhuang/2P11/results/01.2P11.fasta
Can't open /home/lbhuang/2P11/results/01.2P11.fasta
Stopping in STEP1 -> 01.run_all_assemblies.pl. Program finished abnormally
-- ----------------------------------------------------------- Laibin Huang, Ph.D.
Assistant Professor of Microbiology Saint Louis University, St. Louis, MO Department of Biology Macelwane Hall 301 3507 Laclede Ave. St. Louis, MO 63103
If you have an external assembly you won't run into the memory problem you had before. You can use -m coassembly in that case (it won't actually run the assembly, but will map all your samples against your external assembly and use that for binning).
I mean, can I use -extassembly in sequential mode?
SqueezeMeta.pl -m sequential -s /home/lbhuang/Moore/results_2021/SqMeta.txt -f "/home/lbhuang/Moore/results_2021/trim" -extassembly "/home/lbhuang/Moore/results_2021/contig" -binners maxbin, metabat, concoct -c 1000 -t 32
Is /home/lbhuang/Moore/results_2021/contig a valid fasta file? Or a directory?
Directory, sorry
A dir containing the contigs for all samples.
Then please check the ReadMe. You need to provide a fasta file with the assembly
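A quick check before launching the pipeline can catch this kind of mistake. The sketch below is only an illustration: `is_fasta_file` is a hypothetical helper, not part of SqueezeMeta, and it assumes an uncompressed FASTA whose first non-blank line is a `>` header.

```shell
# is_fasta_file: illustrative helper (not part of SqueezeMeta).
# Succeeds only if the path is a regular, non-empty file whose first
# non-blank line starts with '>' (a FASTA header). Directories fail.
is_fasta_file() {
    local first
    [ -f "$1" ] && [ -s "$1" ] || return 1
    first=$(grep -m 1 -v '^[[:space:]]*$' "$1") || return 1
    case "$first" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For example, `is_fasta_file /home/lbhuang/Moore/results_2021/contig || echo "not a fasta file"` would print the warning, since that path is a directory.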
OK, I will do that. Thank you. Yes, the README confused me: it says [path], but you said file, which means we cannot use the contigs in sequential mode.
Yeah maybe it's not 100% clear. I have changed it to "Path to a file containing an external assembly..." which should be better
Yes, it works for the binning process, but the SQM file (6G per sample) is too big to load into R for the analysis.
Do you have any suggestions for this?
Thank you, Laibin
Yes, it becomes a bit difficult if you need to work with lots of samples at the same time. There are ways to work while avoiding high memory usage, although they are convoluted.
If you don't need to do subsetting, you can just load the project/results/tables directory for each sample with loadSQMlite and then run combineSQMlite to merge all the samples into a single object.
If you need to do subsetting, the process would be as follows:
1. Create an empty list: reslist = list()
2. For each sample:
2.1 Load it with sa = loadSQM("/path/to/sample")
2.2 Perform the subsetting you need, for example sa.sub = subsetTax(sa, "phylum", "Pseudomonadota")
2.3 Transform the result to a SQMlite object (which has a minimal memory footprint but cannot be subsetted further): sa.sub.sqmlite = SQMtools:::SQMtoSQMlite(sa.sub)
2.4 Store this in the list you created previously: reslist = c(reslist, list(sa.sub.sqmlite))
3. Once you've done this for all samples, merge everything together: all.sub.sqmlite = combineSQMlite(reslist)
4. Explore or plot the results, e.g. plotTaxonomy(all.sub.sqmlite)
Thank you very much. In my case it won't load even for one sample.
I think I may need to check only some functions using sqm_annot.pl.
My question is: how can I use sqm_annot.pl to analyze only N and C cycling?
sqm_annot.pl -m coassembly -p /home/lbhuang/Moore/results_test/P21 -s /home/lbhuang/Moore/results_test/P21.txt -f "/home/lbhuang/Moore/results_test/trim" -extassembly "/home/lbhuang/Moore/results_test/contig/P21.fasta" --norename -binners "concoct,maxbin,metabat2" -c 1000 -t 32
SqueezeMeta.pl -m coassembly -p /home/lbhuang/Moore/results_2021/CA1B -s /home/lbhuang/Moore/results_2021/SqMeta1B.txt -f "/home/lbhuang/Moore/results_2021/trim" --norename -binners maxbin, metabat, concoct -c 1000
I got an error for assembly: please help--thank you!
Running assembly with megahit: perl /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/lib/SqueezeMeta/assembly_megahit.pl /home/lbhuang/Moore/results_2021/CA1B CA1B /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par1.fastq.gz /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par2.fastq.gz
Error running command: /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/bin/megahit/megahit -1 /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par1.fastq.gz -2 /home/lbhuang/Moore/results_2021/CA1B/data/raw_fastq/par2.fastq.gz -t 12 -o /home/lbhuang/Moore/results_2021/CA1B/data/megahit >> /home/lbhuang/Moore/results_2021/CA1B/syslog 2>&1 at /home/lbhuang/mambaforge/envs/SqueezeMeta/SqueezeMeta/lib/SqueezeMeta/assembly_megahit.pl line 36.
Assembly not present in /home/lbhuang/Moore/results_2021/CA1B/results/01.CA1B.fasta, exiting
Stopping in STEP1 -> 01.run_all_assemblies.pl. Program finished abnormally