jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Bug when using seqmerge mode #376

Closed alper1976 closed 2 years ago

alper1976 commented 2 years ago

Attached are some relevant log files and the run script:

mergelog.txt syslog.txt slurm-4025637.out.txt mergedassemblies.seqmerge.runAmos.log.txt

fpusan commented 2 years ago

The AMOS log was a nice inclusion! Can you try running:

LD_LIBRARY_PATH=/cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/AMOS/../../lib/mummer /cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/AMOS/../mummer/nucmer --maxmatch --threads 12 -c 100 /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge.ref.seq /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge.qry.seq -p /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge

Let's see if we get a more detailed error message there.

jtamames commented 2 years ago

Hello. I think this issue is related to memory usage by minimus2, as in #109. Indeed, the syslog file contains many instances of that very same error:

k-mer db: Building database
sh: line 1: 71296 Killed                  LD_LIBRARY_PATH=/cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/lib /cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/kmer-db build -t 12 /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/samples.seqmerge.txt /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/kmerdb.seqmerge.txt -k 12 > /dev/null 2>&1
Error running command:    LD_LIBRARY_PATH=/cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/lib /cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/kmer-db build -t 12 /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/samples.seqmerge.txt /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/kmerdb.seqmerge.txt -k 12 > /dev/null 2>&1 at /cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/lib/SqueezeMeta/kmerdist.pl line 65.

Could you try the same solution given in that issue?

Hello. I have redone 01.merge_sequential.pl, but it is still crashing with big datasets. However, we are trying to decrease the k-mer size in kmer-db, and so far it is working with just 16 Gb of RAM. Decreasing the k-mer size implies fewer k-mers, and therefore reduces memory usage. If you want to try it, edit the script kmerdist.pl in the lib/SqueezeMeta directory. There, change line 61:

$command="$kmerdb_soft build -t $numthreads $samples $kmerdb > /dev/null 2>&1";

to:

$command="$kmerdb_soft build -t $numthreads $samples $kmerdb -k 12 > /dev/null 2>&1";

That will make kmer-db use a k-mer size of 12 instead of the original 18. Since we just want to calculate an approximate similarity measure between metagenomes, it will suffice for our purposes.
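If it helps, the same edit can be applied in one step with sed. This is just a sketch: the path is relative to the SqueezeMeta installation directory, and -i.bak keeps a backup of the original file.

```bash
# Patch kmerdist.pl in place (keeping a .bak copy) so that kmer-db
# builds its database with -k 12 instead of the default k-mer size
sed -i.bak 's|$samples $kmerdb >|$samples $kmerdb -k 12 >|' lib/SqueezeMeta/kmerdist.pl
```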

Best, J

alper1976 commented 2 years ago

Thanks. This is the output that I get. Not sure if this helps.

The following modules were not unloaded: (Use "module --force purge" to unload all):

  1) StdEnv

ERROR: failed to merge alignments at position 256
Please file a bug report

Task and CPU usage stats:

JobID         JobName  AllocCPUS  NTasks  MinCPU      MinCPUTask  AveCPU      Elapsed     ExitCode
------------  -------  ---------  ------  ----------  ----------  ----------  ----------  --------
4074012       sqmeta   12                                                     1-02:45:37  1:0
4074012.bat+  batch    12         1       13-08:26:+  0           13-08:26:+  1-02:45:37  1:0
4074012.ext+  extern   12         1       00:00:00    0           00:00:00    1-02:45:37  0:0

Memory usage stats:

JobID         MaxRSS     MaxRSSTask  AveRSS     MaxPages  MaxPagesTask  AvePages
------------  ---------  ----------  ---------  --------  ------------  --------
4074012
4074012.bat+  19248480K  0           19248480K  0         0             0
4074012.ext+  0          0           0          0         0             0

Disk usage stats:

JobID         MaxDiskRead  MaxDiskReadTask  AveDiskRead  MaxDiskWrite  MaxDiskWriteTask  AveDiskWrite
------------  -----------  ---------------  -----------  ------------  ----------------  ------------
4074012
4074012.bat+  33.41M       0                33.41M       0.06M         0                 0.06M
4074012.ext+  0.00M        0                0.00M        0             0                 0

Job 4074012 completed at Mon Oct 25 13:55:01 CEST 2021.

I also attached my kmerdist.pl.

Best, Ale

kmerdist.pl.txt

fpusan commented 2 years ago

What is the syslog for this last run? Best, Fernando

alper1976 commented 2 years ago

With the command that you provided, the syslog was not updated.

fpusan commented 2 years ago

Ah, so that output was for the command I sent? OK, so nucmer is failing for that merge. Maybe we could try updating it to the latest version and see if that somehow helps. Still, it might be better to first try the solution proposed by @jtamames, since it's true that kmer-db had been killed (possibly out of memory) before nucmer was run. How much memory are you requesting in your cluster? Best, Fernando

alper1976 commented 2 years ago

Attached is the slurm script: run_squezzemeta_seqmerge_rerun.slurm.txt

alper1976 commented 2 years ago

If you check my kmerdist.pl, you should see that I already adapted the suggestion by @jtamames.

fpusan commented 2 years ago

Yes, I saw that. However, the command I told you to run was just to check whether nucmer was failing (it is), so the modifications from @jtamames didn't come into play yet. So nucmer is failing for sure. Looking at the batch script, I think you are requesting enough memory (unless your assemblies are really big). However, since kmer-db was also failing before nucmer (which I hadn't realized when I wrote my first message), I would look into that first. I think there is a way to repeat the merging step (so that the changes to the script apply) without having to restart everything from scratch. @jtamames, can you help with this?

jtamames commented 2 years ago

Yeah, please check #351 for restarting seqmerge runs.

alper1976 commented 2 years ago

Checking the syslog: in the last runs, kmer-db and nucmer seemed to run fine. It crashes in the scaffolding step, if I am not mistaken.

fpusan commented 2 years ago

Looking at the lines before the last crash, I think you might be right.

Total CPU time 160878.05
Transforming to afg format: /cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/AMOS/toAmos -s /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge.99.fasta -o /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge.afg
Merging with minimus2: /cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/AMOS/minimus2_mod /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge -D OVERLAP=100 -D MINID=95 -D THREADS=12 > /dev/null 2>&1
Stopping in STEP1.5 -> 01.merge_sequential.pl. Program finished abnormally

Then I wonder why kmer-db and nucmer failed in previous runs and worked in that one...

Anyway, maybe you can then try running the last command that failed on its own, removing the redirection to /dev/null, so that it's hopefully more verbose about what's happening.

/cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/AMOS/minimus2_mod /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge -D OVERLAP=100 -D MINID=95 -D THREADS=12
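If the verbose run produces a lot of output, it may be worth keeping a copy of it in a file while still watching it on screen. A minimal sketch (the log file name is just an example):

```bash
# Re-run the failing minimus2 step without the /dev/null redirection,
# keeping a copy of stdout+stderr in minimus2_debug.log via tee
/cluster/projects/nn9745k/scripts/conda_envs/squeezemeta/SqueezeMeta/bin/AMOS/minimus2_mod \
    /cluster/work/users/alexaei/02_results/13_svalbard_metaGs/seqmerge/temp/mergedassemblies.seqmerge \
    -D OVERLAP=100 -D MINID=95 -D THREADS=12 2>&1 | tee minimus2_debug.log
```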

alper1976 commented 2 years ago

I increased the memory.

alper1976 commented 2 years ago

Here is the output from the last run. It seems like nucmer has issues.

slurm-4274791.out.txt

fpusan commented 2 years ago

Hi! What's the memory capacity of your nodes? A MaxRSS of 19338352K suggests the task may be hitting a cap of around 20 Gb.
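For reference, one way to compare what the job requested against what it actually used is Slurm's sacct (a sketch, assuming sacct is available on the cluster; the job ID is taken from the slurm output you attached):

```bash
# Compare requested memory (ReqMem) against peak usage (MaxRSS) for the job
sacct -j 4274791 --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State
```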

alper1976 commented 2 years ago

Hi. Thanks for your help this far. I fixed the memory issues, but I am now running into time issues. Any suggestions to optimize the scaffolding? I already removed contigs shorter than 500 bp and ran it for 14 days on 12 threads.

fpusan commented 2 years ago

Hi,

Sorry for the late response. Time issues might be difficult to solve. Minimus2 is the main blocker in this regard, as it works well for a moderate number of samples but scales poorly when the input becomes large. We have not been able to find a direct replacement for it, so at some point you will have to either use a co-assembly (if you have enough RAM) or analyze each sample separately with the sequential mode.

An intermediate road would be to divide your samples into groups. For example, you could use Mash to calculate pairwise distances between your metagenomes, cluster them based on similarity, and then analyze each cluster of samples independently, as sketched below.
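A rough sketch of that idea (sample file names are hypothetical; mash sketch and mash triangle are standard Mash commands):

```bash
# Sketch each metagenome from its reads (-r treats the input as reads,
# not as an assembly); all sketches go into one .msh file
mash sketch -r -o metaG_sketches sample1_R1.fastq.gz sample2_R1.fastq.gz sample3_R1.fastq.gz

# Compute all pairwise Mash distances as a lower-triangular matrix
mash triangle metaG_sketches.msh > mash_distances.txt
```

The resulting matrix can then be fed to any hierarchical clustering routine to define the groups of samples to co-assemble.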

blostein commented 2 years ago

Hi @fpusan, thanks for a great tool and for being so responsive. I've used SqueezeMeta for smaller batches of ~15 samples. Now I have a very large project (485 samples, although this includes positive/negative controls and some low-read samples I could remove; >800 GB; paired reads; sample files highly variable in read count, but median 10 M reads) and I'm trying to strategize my approach. Initially I was planning on using seqmerge mode, but after reading this issue I'm concerned that merging time due to minimus2 will be a problem for me too, given the number of samples and the n-1 merges needed. I can remove some low-read-count samples, but while this would reduce the number of merges needed, it would not do much to reduce the size of the pooled samples.

My goal is to compare general taxonomic and functional attributes across conditions, but I'm also interested in bins. The following resources are available to me on an HPC, with a max walltime of 14 days:

Since I don't think Megahit is MPI-enabled, I'm stuck with these per-node limits. So far, reading over the different issues on this, I think there are a few possible strategies:

  1. Assemble all samples individually in sequential mode and merge the taxonomic and functional profiles in R, following the advice given in #153. Maybe parallelize runs to increase speed. Accept the lack of bins, or use dRep (or pyani) to try to dereplicate bins.
  2. Use MASH to cluster samples by similarity (maybe into groups of 25 or 50 samples each?), co-assemble each cluster group, and either analyze the groups separately or merge the SQM objects in R; again, use dRep/pyani to dereplicate bins.

Do you have any suggestions as to which would be better? Or is there something else I should consider doing?

fpusan commented 2 years ago

Hi!

I'm a bit short of time, but I'll try to give a quick answer. I would go for the first option you propose:

1) Run SqueezeMeta individually for each sample, using only metabat2 for binning.
2) Combine the results as described in #153.
3) Get all the bins together and merge them into ANI > 95% clusters.
4) You can use dRep to dereplicate, but by doing so you will lose some info about the accessory genome. I have a new preprint on how to tackle this, maybe it will be of interest? (https://www.biorxiv.org/content/10.1101/2022.03.25.485477v1)
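For steps 3 and 4, a minimal dRep sketch (the bins/ directory is an assumption; -sa sets the secondary ANI threshold used for clustering):

```bash
# Cluster all bins at >=95% ANI and keep the best-scoring
# representative genome of each cluster
dRep dereplicate drep_out -g bins/*.fa -sa 0.95
```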

Of course, the option with MASH is also cool and sound, and it will assemble more low-abundance taxa. But the first one is easier and scales better. If you end up comparing both, I'd love to know the results!

Also, I noticed that you posted something about not being able to install SqueezeMeta with mamba, but now I can't find the issue. Is the problem still happening?

blostein commented 2 years ago

Thank you! I will do as you suggest at first. I am a little concerned because some low-read samples will probably not assemble into contigs very well on their own, but those samples probably shouldn't be analyzed anyway. I'll read your preprint, thanks very much! Depending on how it goes, I may be back with more questions :grimacing:

Yes, sorry, I deleted the mamba comment. In order to install using mamba, I had to create a new base environment instead of using the base conda environment provided on my HPC. Then I didn't realize that in my new base environment there was no mkl. Installing numpy into the new base environment got me mkl, so it was just me being dense.

fpusan commented 2 years ago

Ah, good to know! Next time, maybe close the issue rather than deleting it: your experience may help others, and I will also be happier knowing that, for once, something was not my fault!