Closed: TMAdams closed this issue 10 months ago.
You've got really high coverage, over 2000x, which isn't ideal for assembly. I suspect this step is running out of memory trying to load that many overlaps. You can check unitigging/4-unitigger/unitigger.000001.out
to confirm that; please post the contents of that file here. Is there a particular reason you're setting the max coverage to 2000x? We typically recommend 50x for HiFi data.
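(For context, a typical HiFi invocation following that recommendation might look like the sketch below; the prefix, output directory, genome size, and read file are placeholders, not values from this issue.)

# Hypothetical example only: sample01, sample01_asm, genomeSize and the read file are placeholders.
# maxInputCoverage=50 asks Canu to randomly downsample the input to roughly 50x before assembly
# (the default is 200x); setting it very high, e.g. 2000, effectively disables the downsampling.
canu -p sample01 -d sample01_asm \
     genomeSize=500m \
     maxInputCoverage=50 \
     -pacbio-hifi sample01.hifi_reads.fastq.gz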
Hi Sergey,
We're using enriched data which has gone through a PCR step, so we set it high to avoid the random downsampling step; otherwise we find we lose information in the final assembly.
Looking at that log file, it doesn't appear to report any errors. Full contents:
Found perl:
/mnt/shared/scratch/tadams/smrtrenseq_assembly/.snakemake/conda/d198b113b159549338ae9bea44596cd3_/bin/perl
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi
Found java:
/mnt/shared/scratch/tadams/smrtrenseq_assembly/.snakemake/conda/d198b113b159549338ae9bea44596cd3_/bin/java
openjdk version "11.0.9.1-internal" 2020-11-04
Found canu:
/mnt/shared/scratch/tadams/smrtrenseq_assembly/.snakemake/conda/d198b113b159549338ae9bea44596cd3_/bin/canu
canu 2.2
Running job 1 based on command line options.
The job itself is running within Slurm, which reports a peak RSS of 20.9G, and I provide it with 32G, so there should be plenty of headroom. Slurm also isn't reporting it as an out-of-memory kill, but I can't rule that out as I've seen that happen before. I'll try upping the memory and see if that helps.
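(For anyone checking the same thing, one way to confirm whether Slurm recorded an out-of-memory kill is to query the job accounting data; the job ID below is a placeholder.)

# Hypothetical job ID; replace with the actual Slurm job ID.
# State shows OUT_OF_MEMORY if the job was killed for exceeding its allocation,
# and MaxRSS shows the peak memory usage Slurm measured for the job.
sacct -j 1234567 --format=JobID,JobName,State,ExitCode,ReqMem,MaxRSS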
Ah, I see, that's why the histogram looks off.
There should be more in the log than what you're seeing. Are there any unitigger.err files in the 4-unitigger folder? Can you post one here if it exists?
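(If it helps, something like the following should show whether that file exists, assuming the default unitigging/4-unitigger layout mentioned above; asm_dir is a placeholder for the actual assembly output directory.)

# asm_dir is a placeholder for the Canu output directory of this run.
ls -l asm_dir/unitigging/4-unitigger/
cat asm_dir/unitigging/4-unitigger/*.err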
I actually just cleared out the last run and I've got a re-run going now, but checking runs that worked, the .out log looked the same. Once this re-run with higher memory finishes, I'll check for those logs and come back here.
Thanks for getting back so quickly!
Hi Sergey,
Just to update you, there was indeed an .err log in the unitigging folder, and it suggests the step itself is running out of memory rather than Slurm killing the job for exceeding its allocation. I'll try upping the maxMemory option when running the job; I hadn't spotted this was an option!
For reference, the full error log from unitigging is here:
==> PARAMETERS.

Resources:
  Memory                16 GB
  Compute Threads       4

Lengths:
  Minimum read          0 bases
  Maximum read          4294967295 bases
  Minimum overlap       500 bases

Overlap Error Rates:
  Graph                 0.000 (0.030%)
  Max                   0.000 (0.030%)
  Forced                -.--- (-.---%)  (not used)

Deviations:
  Graph                 12.000
  Bubble                1.000
  Repeat                1.000

Similarity Thresholds:
  Graph                 0.000
  Bubble                0.010
  Repeat                0.010

Edge Confusion:
  Absolute              2500
  Percent               15.0000

Unitig Construction:
  Minimum intersection  500 bases
  Maxiumum placements   2 positions

Debugging Enabled:
  (none)
==> LOADING AND FILTERING OVERLAPS.
ReadInfo()-- Found 2333307 reads.
OverlapCache()-- limited to 16384MB memory (user supplied).
OverlapCache()-- 17MB for read data.
OverlapCache()-- 71MB for best edges.
OverlapCache()-- 231MB for tigs.
OverlapCache()-- 62MB for tigs - read layouts.
OverlapCache()-- 89MB for tigs - error profiles.
OverlapCache()-- 4096MB for tigs - error profile overlaps.
OverlapCache()-- 0MB for other processes.
OverlapCache()-- ---------
OverlapCache()-- 4612MB for data structures (sum of above).
OverlapCache()-- ---------
OverlapCache()-- 44MB for overlap store structure.
OverlapCache()-- 11727MB for overlap data.
OverlapCache()-- ---------
OverlapCache()-- 16384MB allowed.
OverlapCache()--
OverlapCache()-- Retain at least 3700 overlaps/read, based on 1850.00x coverage.
OverlapCache()-- Initial guess at 329 overlaps/read.
OverlapCache()--
OverlapCache()-- Adjusting for sparse overlaps.
OverlapCache()--
OverlapCache()--        reads loading olaps           olaps             memory
OverlapCache()--   olaps/read      all    some          loaded            free
OverlapCache()--   ---------- ------- -------    ----------- -------  --------
OverlapCache()--          329 1304496 1028811      499073246  55.35%   4111 MB
OverlapCache()--          590 1865632  467675      681437042  75.57%   1329 MB
OverlapCache()--          776 2042294  291013      750417834  83.22%    276 MB
OverlapCache()--          838 2086895  246412      767074049  85.07%     22 MB
OverlapCache()--          844 2090452  242855      768543602  85.23%      0 MB
OverlapCache()-- Not enough memory to load the minimum number of overlaps; increase -M.
Update batMemory to 32 or 64 GB (it's currently using 16 GB).
Will do. Would updating the maxMemory flag have the same effect, in case this errors further down the pipeline?
Just submitted an updated run; I'll update you when it finishes.
Thanks again for the help!
I don't think maxMemory would be enough, as that just sets the allowed maximum; it won't force this step to use more memory the way batMemory will.
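(A rough sketch of how those two options might be passed on the command line; the prefix, directory, genome size, and read file are placeholders, and maxInputCoverage=2000 simply reflects the setting discussed earlier in this thread.)

# Hypothetical example: sample names, paths and genomeSize are placeholders for this run.
# batMemory sets the memory (in GB) that bogart, the unitigging step that failed above, will actually use;
# maxMemory only caps what any stage is allowed to request and does not raise this step's budget.
canu -p sample01 -d sample01_asm \
     genomeSize=500m \
     maxInputCoverage=2000 \
     batMemory=64 \
     maxMemory=64 \
     -pacbio-hifi sample01.hifi_reads.fastq.gz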
Hi, just to follow up on this: increasing batMemory has meant the assemblies now finish correctly.
Thanks for the help!
canu command run:
output of canu -version:
System: Cluster running Rocky Linux 8.9
Issue detail: Dear Canu developers,
I've been assembling a collection of HiFi reads for 130 samples. What I'm finding is an unexpected failure in three of my samples. These are from the latest batch of sequencing, which produced larger input files than previous runs (~3 GB when gzipped). These all seem to be failing at the same point; specifically, it looks like the ctgStore isn't created, though I'm not clear why. Interestingly, other samples from this run that are 2.6 and 2.7 GB when gzipped assemble without issue.
I've tried varying the maxInputCoverage and genomeSize parameters, but these haven't resolved the issue. Would greatly appreciate any help! Full log file pasted below.