brunocontrerasmoreira closed this issue 7 months ago
Good morning,
I am a member of the supercomputing center where Bruno is running this program. Let me add one more question to this thread:
is there a way to have these temporary files generated on the local scratch disks of the compute nodes?
Regards, David
I found out that meryl was being invoked with at most 21 GB of RAM, despite my allowing more RAM in gridOptions:
/path/to/canu-2.2/build/bin/meryl k=22 threads=8 memory=21 \
count \
segment=$jobid/96 ../../Apin.seqStore \
output ./Apin.$jobid.meryl.WORKING \
After I edited this script to use memory=128G, the number of temporary files created by each array job dropped below 100.
gridOptions is just passed through; it shouldn't be used to request resources, as canu does that automatically on a per-job basis. See https://canu.readthedocs.io/en/latest/parameter-reference.html for more details. You can instead specify meryl memory and threads, which would update the above script and would also request approximately that memory from the grid. You can also limit concurrent jobs by modifying the grid array parameters (gridEngineArrayOption="-a ARRAY_JOBS%4" on slurm would limit canu to at most 4 concurrent jobs).
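For example, a rough sketch of combining those options on the canu command line (the project name, output directory, reads file, and genomeSize=5g are placeholder values, not taken from this thread):

# Sketch: merylMemory/merylThreads resize the meryl-count jobs (and the memory
# canu requests from slurm); gridEngineArrayOption caps concurrent array jobs.
canu -p Apin -d Apin-assembly \
     genomeSize=5g \
     merylMemory=128 merylThreads=8 \
     gridEngineArrayOption="-a ARRAY_JOBS%4" \
     -pacbio-hifi hifi_reads.fastq.gz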
As for local disk, there is an option for staging (https://canu.readthedocs.io/en/latest/parameter-reference.html#file-staging), but it isn't used for this step, which is usually not an I/O issue compared to later steps. If you're already running out of space here, you'll likely need significantly more disk: a human genome with HiFi at 40x coverage requires about 200 GB to compute, and given that your genome is much larger and likely more repetitive, I'd count on having at least 2 TB of space available for the run.
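As a sketch of what staging might look like for the later steps that do use it (the /local/scratch path and genomeSize=5g are placeholders; check your site docs for the actual node-local disk):

# Sketch: stage per-job work to node-local disk for the steps that support it;
# per the note above, meryl-count itself does not use staging.
canu -p Apin -d Apin-assembly \
     genomeSize=5g \
     stageDirectory=/local/scratch \
     -pacbio-hifi hifi_reads.fastq.gz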
Hi @skoren, we managed to get the meryl-count jobs done by increasing our disk quota and giving them more RAM. The resulting folder 0-mercounts/ takes 3.4 TB of disk; can this help estimate how much disk space we need for the remaining jobs?
Thanks
Hi, I am testing canu on a slurm Linux cluster for the first time with 2.5 TB of compressed HiFi reads. This is the bash script I submitted with sbatch:
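(The script itself is not shown here; below is a minimal sketch of what such a submission could look like. The resource requests, module name, partition, account, paths, and genomeSize are all hypothetical, not taken from the original post.)

#!/bin/bash
#SBATCH --job-name=canu-apin
#SBATCH --time=24:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

# The canu driver job is lightweight; with useGrid=true it submits the
# heavy compute steps as its own slurm jobs.
module load canu/2.2   # hypothetical module name

canu -p Apin -d Apin-assembly \
     genomeSize=5g \
     useGrid=true \
     gridOptions="--partition=general --account=myaccount" \
     -pacbio-hifi /path/to/hifi_reads.fastq.gz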
The stderr of this job contains:
However, the meryl-count jobs fail; here is the last line of meryl-count.7051213_65.out:

When I checked the folder where this job was running, I saw a large number of files:
How can I change the slurm settings to:
- reduce the number of temp files created by meryl-count, and
- reduce the number of meryl-count jobs running at a time?
Thanks for your help