Open alexyfyf opened 1 year ago
Hi @alexyfyf,
Looks like your job was killed because it ran Out Of Memory (OOM), judging from the slurmstepd error: `Detected 3526 oom_kill events in StepId=12206690.batch. Some of the step tasks have been OOM Killed.`
So giving the job more memory may help. I am not the developer of this tool, so I cannot give detailed advice or insights into its implementation, but I developed isONclust, which should give identical results to isONclust2. I can help you with isONclust if you decide to try that tool.
Note, though, that isONclust2 was developed mainly to improve speed over isONclust, since the original isONclust is implemented in Python. How many reads do you have?
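In case it helps, here is a sketch of how an isONclust run could look. The file names (`reads.fastq`, output folders) are placeholders; check `isONclust --help` for the exact options in your installed version, as flags may differ between releases.

```shell
# Cluster ONT reads by gene/transcript family of origin.
# --ont enables Oxford Nanopore-specific settings; --t sets thread count.
isONclust --ont --fastq reads.fastq --outfolder isonclust_out --t 8

# Optionally write one fastq file per cluster from the clustering result.
isONclust write_fastq --clusters isonclust_out/final_clusters.tsv \
    --fastq reads.fastq --outfolder isonclust_out/fastq_clusters
```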
Hi Kristoffer, thank you for your suggestion; I'll try isONclust and see if it runs. The OOM issue looks weird, as these level_1 files are typically only a few MB (definitely <100 MB). I'm not sure why those specific ones failed.
Alex
Hi team,
I am using this as part of https://github.com/epi2me-labs/wf-transcriptomes/. The "make batches" step runs fine and generates batches 0-48, but the subsequent clustering step fails. The error message is:
slurmstepd: error: Detected 3526 oom_kill events in StepId=12206690.batch. Some of the step tasks have been OOM Killed.
But when I examine the log files, all job_level_0 output was generated, while most of the level_1 output was not. When I re-ran the failed script from level_1.sh by hand, it showed a segmentation fault (core dumped). Some level_1 jobs did run successfully.
I noticed the minimizer count is 0 for the right-hand cluster, but I'm not sure whether this is related. This error then caused all the subsequent failures. The input file sizes seem small, and I have requested 16GB per core through the Slurm scheduler. I would appreciate it if you could kindly have a look at this issue.
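For reference, this is roughly how my memory request looks; a sketch of a generic Slurm submission script, not my exact file (job names and paths are placeholders). Note that `--mem-per-cpu` is multiplied by `--cpus-per-task`, while `--mem` requests a per-node total, which behaves differently when many tasks share a node:

```shell
#!/bin/bash
#SBATCH --job-name=isonclust2_level1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=16G      # 4 cores x 16G = 64G for this step
# Alternative: request a fixed per-node total instead:
# #SBATCH --mem=64G

srun bash level_1.sh           # the step that was OOM-killed
```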
Thanks a lot.