jts / nanopolish

Signal-level algorithms for MinION data
MIT License

Nanopolish call-methylation Processing Speed #1113

Open: ghost opened this issue 1 year ago

ghost commented 1 year ago

Hello, I'm running nanopolish call-methylation on Oxford Nanopore long-read data with an average read depth of 35x. I'm using 40 processors with 4 GB of memory per processor and the standard parameters (-t, -r, -b, -g). The job has been running for 12 days and has so far finished only chromosomes 1-4 and 10-22. Are there ways to speed this up, or is such a long processing time expected for this tool?

hasindu2008 commented 1 year ago

This is likely the fast5 I/O bottleneck. You may try f5c (https://github.com/hasindu2008/f5c/) with the --iop option, which spawns parallel processes for I/O. f5c should give the same output as nanopolish.
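As a rough illustration, the f5c suggestion amounts to keeping the same four options from the question and adding --iop. The sketch below only builds the argument list. The file names and the values for threads and I/O processes are hypothetical placeholders, not tuned recommendations. It also assumes the reads have already gone through the usual indexing step that nanopolish/f5c require.

```python
import shlex


def f5c_call_methylation_cmd(reads, bam, genome, threads=40, io_procs=8):
    """Build an f5c call-methylation argument list.

    Reuses the options from the question (-t, -r, -b, -g) and adds --iop,
    which spawns separate processes for fast5 I/O. The default numbers are
    placeholders (assumptions), not recommended settings.
    """
    return [
        "f5c", "call-methylation",
        "-t", str(threads),
        "--iop", str(io_procs),
        "-r", reads,
        "-b", bam,
        "-g", genome,
    ]


# Hypothetical file names; stdout should be redirected to a .tsv file.
cmd = f5c_call_methylation_cmd("reads.fastq", "reads.sorted.bam", "ref.fa")
print(shlex.join(cmd), "> methylation_calls.tsv")
```

To execute it from Python, one option is to pass `cmd` to `subprocess.run` with `stdout` set to an open output file and `check=True`.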

To go even faster, the best solution is to convert your fast5 files to blow5 using slow5tools and then run nanopolish or f5c on them. Instructions are at https://hasindu2008.github.io/slow5tools/workflows.html
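A minimal sketch of the conversion step, assuming the two-stage slow5tools workflow of converting with `f2s` and then combining the output with `merge`. The directory and file names are hypothetical. The linked workflows page remains the authoritative reference for the exact options and for the subsequent indexing and call-methylation steps on the blow5 file.

```python
import shlex


def fast5_to_blow5_cmds(fast5_dir, blow5_dir, merged_blow5):
    """Build the two slow5tools steps as argument lists (paths are hypothetical)."""
    return [
        # f2s converts the fast5 files in fast5_dir into blow5 files under blow5_dir
        ["slow5tools", "f2s", fast5_dir, "-d", blow5_dir],
        # merge combines the per-file blow5 outputs into a single blow5 file
        ["slow5tools", "merge", blow5_dir, "-o", merged_blow5],
    ]


cmds = fast5_to_blow5_cmds("fast5_dir", "blow5_dir", "reads.blow5")
for c in cmds:
    print(shlex.join(c))
```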


ghost commented 1 year ago

Thank you for the quick and helpful response; I will try your suggestions. Is it possible to split the work by interval (one chromosome per job) when running nanopolish call-methylation and calculate_methylation_frequency.py, and then concatenate the per-chromosome methylation_frequency.tsv files into a single file at the end?
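One way the per-chromosome idea could look is sketched below. Several parts are assumptions:

- Each job restricts call-methylation to one whole contig with the `-w` window option, written as `contig:start-end` as in the nanopolish methylation tutorial. Check the coordinate convention for your version.
- The frequency script is invoked with the calls file as a positional argument, as in that tutorial. Older releases may differ.
- File names and the thread count are hypothetical.
- The merge helper keeps only the first file's header. Because frequency rows are keyed by chromosome and position, whole-chromosome jobs should not split a site across files, so concatenation should not lose data. Row order will follow the chromosome order you supply.

```python
import shlex

# Hypothetical paths; replace with your own.
READS, BAM, GENOME = "reads.fastq", "reads.sorted.bam", "ref.fa"
FREQ_SCRIPT = "scripts/calculate_methylation_frequency.py"  # location in the nanopolish repo


def per_chrom_jobs(chrom_lengths, threads=8):
    """Yield (call_cmd, calls_tsv, freq_cmd, freq_tsv) for each contig.

    chrom_lengths maps contig name -> length (e.g. taken from the reference .fai).
    Each call-methylation job covers one whole contig via -w (assumed 1-based range).
    """
    for chrom, length in chrom_lengths.items():
        calls_tsv = f"methylation_calls.{chrom}.tsv"
        freq_tsv = f"methylation_frequency.{chrom}.tsv"
        call_cmd = ["nanopolish", "call-methylation", "-t", str(threads),
                    "-r", READS, "-b", BAM, "-g", GENOME,
                    "-w", f"{chrom}:1-{length}"]
        freq_cmd = ["python", FREQ_SCRIPT, calls_tsv]
        yield call_cmd, calls_tsv, freq_cmd, freq_tsv


def merge_tsv_texts(texts):
    """Concatenate TSV contents, keeping only the first header line."""
    out = []
    for text in texts:
        lines = text.splitlines()
        if not lines:
            continue  # skip empty outputs (e.g. a contig with no reads)
        out.extend(lines if not out else lines[1:])
    return "\n".join(out) + ("\n" if out else "")


def merge_tsv_files(paths, out_path):
    """Write the header-deduplicated concatenation of paths to out_path."""
    texts = []
    for p in paths:
        with open(p) as fh:
            texts.append(fh.read())
    with open(out_path, "w") as out:
        out.write(merge_tsv_texts(texts))


# Example contig lengths (GRCh38 chr1 and chr2); each command's stdout
# must be redirected to the named .tsv file.
for call_cmd, calls_tsv, freq_cmd, freq_tsv in per_chrom_jobs(
        {"chr1": 248956422, "chr2": 242193529}):
    print(shlex.join(call_cmd), ">", calls_tsv)
    print(shlex.join(freq_cmd), ">", freq_tsv)
```

After all jobs finish, something like `merge_tsv_files(["methylation_frequency.chr1.tsv", "methylation_frequency.chr2.tsv"], "methylation_frequency.tsv")` would produce the combined file. Each printed pair can be submitted as an independent cluster job.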