nanoporetech / megalodon

Megalodon is a research command-line tool to extract high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome/transcriptome.

Megalodon hangs when running on multi-fast5 files. #228

Closed Subhanjanb closed 2 years ago

Subhanjanb commented 2 years ago

Hi,

I have been trying to run megalodon on the cloud (AWS) on a sample that has 150GB of multi-fast5 files. When I start megalodon on the full sample of 500+ multi-fast5 files, it hangs, and the only sign of this in the log file is that nothing gets logged after a while. However, when I use only a few files, say around 10-15, it runs without any sudden halts.

I have not been able to figure out where the issue is, but I am guessing it could be an I/O-related bottleneck.

I have attached the log file for reference. aws.megalodon.logfile

Any input on this would be much appreciated. A secondary question: what if I create N batches of 10-15 fast5 files each to cover the 500+ files in my sample, run megalodon on each batch, and later append the results together?

Will this approach work?

marcus1487 commented 2 years ago

This log file appears to show a run that progressed successfully (and only has 10 files). Is this the log from the run that hangs? If not, can you post the log file from that run? It may just be that the reporting to stderr is the issue here.

For the second question, yes you can absolutely run megalodon on subsets of files and merge these later. How the output files are merged depends on which file you aim to use in the end.
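As a rough illustration of the batching idea, here is a minimal Python sketch that splits a directory of fast5 files into batches of at most 15 and stages each batch as a directory of symlinks. The helper names (`chunk`, `stage_batches`), the `batch_NNNN` directory naming, and the `*.fast5` glob are assumptions made for this example and are not part of megalodon. The sketch only prepares the inputs; running megalodon on each batch and merging the outputs are left out because they depend on which output type you need.

```python
from pathlib import Path
from typing import List, Sequence


def chunk(items: Sequence[Path], batch_size: int) -> List[List[Path]]:
    """Split items into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def stage_batches(fast5_dir: Path, staging_dir: Path, batch_size: int = 15) -> List[Path]:
    """Create one sub-directory per batch (hypothetical layout), fill it with
    symlinks to the original files, and return the batch directories in order."""
    # Sort so that batch membership is reproducible between invocations.
    files = sorted(Path(fast5_dir).glob("*.fast5"))
    batch_dirs: List[Path] = []
    for index, batch in enumerate(chunk(files, batch_size)):
        batch_dir = Path(staging_dir) / f"batch_{index:04d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for src in batch:
            link = batch_dir / src.name
            # Skip links left over from an earlier staging attempt.
            if not (link.is_symlink() or link.exists()):
                link.symlink_to(src.resolve())
        batch_dirs.append(batch_dir)
    return batch_dirs
```

Each returned batch directory can then be given to its own megalodon run with a separate output directory. Symlinks avoid copying 150GB of data; if the filesystem does not support them, copying or moving the files into the batch directories would serve the same purpose.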

marcus1487 commented 2 years ago

Closing due to inactivity. If the problem persists, please re-open this issue.