nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Segmentation fault #67

Closed itslittman closed 11 months ago

itslittman commented 1 year ago

I ran the RNA SUP model on an experiment, and after 531,000 reads had been processed, it printed “zsh: segmentation fault” and exited. I compiled dorado from source a few days ago, so that was the binary I was using (not the pre-built one).

Could there have been a problem with compilation (maybe relating to one of the many warnings I saw as it compiled)?

itslittman commented 1 year ago

It appears that a certain subset of the data is causing this error (and also the associated Metal command buffer LSTM/softmax failures that preceded it).

iiSeymour commented 1 year ago

Are you able to share the subset @itslittman?

itslittman commented 1 year ago

@iiSeymour I think it's probably above the upload limit, but it was definitely weird. If I ran a single pod5 file as input, it would eventually crash. If I kept them as separate pod5 files but ran them all together, it seemed faster, but for that sample it would also crash in the same general area (~540,000 reads in). So, I split the pod5 files into 4 subfolders and ran them individually, with the intention of concatenating the resulting outputs. The third subfolder crashed about 40,000 reads in (roughly the same location as before), but when I split these ~60 pod5 files into further subsets, they all ran flawlessly. As an aside, basecalling on subsets of pod5 files separately like this was considerably faster in general: I got to almost 3x10^5 samples/s after adjusting the batch size.
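
The splitting workflow described above can be sketched in Python. This is only a minimal sketch and not part of dorado: the function name `split_into_subfolders`, the `part_` folder prefix, and the `*.pod5` glob are assumptions made for illustration. It moves the files into subfolders in contiguous, name-sorted chunks, so each subfolder can be basecalled on its own and the outputs concatenated afterwards.

```python
from pathlib import Path
import shutil


def split_into_subfolders(src_dir, n_parts, pattern="*.pod5", prefix="part_"):
    """Move files matching `pattern` in `src_dir` into up to `n_parts` subfolders.

    Files are sorted by name and assigned in contiguous chunks, so a crash in
    one subfolder points at a specific range of input files.
    Returns a dict mapping subfolder name -> list of file names moved there.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")

    src = Path(src_dir)
    files = sorted(p for p in src.glob(pattern) if p.is_file())

    # Ceiling division: the number of files per subfolder.
    chunk = -(-len(files) // n_parts)

    groups = {}
    for i, f in enumerate(files):
        sub = src / f"{prefix}{i // chunk + 1}"
        sub.mkdir(exist_ok=True)
        # Hypothetical choice: move rather than copy, to avoid duplicating large files.
        shutil.move(str(f), str(sub / f.name))
        groups.setdefault(sub.name, []).append(f.name)
    return groups
```

Because the chunks are contiguous, a crash in one subfolder narrows the problem to that range of files, and the same function can be run again on that subfolder to split it further.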