BlkPingu opened this issue 1 year ago
Hi @BlkPingu,
Have you tried on multiple input datasets, or only one?
I'll give it a go now with some input data I have locally.
Right @BlkPingu,
It all seems to work as expected on my end; I converted ~10 GB of data through the script and it didn't crash.
Is it possible there is a specific part of the input datasets that is corrupted?
Hello George, that is wild. Did you run the script on a macOS device, or something else? It could very well be that the data is corrupted, at least partially. Thanks for the suggestion.
Hi @BlkPingu,
It was on an Apple M1 Max with 32 GB of memory. I did notice the script using >40 GB of memory while running, which was quite exciting.
If you find it reproduces specifically with one file I could have a look at the file?
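In the meantime, something like this sketch might narrow it down. It assumes the files are read with ont_fast5_api (the helper below is illustrative, not from the actual script): each file is opened in a throwaway child process, so a segfault only kills that child and the offending file gets reported.

```python
# Illustrative helper to find which fast5 file triggers the crash.
# Assumes ont_fast5_api; each file is read in a separate child process
# so a segfault (negative return code) only takes down that child.
import subprocess
import sys
from pathlib import Path

READ_SNIPPET = """
import sys
from ont_fast5_api.fast5_interface import get_fast5_file

with get_fast5_file(sys.argv[1], mode="r") as f5:
    for read in f5.get_reads():
        read.get_raw_data()  # force decompression of the raw signal
"""


def find_bad_files(fast5_dir):
    bad = []
    for path in sorted(Path(fast5_dir).rglob("*.fast5")):
        result = subprocess.run([sys.executable, "-c", READ_SNIPPET, str(path)])
        if result.returncode != 0:  # e.g. -11 means the child got SIGSEGV
            print(f"{path}: exited with {result.returncode}")
            bad.append(path)
    return bad


if __name__ == "__main__":
    find_bad_files(sys.argv[1])
```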
To assist reproducing bugs, please include the following:
Crash report:
Faulthandler:
Script:
Basically, when reading multiple fast5 files my script segfaults. I have no idea why, but faulthandler points me in the direction of dataset.py, and the crash report indicates the segfault occurred in the library's VBZ compression plugin, libvbz_hdf_plugin_m1.dylib.

The script's purpose is to combine data from multiple barcodes, each with pass and fail reads, into one parquet file. For some reason it segfaults after writing a few GB worth of data. Please don't roast me for the terrible code quality; it's just to process some data into a different random-access format.
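For context, the overall shape of the conversion is roughly this. It is a simplified sketch, not the actual script: it assumes the files are read with ont_fast5_api and written with pyarrow, and the directory layout (barcode folders with pass/fail subfolders) is a guess.

```python
# Simplified sketch of the fast5 -> parquet conversion (not the actual script).
# Assumes ont_fast5_api and pyarrow; paths and column names are placeholders.
import faulthandler
faulthandler.enable()  # dump a Python traceback if the process segfaults

from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq
from ont_fast5_api.fast5_interface import get_fast5_file


def rows_from_fast5(path, barcode, status):
    """Yield one row per read in a single fast5 file."""
    with get_fast5_file(str(path), mode="r") as f5:
        for read in f5.get_reads():
            yield {
                "barcode": barcode,
                "status": status,  # "pass" or "fail"
                "read_id": read.read_id,
                "signal": read.get_raw_data().tolist(),
            }


def convert(root, out_path):
    """Combine every barcode's pass/fail fast5 files into one parquet file."""
    writer = None
    for barcode_dir in sorted(Path(root).iterdir()):
        if not barcode_dir.is_dir():
            continue
        for status in ("pass", "fail"):
            for f5_path in sorted((barcode_dir / status).glob("*.fast5")):
                rows = list(rows_from_fast5(f5_path, barcode_dir.name, status))
                if not rows:
                    continue
                table = pa.Table.from_pylist(rows)
                if writer is None:
                    writer = pq.ParquetWriter(out_path, table.schema)
                writer.write_table(table)  # flush per input file
    if writer is not None:
        writer.close()
```

Writing each file's rows through a single ParquetWriter as they are read keeps memory bounded, rather than accumulating everything into one in-memory table.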
Any advice?