Open telegraphic opened 1 year ago
With a bit of hacking, I think you should be able to recover most of the data. First, I would just add print statements in bshuf_h5filter.c to figure out exactly which function is returning an error code and the value of that code (the core functions of bitshuffle return some specific error codes with meanings).
Thanks @kiyo-masui, I'll take a look following that strategy.
As it's an issue with decompression, looks like here is a good place to start: https://github.com/kiyo-masui/bitshuffle/blob/fdfcd404ac8dcb828857a90c559d36d8ac4c2968/src/bshuf_h5filter.c#L183
Which calls: https://github.com/kiyo-masui/bitshuffle/blob/ac791b73d164068661566bbe4335fc7158372c49/src/bitshuffle.c#L238
And then each block is done with: https://github.com/kiyo-masui/bitshuffle/blob/fdfcd404ac8dcb828857a90c559d36d8ac4c2968/src/bitshuffle.c#L78
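To look at a suspect chunk outside the filter, here is a rough Python sketch that reads the small header the HDF5 filter appears to write in front of the LZ4 blocks. The layout is an assumption based on my reading of bshuf_h5filter.c: an 8-byte big-endian uncompressed size in bytes, then a 4-byte big-endian block size in bytes. The function name is hypothetical. The raw chunk bytes could come from something like h5py's `dset.id.read_direct_chunk(offset)`, which returns the chunk without running any filters.

```python
import struct

# Assumed header layout (from reading bshuf_h5filter.c, LZ4 mode only):
#   bytes 0-7  : uncompressed chunk size in bytes, big-endian uint64
#   bytes 8-11 : bitshuffle block size in bytes, big-endian uint32
HEADER_NBYTES = 12


def parse_bshuf_chunk_header(raw: bytes):
    """Return (uncompressed_nbytes, block_nbytes) from a raw filtered chunk.

    Hypothetical helper for diagnosis only; it does not decompress anything.
    """
    if len(raw) < HEADER_NBYTES:
        raise ValueError(
            f"chunk is only {len(raw)} bytes, shorter than the "
            f"{HEADER_NBYTES}-byte header"
        )
    # ">QI" = big-endian, unsigned 64-bit then unsigned 32-bit, no padding.
    uncompressed_nbytes, block_nbytes = struct.unpack(">QI", raw[:HEADER_NBYTES])
    return uncompressed_nbytes, block_nbytes
```

If the first value does not match the expected chunk size (chunk shape times element size), or the second is zero or implausibly large, the header itself is probably damaged, and the error most likely comes from the first call in the chain above rather than from an individual block.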
Hi @kiyo-masui, we have some SETI data stored with bitshuffle compression, and a small number of files appear to have become corrupted. (Here is one, FYI: https://bldata.berkeley.edu/blpd30_datax2/blc03_guppi_59132_36704_HIP111595_0078.rawspec.0002.h5)
h5py is happy to open the file, but barfs if you try to access the bitshuffled dataset.
Do you think this file is recoverable (or partly recoverable)? Is there any way to turn on extra debug info in bitshuffle to help diagnose why it fails, and/or can bitshuffle skip over 'bad' chunks?
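On skipping 'bad' chunks: as far as I know bitshuffle has no such switch, but it can be approximated from the Python side by reading one HDF5 chunk at a time and catching the failure. A minimal sketch, assuming the dataset is chunked, that h5py's `Dataset.iter_chunks()` is available (h5py 3.x), and that a failed filter surfaces as `OSError`. The function name and the fill value are hypothetical choices.

```python
import numpy as np


def read_skipping_bad_chunks(dset, fill_value=0):
    """Read a chunked dataset chunk by chunk, skipping chunks that fail.

    `dset` is expected to behave like an h5py.Dataset: it needs .shape,
    .dtype, .iter_chunks() and slicing. Returns (data, bad), where `bad`
    lists (selection, error message) for every chunk that could not be read.
    """
    out = np.full(dset.shape, fill_value, dtype=dset.dtype)
    bad = []
    for sel in dset.iter_chunks():  # one tuple of slices per stored chunk
        try:
            out[sel] = dset[sel]
        except (OSError, RuntimeError) as err:
            # Assumption: a decompression failure is raised as OSError;
            # RuntimeError is caught as well in case the binding differs.
            bad.append((sel, str(err)))
    return out, bad
```

To use it on a real file, open it with `h5py.File(path, "r")` after importing something that registers the bitshuffle filter (for example `hdf5plugin` or `bitshuffle.h5`), then pass the dataset object in. The `bad` list gives the slices of the unreadable chunks, so it also shows whether the damage is confined to a few chunks or spread through the file.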