Open expensne opened 11 months ago
Interesting! I don't have time to look into this right now, but! Is the "length of data" field probably correct?
And! If you want to use bioread with very big files, I think you'll want to use the streaming API for this -- see Reader.stream(), and see
https://github.com/uwmadison-chm/bioread/blob/main/bioread/runners/acq2hdf5.py#L163
for an example of its usage.
Really, though, I might convert the files to HDF5 (assuming that works properly 🤞) and use an HDF5 library (which is probably going to be better than bioread in a lot of ways) for reading the data in your code.
Is the "length of data" field probably correct?
Yes.
Really, though, I might convert the files to HDF5 (assuming that works properly 🤞) and use an HDF5 library (which is probably going to be better than bioread in a lot of ways) for reading the data in your code.
Right, converting it to HDF5 first sounds like a good idea. I'll do that!
Let me know if it works; it may not! This code is, um, not well-tested on large inputs. But my guess is that something is going horribly wrong when trying to read the whole thing into memory and maybe streaming it into another data structure will help.
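To illustrate the "stream it into another data structure" idea, here is a minimal sketch of computing per-channel statistics chunk by chunk, so that the whole recording never has to sit in memory at once. It deliberately does not call bioread: `RunningStats`, `summarize`, and `fake_chunks` are hypothetical names made up for this example, and `fake_chunks` only stands in for whatever actually yields the chunks (for instance the streaming reader, or slices of an HDF5 dataset).

```python
import math
from typing import Iterable, Iterator, List, Sequence


class RunningStats:
    """Accumulate count, min, max, and mean without keeping the samples."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, chunk: Sequence[float]) -> None:
        # Fold one chunk into the running totals, then let it be discarded.
        for value in chunk:
            self.count += 1
            self.total += value
            if value < self.minimum:
                self.minimum = value
            if value > self.maximum:
                self.maximum = value

    @property
    def mean(self) -> float:
        # NaN for an empty stream instead of dividing by zero.
        return self.total / self.count if self.count else math.nan


def summarize(chunks: Iterable[Sequence[float]]) -> RunningStats:
    """Consume an iterable of chunks and return the accumulated statistics."""
    stats = RunningStats()
    for chunk in chunks:
        stats.update(chunk)
    return stats


def fake_chunks(n_samples: int, chunk_size: int) -> Iterator[List[float]]:
    """Hypothetical chunk source: yields 0.0, 1.0, ... in fixed-size pieces."""
    for start in range(0, n_samples, chunk_size):
        stop = min(start + chunk_size, n_samples)
        yield [float(i) for i in range(start, stop)]
```

Calling `summarize(fake_chunks(10, 4))` walks three chunks (4, 4, and 2 samples) and only ever holds one of them at a time. A channel whose statistics come out as all zeros this way would point at the reading step rather than at memory pressure.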
Description
When using bioread.read_file() on large .acq files, the content of the channels (channel.data) is either wrong or 0.
Example
I wrote a little script that just outputs some statistics of each channel. Reading a 7h .acq measurement:
Output on Mac:
Output on Windows:
I also tested it with even longer measurements. It always produces the above issue.
Error
No error is shown; it seems to just drop the data.
Env
Bioread version 3.0.1. Tested with Python 3.8, 3.9, 3.10, 3.11. Tested on 3 different Windows machines (all Win 10) with 16GB RAM.
Notes
I noticed that the RAM usage rapidly goes up to 100% and then down again. Maybe this is the issue.
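For a rough sense of how much memory a fully loaded recording needs, a back-of-the-envelope estimate can be sketched like this. The helper names are hypothetical, and the default of 8 bytes per sample (64-bit floats) is an assumption, not something confirmed for bioread. The sample rate and channel count of the test files are not stated here, so they have to be supplied by whoever runs it.

```python
def estimate_bytes(duration_hours: float,
                   sample_rate_hz: float,
                   n_channels: int,
                   bytes_per_sample: int = 8) -> int:
    """Estimate the in-memory size of a recording held as one array per channel.

    bytes_per_sample=8 assumes 64-bit floats; adjust if the data is stored
    differently.
    """
    samples_per_channel = int(duration_hours * 3600 * sample_rate_hz)
    return samples_per_channel * n_channels * bytes_per_sample


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return n_bytes / 2**30
```

With purely illustrative values, `to_gib(estimate_bytes(7, 2000, 8))` gives the estimate for a 7-hour, 8-channel recording at 2000 Hz. Any intermediate copies made while reading would add to that figure.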
Script used
Full test script can be found here: https://github.com/expensne/bioread_test/
And .acq test files here: https://owncloud.fraunhofer.de/index.php/s/ukLl0x34UkYm3Or
1h.acq is working fine. 7h.acq is producing the above output.