ratal / mdfreader

Read Measurement Data Format (MDF) versions 3.x and 4.x file formats in python

[Improvement] - unzip to RAM instead of disk #202

Open darth3PO opened 2 years ago

darth3PO commented 2 years ago

Python version

3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]

Platform information

Windows-10-10.0.18362-SP0

Numpy version

1.20.1

mdfreader version

4.1

Description

https://github.com/ratal/mdfreader/blob/d1822ee4aa2b466ef0412756bef47ebd5a840dc3/mdfreader/mdf.py#L696

Passing a zipped .dat file to Mdf, e.g. yop = mdfreader.Mdf(file_name='DatFile.zip'), results in the .zip file being extracted to my working directory. Is it possible to extract the zip into RAM instead of onto the SSD/HDD?

When using the multiprocessing library, the bottleneck becomes SSD read/write speed. Wondering if this can be sped up by just using RAM instead.

I'm not sure if zipfile.ZipFile.read() or .open() would work? Some say that io.BytesIO would also do the trick. Most solutions for 'unzip to RAM' assume that we are requesting the file over the internet, but the zip is local. When extracted, the contents would fit in RAM.
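For illustration, a minimal sketch of the "unzip to RAM" idea using only the standard library. The helper name `read_zip_member_to_ram` and the choice to default to the first archive member are assumptions made for this example. Whether mdfreader can consume a file-like object instead of a file name is not established in this thread, so this only shows the decompression step.

```python
import io
import zipfile


def read_zip_member_to_ram(zip_path, member=None):
    """Return one archive member as a seekable in-memory buffer.

    Hypothetical helper, not part of mdfreader.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: if no member name is given, take the first entry.
        name = member if member is not None else archive.namelist()[0]
        # ZipFile.read() decompresses the whole member and returns bytes,
        # so nothing is written to the working directory.
        data = archive.read(name)
    # BytesIO wraps the bytes in a file-like object with read() and seek().
    return io.BytesIO(data)
```

The returned buffer holds the full decompressed contents, so this only makes sense when the extracted file fits in RAM, as described above.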

Thanks

ratal commented 2 years ago

Thanks for the idea, it could be investigated. ZipFile allows read() and seek() on an opened member, so it could read the file transparently while decompressing it, but I do not think it loads the complete file into memory. If reading involves a lot of jumping around in the file (which can happen, since the blocks to read can be scattered throughout it), this could lead to a performance penalty, though the memory impact would stay low. I guess it should be benchmarked. BytesIO could load the file into memory, it seems, but I wonder whether that is appropriate for all use cases. If going in this direction, I would recommend making it optional.
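A rough sketch of how the two approaches could be compared behind an optional switch. The names `load_blocks`, `read_at_offsets` and the `in_memory` flag are hypothetical and not mdfreader API. It assumes Python 3.7 or later, where the object returned by `ZipFile.open()` supports `seek()`; a backward seek on that object restarts decompression from the beginning of the member, which is where the penalty for scattered reads would come from.

```python
import io
import zipfile


def read_at_offsets(handle, offsets, size):
    """Read `size` bytes at each offset, mimicking scattered block reads."""
    chunks = []
    for offset in offsets:
        handle.seek(offset)
        chunks.append(handle.read(size))
    return chunks


def load_blocks(zip_path, member, offsets, size, in_memory=False):
    """Read blocks from a zipped member, streaming or fully in RAM."""
    with zipfile.ZipFile(zip_path) as archive:
        if in_memory:
            # Decompress the whole member once; seeks are then cheap,
            # at the cost of holding the full file in memory.
            handle = io.BytesIO(archive.read(member))
            return read_at_offsets(handle, offsets, size)
        # Streaming: data is decompressed on demand, so memory use stays
        # low, but seeking backwards re-reads the member from its start.
        with archive.open(member) as handle:
            return read_at_offsets(handle, offsets, size)
```

Wrapping calls to `load_blocks` with `time.perf_counter()` for both values of `in_memory`, on a real file, would give a first benchmark of the trade-off.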

darth3PO commented 2 years ago

Thanks for your thoughts. I will try to learn more about BytesIO and see if I can implement something.