sigmf / sigmf-python

Easily interact with Signal Metadata Format (SigMF) recordings.
https://sigmf.org
GNU Lesser General Public License v3.0

_read_datafile() calls file close() on every partial dataset read #53

Open csylvain opened 8 months ago

csylvain commented 8 months ago

https://github.com/sigmf/sigmf-python/blob/c5d194d5e659def926d25737baa7b6cbbb4887bd/sigmf/sigmffile.py#L679

I have been working on a project that runs wonderfully on a laptop with 16 GB of RAM but has to be simplified if it is ever to run successfully on a Raspberry Pi Zero 2 W with 512 MB of RAM.

I see that if a dataset is read incrementally (to reduce the memory footprint) rather than all at once (a non-issue on a laptop with gobs of memory), each call to read_samples() in turn calls _read_datafile(), which performs an open(), a seek(), and a close().

Perhaps the file management should be promoted to read_samples() and the parent SigMFFile class? The dataset could then be read with a single open() and one concluding close(), with any number of seek() and read() calls in between.
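A minimal sketch of the open-once idea, assuming a hypothetical `IncrementalReader` class (not the actual sigmf-python API): the file handle is opened in the constructor and reused by every partial read, so only seek() and read() happen per call.

```python
import numpy as np

class IncrementalReader:
    """Hypothetical sketch: open the data file once, then seek and read
    any number of times without reopening it per call."""

    def __init__(self, path, dtype=np.complex64):
        self.dtype = np.dtype(dtype)
        self._fp = open(path, "rb")  # single open() for the object's lifetime

    def read_samples(self, start_index=0, count=-1):
        # Seek in units of samples, then read in place; no open()/close().
        self._fp.seek(start_index * self.dtype.itemsize)
        return np.fromfile(self._fp, dtype=self.dtype, count=count)

    def close(self):
        # One concluding close() when the caller is done with the dataset.
        self._fp.close()
```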

This enhancement would also make it possible to use mmap(), placing the memory-management burden on the OS rather than on the Python runtime.
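As an illustration of the OS-managed approach, here is a sketch using NumPy's memmap (a stand-in for raw mmap(); the helper name `open_mapped` is hypothetical): nothing is loaded until a slice is actually indexed, and the OS pages the data in and out on demand.

```python
import numpy as np

def open_mapped(path, dtype=np.complex64):
    # mode="r" maps the file read-only; no sample data is read into RAM
    # until a slice is indexed, and the OS handles paging thereafter.
    return np.memmap(path, dtype=dtype, mode="r")

# Example usage (path is illustrative):
# samples = open_mapped("recording.sigmf-data")
# chunk = samples[1_000_000:1_001_000]  # only this window is paged in
```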

gmabey commented 8 months ago

SigMFArchiveReader already uses mmap(), check it out.

There are definitely gaps in the implementations of it and SigMFFile; contributions are welcome.

Glen


csylvain commented 8 months ago

I hadn't looked at SigMFArchiveReader because my dataset is just one file of pre-computed IQ samples. Thanks for calling my attention to it.

I see that archive reading uses NumPy's memmap(): "NumPy's memmaps are array-like objects. This differs from Python's mmap module, which uses file-like objects" (from the NumPy documentation). However, read_samples() uses _read_datafile(), which looks like it is using a file-like mechanism.
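The array-like vs file-like distinction quoted above can be sketched side by side; both helper names are hypothetical, not part of sigmf-python:

```python
import numpy as np

def read_filelike(path, start, count, dtype=np.complex64):
    # File-like mechanism: explicit seek()/read() through a file object,
    # analogous to what _read_datafile() does.
    dt = np.dtype(dtype)
    with open(path, "rb") as fp:
        fp.seek(start * dt.itemsize)
        return np.fromfile(fp, dtype=dt, count=count)

def read_arraylike(path, start, count, dtype=np.complex64):
    # Array-like mechanism: map the file once and just slice it,
    # analogous to what the memmap-based archive reader does.
    mm = np.memmap(path, dtype=dtype, mode="r")
    return np.asarray(mm[start:start + count])
```

Both return the same samples; the difference is that the array-like path lets the OS decide what stays resident.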

I can already report that with the existing read_samples() implementation, partial reads work smoothly on the 512 MB RAM device, whereas an all-at-once read suffers from random TX underruns.
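For reference, the chunked pattern that works on the constrained device can be sketched like this; it assumes an object exposing read_samples(start_index=..., count=...) as sigmf-python's SigMFFile does, and the chunk size is an illustrative tuning knob:

```python
CHUNK = 4096  # samples per partial read; tune to the device's RAM budget

def stream_chunks(sigmf_file, total, chunk=CHUNK):
    """Yield the recording in fixed-size pieces so only one chunk is
    resident at a time, instead of an all-at-once read."""
    for start in range(0, total, chunk):
        yield sigmf_file.read_samples(start_index=start,
                                      count=min(chunk, total - start))
```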