sigmf / sigmf-python

Easily interact with Signal Metadata Format (SigMF) recordings.
https://sigmf.org
GNU Lesser General Public License v3.0

Running out of local storage space while creating SigMF archive on external SSD #71

Open bhorsfield opened 6 days ago

bhorsfield commented 6 days ago

Hi Folks,

I have run into a problem when trying to convert a large IQ recording into a SigMF archive.

The problem starts when I call the archive() method. When the sigmf-data file is very large (10-20 GB), the local storage on my small computing device fills up until there is no space left, at which point my Python script crashes. This occurs even if the target location for the archive is on an external SSD.

My suspicion is that the tar utility that generates the SigMF archive is creating a temp file on the local drive. Unfortunately, my local drive is an eMMC card with only 5 GB of free storage, which greatly limits the size of the recordings I can archive.
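If the archive step stages its files through Python's standard tempfile module (an assumption that has not been checked against the sigmf-python source here), one possible stopgap is to point that module at the external drive before calling archive(). The helper name `redirect_temp_dir` is hypothetical; `tempfile.tempdir` is the standard-library setting it changes.

```python
import os
import tempfile


def redirect_temp_dir(target_dir):
    """Make Python's tempfile module create its files under target_dir.

    Returns the previous value of tempfile.tempdir so it can be restored.
    This only helps if the code in question uses tempfile's default
    directory (an assumption, not something verified for sigmf-python).
    """
    os.makedirs(target_dir, exist_ok=True)
    previous = tempfile.tempdir  # None unless already resolved or set
    tempfile.tempdir = target_dir
    return previous
```

For example, calling `redirect_temp_dir("/mnt/ssd/tmp")` (a hypothetical mount point) before archiving would place any default temporary directories on the SSD. Setting the `TMPDIR` environment variable before starting Python has a similar effect.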

Details of my system configuration are as follows:

Can anyone suggest a workaround for this problem?

Thanks, Brendan.

gmabey commented 5 days ago

Yes, the current implementation of this is extremely inefficient and wasteful. I had hoped to someday rewrite it using the tarfile module, but I ended up writing C++ classes that do it instead. That's what I currently use in my day job, and I'm hoping to release that code "soon". It seems to me that you appreciate working with archives over loose files, just like me. Do you have any inclination towards contributing to this project and taking on a rewrite of that functionality? If so, I would be happy to give you suggestions and pointers along the way.

The basic idea is to write directly into a tarball instead of copying the files into a temporary directory first. I would suggest working to make sure that the tarfile.PAX_FORMAT variation is always used.
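As a rough illustration of that idea, here is a minimal sketch using only the standard-library tarfile module. It is not the project's implementation; the function name `write_archive_streaming` and the member names in the usage note are hypothetical.

```python
import tarfile


def write_archive_streaming(archive_path, members):
    """Write files straight into a PAX-format tar, with no staging copies.

    members: iterable of (source_path, name_inside_archive) pairs.
    """
    # format=tarfile.PAX_FORMAT pins the tar variant explicitly instead of
    # relying on the interpreter's default.
    with tarfile.open(archive_path, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for source_path, arcname in members:
            # add() reads the source file in chunks and writes them to the
            # archive, so the only large write goes to archive_path itself.
            tar.add(source_path, arcname=arcname, recursive=False)
    return archive_path
```

A call might look like `write_archive_streaming("/mnt/ssd/rec.sigmf", [("rec.sigmf-meta", "rec/rec.sigmf-meta"), ("rec.sigmf-data", "rec/rec.sigmf-data")])`, where the paths and the directory layout inside the archive are assumptions made for illustration. Because nothing is copied to a temporary directory, local disk usage should not grow with the size of the data file.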

bhorsfield commented 3 days ago

> It seems to me that you appreciate working with archives over loose files, just like me.

Yes, this is usually my preference, especially in cases where the end user must manually transfer recordings from one device to another, or upload them to a cloud storage drive. With multiple loose files there is higher risk of a file getting overlooked or misplaced during the transfer process.

> Do you have any inclination towards contributing to this project and taking on a rewrite of that functionality?

Sure, I would be happy to make a contribution if I can. I am not a qualified SW engineer (my background is mostly in RF engineering), but I've been writing software on and off for many years as part of my job, so I should be able to make at least some progress on this task. If you have any tips to get me started, please let me know. Otherwise, I will start by familiarising myself with the tarfile module and take it from there.

Regarding the development timeframe, I should be able to devote some time to this towards the end of September. The SigMF archive feature directly affects my current project, so I can justify a few days of full-time effort.

Cheers, Brendan.