sigmf / sigmf-python

Easily interact with Signal Metadata Format (SigMF) recordings.
https://sigmf.org
GNU Lesser General Public License v3.0

Modifying the contents of an existing SigMF Archive #48

Closed bhorsfield closed 7 months ago

bhorsfield commented 7 months ago

Hi there,

Can anyone tell me the best way to modify the metadata in an existing SigMF Archive file without running afoul of the hashing check?

My current approach is working OK, but the code doesn't look as elegant as it could be, and I feel like I am missing something:

# extract dataset & metadata objects from old archive and save as 
# sigmf-data and sigmf-meta files with new names
handle = SigMFArchiveReader('old_archive')
handle[:].tofile('new_archive.sigmf-data')
handle.sigmffile.tofile('new_archive')

# import new sigmf-meta file and append additional metadata to Global object
handle = sigmffile.fromfile('new_archive')
for data in new_metadata:
    add_new_metadata(handle, data)  # helper function to update Global object

# save updated metadata object & sigmf-data file to new SigMF Archive file
handle.archive('new_archive')
os.remove('new_archive.sigmf-data')
os.remove('new_archive.sigmf-meta')

I have tried various ways to simplify this process, but the above code snippet is the only approach that seems to work without raising a hashing-related exception.

Is there a more concise way to achieve this goal?

Thanks in advance, Brendan.

gmabey commented 7 months ago

So, this isn't currently supported in the python module, but if you use the tarfile module to append a new .sigmf-meta file with the same name as the original to the .sigmf archive, then you've effectively replaced it, since a normal tar -x operation would overwrite so the last one wins. In the C++ tools I've written for handling archives, operations like this are handled that way ... but they aren't yet public.
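For anyone wanting to try the append approach described above, here is a minimal sketch using only the standard library. It assumes the archive is an uncompressed tar (tarfile's append mode does not work on compressed files) and that it contains exactly one metadata name ending in .sigmf-meta. The function name replace_archive_metadata and the update callback are made up for illustration and are not part of sigmf-python. I have not verified how sigmf-python's own reader treats duplicate members, so treat this as a sketch rather than a supported workflow.

```python
import io
import json
import tarfile


def replace_archive_metadata(archive_path, update):
    """Append an updated .sigmf-meta member to an uncompressed SigMF archive.

    `update` is a caller-supplied function (hypothetical) that edits the
    metadata dict in place.
    """
    # Read the current metadata. When a name occurs more than once,
    # tarfile resolves it to the last occurrence in the archive.
    with tarfile.open(archive_path, "r:") as tar:
        meta_name = next(n for n in tar.getnames() if n.endswith(".sigmf-meta"))
        metadata = json.load(tar.extractfile(meta_name))

    update(metadata)

    # Append a new member under the same name; the old copy stays in the
    # file but is shadowed by the newer one on extraction.
    payload = json.dumps(metadata, indent=4).encode("utf-8")
    info = tarfile.TarInfo(name=meta_name)
    info.size = len(payload)
    with tarfile.open(archive_path, "a") as tar:
        tar.addfile(info, io.BytesIO(payload))
    return meta_name
```

Note that the archive grows with every update because the earlier metadata copies are never removed, so this suits occasional edits better than frequent rewrites.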

A contribution that added this feature to the python module would be very appreciated.

Teque5 commented 7 months ago

This repository is mostly focused on read/write but could be improved as you suggest for modifying files in-place. We'll add it to the to-do list.

Unsure what you mean by "hashing related exception", but if you are referring to calculating the sha checksum you can skip it on read by using skip_checksum=True. You should be able to skip checksum on write also as mentioned in #46, but this is also not yet implemented.

bhorsfield commented 7 months ago

So, this isn't currently supported in the python module, but if you use the tarfile module to append a new .sigmf-meta file with the same name as the original to the .sigmf archive, then you've effectively replaced it, since a normal tar -x operation would overwrite so the last one wins.

Fair enough, but presumably this will cause a checksum failure when I read the modified .sigmf archive, unless I use skip_checksum=True as per @Teque5's reply above. Is this correct?

bhorsfield commented 7 months ago

Unsure what you mean by "hashing related exception", but if you are referring to calculating the sha checksum you can skip it on read by using skip_checksum=True.

Yes, I was referring to the sha checksum.

Instead of using skip_checksum=True on read, is it possible to force a recalculation of the checksum before I write the modified metadata to the .sigmf archive? This might allow me to eliminate a couple of steps from the code snippet in my original post.

gmabey commented 7 months ago

The checksum only covers the .sigmf-data file, so if you replace the .sigmf-meta file and copy the checksum into the new version, then it should be fine.
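As a rough illustration of the point above, the sketch below works on plain metadata dictionaries rather than sigmf-python objects. It assumes the dataset hash is stored in the "core:sha512" field of the "global" object, as in the SigMF specification. The helper names sha512_of_file and carry_checksum are hypothetical. The first helper shows how the hash could be recomputed from the data file if you would rather not trust the old value.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def carry_checksum(old_meta, new_meta):
    """Copy the dataset checksum from the old metadata dict into the new one.

    The dataset itself is unchanged, so the old hash is still valid.
    """
    old_hash = old_meta.get("global", {}).get("core:sha512")
    if old_hash is not None:
        new_meta.setdefault("global", {})["core:sha512"] = old_hash
    return new_meta
```

Because the hash depends only on the bytes of the .sigmf-data file, editing other global fields does not invalidate it.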

On Wed, Jan 31, 2024 at 3:39 PM Brendan Horsfield wrote:

So, this isn't currently supported in the python module, but if you use the tarfile module to append a new .sigmf-meta file with the same name as the original to the .sigmf archive, then you've effectively replaced it, since a normal tar -x operation would overwrite so the last one wins.

Fair enough, but presumably this will cause a checksum failure when I read the modified .sigmf archive, unless I use skip_checksum=True as per @Teque5's reply above. Is this correct?


bhorsfield commented 7 months ago

Thanks for the help @Teque5 and @gmabey. While both of your solutions are perfectly valid, I think I will continue to use my current approach for the time being. This is because:

In any case, I am hoping I won't need to modify SigMF archives for much longer. As soon as I have integrated some external sensors into my prototype receiver, it should be possible to write my SigMF archives in one step.

Thanks again!

robgardien commented 7 months ago

Brendan,

Could you provide the full code you used? I'm trying to do a similar action. Thank you in advance

bhorsfield commented 7 months ago

@robgardien, below is a complete script showing how to modify an existing SigMF archive and save the results as a new archive. The script looks for an existing file called "old_archive.sigmf", which contains a standard sigmf-data file & sigmf-meta file.

Note that when I create the original sigmf-meta file, I append the line "my-namespace:my_global_array": [] to the Global dictionary. The following script then appends the new metadata to this array.

"""Script to add metadata to an existing SigMF Archive and 
save the results as a new SigMF Archive.
"""

import os
from sigmf import sigmffile, SigMFFile, SigMFArchiveReader

def add_new_metadata(handle, result):
    """Append new metadata to a global field from my extension namespace
    """
    handle._metadata[SigMFFile.GLOBAL_KEY]['my-namespace:my_global_array'].append(result)

if __name__ == '__main__':

    # file names
    current_path = os.path.dirname(os.path.abspath(__file__))
    old_archive = os.path.join(current_path, 'old_archive.sigmf')
    new_archive = os.path.join(current_path, 'new_archive')  # base filename of new .sigmf archive

    # generate some dummy metadata to add to Global object in existing SigMF Archive
    new_metadata = ['bob', 'kate', 'sue', 'michael', 'jane']

    # extract dataset & metadata objects from old archive and save as 
    # sigmf-data and sigmf-meta files with new names
    handle = SigMFArchiveReader(old_archive)
    handle[:].tofile(new_archive + '.sigmf-data')
    handle.sigmffile.tofile(new_archive)

    # import new sigmf-meta file and append additional metadata to Global object
    handle = sigmffile.fromfile(new_archive)
    for data in new_metadata:
        add_new_metadata(handle, data)  # helper function to update Global object

    # save updated metadata object & sigmf-data file to new SigMF Archive
    handle.archive(new_archive)
    os.remove(new_archive + '.sigmf-data')
    os.remove(new_archive + '.sigmf-meta')

    # verify that new SigMF Archive contains new metadata
    handle = sigmffile.fromfile(new_archive)
    gi = handle.get_global_info()
    for name in gi['my-namespace:my_global_array']:
        print(name)