sigmf / sigmf-python

Easily interact with Signal Metadata Format (SigMF) recordings.
https://sigmf.org
GNU Lesser General Public License v3.0
42 stars 16 forks source link

fix data_file in SigMFArchiveReader() #42

Closed liambeguin closed 3 months ago

liambeguin commented 8 months ago

When opening a sigmf archive with SigMFArchiveReader(), the data file is currently set to the full archive (including metadata).

This causes issues when writing the archive back to disk and invalidates the metadata hash since the data_file is now a tar archive and not just the set of samples.

To work around the issue, carry around a BytesIO buffer with the content of the data_file and write it to the tmpdir just before saving a new archive to disk.

liambeguin commented 8 months ago

I put this as a draft because I noticed the following:

file = 'QPSK_4SPS.sigmf-data'

s = sigmf.sigmffile.fromfile(file.replace('-data', '-meta'))
d1 = s.read_samples()

s = sigmf.sigmffile.fromarchive(file.replace('-data', ''))
d2 = s.read_samples()
d3 = s[:]

print(np.array_equal(d1, d2)) # False!
print(np.array_equal(d1, d3)) # True

This is also true on master, I'll try to get ready shortly.

Teque5 commented 7 months ago

Any updates on this? It was close to being ready.

liambeguin commented 5 months ago

Hi @Teque5 apologies for the delay! I haven't had time to look at this more closely. Since this was also seen on master, maybe we should open another issue to keep track of it separately. I'll rebase and change the status of the MR.

liambeguin commented 4 months ago

Hi @Teque5, thanks for pointing that out! this last update should fix the issue. here's a short range-diff:

$ git range-diff origin/main lvb/archive-fix archive-fix 
1:  c17754ca7415 = 1:  f12fea276be6 sigmffile: propagate skip_checksum when opening archives
2:  385804da7dae = 2:  676d69cb0f90 tests: archive: make sure we don't change samples
3:  4087e628f708 = 3:  72f5fe595867 sigmffile: be more specific on except statement
4:  bc7375138e62 = 4:  2b5ba795d743 sigmffile: separate data_offset and data_size
5:  747e162aac47 ! 5:  5c4c33d67a72 archive: use BytesIO to store data file
    @@ sigmf/archivereader.py
      import tempfile
     +from pathlib import Path

    - from . import __version__  #, schema, sigmf_hash, validate
    - from .sigmffile import SigMFFile
    + from . import __version__
    + from .archive import SIGMF_ARCHIVE_EXT, SIGMF_DATASET_EXT, SIGMF_METADATA_EXT, SigMFArchive
     @@ sigmf/archivereader.py: class SigMFArchiveReader():
                      elif memb.name.endswith(SIGMF_DATASET_EXT):
                          data_offset = memb.offset_data
    @@ sigmf/archivereader.py: class SigMFArchiveReader():
              )

      ## sigmf/sigmffile.py ##
    -@@ sigmf/sigmffile.py: import tempfile
    +@@ sigmf/sigmffile.py: from collections import OrderedDict
      from os import path
    - import warnings
    + 
      import numpy as np
     +import io

    - from . import __version__, schema, sigmf_hash, validate
    - from .archive import SigMFArchive, SIGMF_DATASET_EXT, SIGMF_METADATA_EXT, SIGMF_ARCHIVE_EXT, SIGMF_COLLECTION_EXT
    + from . import __specification__, __version__, schema, sigmf_hash, validate
    + from .archive import SIGMF_ARCHIVE_EXT, SIGMF_COLLECTION_EXT, SIGMF_DATASET_EXT, SIGMF_METADATA_EXT, SigMFArchive
     @@ sigmf/sigmffile.py: class SigMFFile(SigMFMetafile):
                  # check for any non-zero `header_bytes` fields in captures segments
                  if capture.get(self.HEADER_BYTES_KEY, 0):
    @@ sigmf/sigmffile.py: class SigMFFile(SigMFMetafile):
     +            fp = open(self.data_file, "rb")
     +            fp.seek(first_byte, 0)
     +            data = np.fromfile(fp, dtype=data_type_in, count=nitems)
    ++        elif self.data_buffer is not None:
    ++            data = np.frombuffer(self.data_buffer.getbuffer(), dtype=data_type_in, count=nitems)
     +        else:
     +            data = self._memmap
     +

I feel like this is a little complicated for what we're trying to do, but unfortunately I don't have a lot of time to put into simplifying this. I'll try to get back to it as soon as possible.

Teque5 commented 3 months ago

I believe you need to add import sigmf to test_archivereader to make your test pass checks.

liambeguin commented 3 months ago

I believe you need to add import sigmf to test_archivereader to make your test pass checks.

oh.. apologies for that! somehow this was passing on the branch I had locally. just pushed a fix.

Teque5 commented 3 months ago

Sorry the PySimpleGUI devs just yanked their package from pip so I'm fixing that bug on your branch as part of this PR.