Teque5 opened this issue 3 months ago.
Hi @Teque5, I've run into the same kind of problem with SigMF archives. I was hoping #42 would fix this, but no luck.
On my end, the problem is that functions like `read_samples_in_capture()` assume there is a `data_file` they can access, so they can run things like `os.path.getsize()`. IMO it would be really nice to rework/consolidate `SigMFFile.__init__`, `set_data_file()`, and `_read_datafile()` so that they process user inputs (a file, a buffer, or any other type) into a single internal representation of the data (maybe `_memmap`?). Each accessor could then use that single representation and return whatever is needed.
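Here is a rough sketch of that consolidation. It assumes a numpy array is the single internal representation: a read-only memmap for paths and an in-memory array for everything else. `as_array` and `DataSource` are hypothetical names and are not part of sigmf-python.

```python
import io
import os

import numpy as np


def as_array(source, dtype=np.int16):
    """Hypothetical helper: turn any supported input into one numpy array."""
    if isinstance(source, np.ndarray):
        # Already in memory; reuse it without copying when the dtype matches.
        return source.astype(dtype, copy=False)
    if isinstance(source, (str, os.PathLike)):
        # A path on disk: memory-map it read-only instead of loading it.
        return np.memmap(source, dtype=dtype, mode="r")
    if isinstance(source, (bytes, bytearray, memoryview)):
        # Raw bytes already held in memory.
        return np.frombuffer(source, dtype=dtype)
    if hasattr(source, "read"):
        # Any file-like object (e.g. what tarfile's extractfile() returns): read once.
        return np.frombuffer(source.read(), dtype=dtype)
    raise TypeError(f"unsupported data source: {type(source).__name__}")


class DataSource:
    """Hypothetical stand-in for the data handling inside SigMFFile."""

    def __init__(self, source, dtype=np.int16, num_channels=1):
        self._data = as_array(source, dtype)
        self.num_channels = num_channels

    def sample_count(self):
        # Derived from the array, so no os.path.getsize() or data_file is needed.
        return self._data.size // self.num_channels

    def read(self, start=0, count=-1):
        # Return a plain in-memory copy of `count` samples starting at `start`.
        begin = start * self.num_channels
        end = None if count < 0 else (start + count) * self.num_channels
        return np.array(self._data[begin:end])


# Usage: every input type goes through the same accessors.
samples = np.arange(8, dtype=np.int16)
for src in (samples, samples.tobytes(), io.BytesIO(samples.tobytes())):
    print(DataSource(src, num_channels=2).sample_count())  # prints 4 for each input
```

Because the length comes from the array rather than from the file system, a buffer or archive member behaves the same as a file on disk.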
This might also help with loading a non-conforming dataset? Let me know what you think. I don't have a lot of time to spare on this, but I could try to help out.
When reading samples from signals, the current implementation is a bit quirky: reading memory-mapped samples from a file deviates from what you would expect IF those samples need to be scaled.
Consider the case where we read the SigMF logo from the main repository. This is a 2-channel, real-valued audio file with samples stored as 16-bit integers.
The memory map returns the raw 16-bit integers, while `read_samples` returns scaled values. This happens because `read_samples` applies the scale factor, but nothing applies it to the memory map.
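To make the mismatch concrete, this self-contained sketch writes a small stand-in for the logo data (2 interleaved int16 channels), memory-maps it, and compares the result with a scaled copy. The scale factor of 2**-15 (full-scale int16 mapped to [-1.0, 1.0)) is an assumption about what `read_samples` applies and is not taken from the sigmf-python source.

```python
import os
import tempfile

import numpy as np

SCALE = 2.0 ** -15  # assumption: full-scale int16 maps to [-1.0, 1.0)

# Stand-in for the logo's data: 2 interleaved real channels stored as int16.
raw = np.array([[16384, -16384], [32767, -32768], [0, 8192]], dtype=np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logo.sigmf-data")
    raw.tofile(path)

    # Memory map: raw integers, reshaped to (samples, channels), no scaling.
    mapped = np.memmap(path, dtype=np.int16, mode="r").reshape(-1, 2)
    # read_samples-style result: an in-memory copy with the scale factor applied.
    scaled = np.asarray(mapped, dtype=np.float32) * SCALE

    print(mapped[0])  # [ 16384 -16384]
    print(scaled[0])  # [ 0.5 -0.5]
    del mapped  # release the map before the directory is cleaned up
```

The two views hold the same samples but differ by exactly the scale factor, which is the inconsistency described above.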
I'm not sure what the best solution is, but I think we should fix #15 at the same time, since it will require tinkering with the same code.
Solutions I propose:

1) Leave as-is.
2) When accessing the memory map of a file that requires scaling, return a copy of the data instead (probably by using `read_samples`).
3) When accessing a memory map, return a scale parameter along with the data? Or maybe emit a warning?

Fixing #15, I believe, requires using the `offset` kwarg of `np.memmap`.
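The following is a minimal sketch of how options 2 and 3 could share one accessor, with the `offset` kwarg of `np.memmap` folded in. `open_samples` and its parameters are hypothetical. The `float32` output type and the assumption that #15 needs a fixed number of leading bytes skipped are also assumptions, not confirmed details of the library or the issue.

```python
import numpy as np


def open_samples(path, dtype=np.int16, offset=0, num_channels=1, scale=None, copy=False):
    """Hypothetical accessor returning (data, remaining_scale).

    offset: bytes to skip at the start of the file (np.memmap's offset kwarg).
    scale:  factor read_samples would apply; None means no scaling is needed.
    copy:   True  -> option 2: return a scaled in-memory copy (scale already applied).
            False -> option 3: return the raw memmap plus the scale still to apply.
    """
    mapped = np.memmap(path, dtype=dtype, mode="r", offset=offset)
    if num_channels > 1:
        # Interleaved channels become columns: shape (samples, channels).
        mapped = mapped.reshape(-1, num_channels)
    if scale is None:
        return mapped, 1.0
    if copy:
        # np.asarray drops the memmap subclass, so this is a plain ndarray copy.
        return np.asarray(mapped, dtype=np.float32) * scale, 1.0
    return mapped, scale
```

With `copy=False` the caller keeps the zero-copy memory map but cannot forget the scale, since it comes back alongside the data. With `copy=True` the result always matches what a scaled read would return.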