sigmf / SigMF

The Signal Metadata Format Specification
Creative Commons Attribution Share Alike 4.0 International

Handling Multiple Streams #19

Closed bhilburn closed 7 years ago

bhilburn commented 7 years ago

I was chatting with @jgaeddert about liquid-dsp and SigMF, and he brought up a really great point.

Right now, we don't have a good mechanism to deal with multiple time-aligned data streams in a SigMF recording. Since a dataset file is a single sample set, and a metadata file only refers to a single dataset file, the only way to do this currently, I think, would be to effectively duplicate the metadata file for the second dataset. Especially since many applications rely on the multiple streams being processed together, this is cumbersome.

It seems like there are two obvious options:

  1. We store both streams in the same dataset file, and make it possible for capture and annotation segments to point to sample indices in both streams (we would need to work out how to delineate file position vs. sample index, etc.)
  2. The two streams get stored in separate dataset files and we make it possible for the metadata file to refer to multiple dataset files as long as they are part of the same recording.

My preference is #2, as I think it will result in a SigMF recording structure that is easier to navigate and process. Thoughts from others?

kpreid commented 7 years ago

Taking a narrow perspective for the sake of argument: the simplest way to input time-aligned samples into a GNU Radio application would be to supply them interleaved (allowing for vectorized processing and not having to open many files).
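For illustration, a minimal NumPy sketch (not part of SigMF or GNU Radio) of what sample-wise interleaving and deinterleaving of two time-aligned streams looks like:

```python
import numpy as np

# Two hypothetical time-aligned complex64 streams
a = np.arange(4, dtype=np.complex64)        # stream 0
b = np.arange(4, dtype=np.complex64) + 100  # stream 1

# Interleave sample-by-sample: [a0, b0, a1, b1, ...]
interleaved = np.empty(a.size + b.size, dtype=np.complex64)
interleaved[0::2] = a
interleaved[1::2] = b

# A reader deinterleaves by reshaping into (samples, channels)
frames = interleaved.reshape(-1, 2)
assert np.array_equal(frames[:, 0], a)
assert np.array_equal(frames[:, 1], b)
```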

I object to option 2 because I object to containing pathnames in metadata at all as described in #16. But there could be some systematic way of naming the files, e.g. foo.meta foo.1.data foo.2.data.

bhilburn commented 7 years ago

@kpreid - You raise a really excellent point about interleaving and vectorized processing, and you're right, that would be really easy in GNU Radio.

In the interest of getting some non-GNU Radio opinions, let's ping a few folks: @jgaeddert @miek - Do you have any thoughts on this topic, by chance?

miek commented 7 years ago

This is pretty relevant to me right now as I've been experimenting with making SDR hardware capture logic analyser data along with the usual RF samples, so I'd really like to have support for multiple streams.

So far I've been interleaving samples as @kpreid suggested, and I think either that or option 2 make the most sense for efficient recorders. For reading files, any option works for me (for inspectrum).

Another wrinkle worth thinking about is whether the multiple streams can have different datatypes (which I'd want for my use-case above). This would complicate the processing of an interleaved stream slightly, and could also run into alignment issues.
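As an illustration of the mixed-datatype concern, here is a hedged sketch using a NumPy structured dtype; the field names and layout are invented for this example and are not part of any spec:

```python
import numpy as np

# Hypothetical frame layout: one complex64 RF sample plus one uint8
# logic-analyser sample per time step. align=False packs the fields
# with no padding, so the on-disk layout is explicit: 9 bytes per
# frame (8 for the complex64 + 1 for the uint8).
frame = np.dtype([("rf", np.complex64), ("logic", np.uint8)], align=False)

buf = np.zeros(4, dtype=frame)
buf["rf"] = np.arange(4, dtype=np.complex64)
buf["logic"] = [0b0001, 0b0010, 0b0100, 0b1000]

assert frame.itemsize == 9  # no padding inserted between fields
```

With `align=True` NumPy would instead pad fields to their natural alignment, changing the on-disk size, which is exactly the kind of ambiguity a format would have to pin down.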

kpreid commented 7 years ago

(I recommend not opening the can of worms which is structure field ordering/padding schemes.)

bastibl commented 7 years ago

I'm not sure what kind of streams you have in mind, but if the streams come from independent clocks, the interleaved approach might lead to problems for longer captures.

smunaut commented 7 years ago

Separate files is the best approach.

One of the objectives of SigMF, when originally discussed, was to allow easy inter-operation with non-SigMF tools that just deal with raw data, and any kind of interleaving makes that impossible.

bhilburn commented 7 years ago

Okay, it sounds like we have agreement that multiple datafiles is the best approach, per points made by @miek, @bastibl, and @smunaut.

Now we have the question of whether we want multiple metadata files (the current scenario) or to allow one metadata file to point to multiple datafiles.

So, questions about the one-to-many approach:

  1. How do we point one metadata file at multiple datafiles? As @kpreid points out, specifying datapaths is problematic, and we are likely going to remove that anyway. @kpreid had an interesting suggestion of doing it by filename (which is relevant to the discussion happening in #14).
  2. We will also need a way to indicate which channels an annotation segment applies to. Perhaps we need a core:channel pair, here?

And, actually, #2 could apply to the capture segments, as well. A common scenario would be one channel experiencing an underrun while the other does not, thus requiring a new capture segment as the timestamp needs to be updated.
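A sketch of what such an annotation segment might look like, expressed here as a Python dict; the `core:channel` field name and its list-of-indices form are speculative, not anything in the spec:

```python
# Hypothetical annotation segment carrying a speculative "core:channel"
# field to say which stream(s) the annotation applies to. The channel
# indices would refer to the recording's datafiles in order.
annotation = {
    "core:sample_start": 1000,
    "core:sample_count": 4096,
    "core:channel": [0],  # applies only to the first stream
    "core:comment": "burst seen on channel 0 only",
}

# An annotation with no channel field could be read as applying to all
# streams, preserving backwards compatibility with single-stream files.
applies_to_all = "core:channel" not in {"core:sample_start": 0}
```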

kpreid commented 7 years ago

On filenames: We'll want to be able to associate some metadata (if nothing else, a human-readable name) with each stream/channel. This would presumably be something like "core:channels": [{a channel metadata object}, {a channel metadata object}]. We can specify that if core:channels exists, the data should be looked for in recordingname.0.sigmfdata, recordingname.1.sigmfdata, and so on, whereas if it does not exist it is looked for in recordingname.sigmfdata. (This avoids needing any searching for different possible filenames.)
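A minimal sketch of this proposal in Python; the channel-object contents are illustrative, not specified, and the filename scheme is exactly the index-suffix convention described above:

```python
# Sketch of the proposal: if "core:channels" is present, datafile names
# are derived from the recording name by channel index; otherwise a
# single datafile is assumed. Field contents here are illustrative.
meta = {
    "global": {
        "core:datatype": "cf32_le",
        "core:channels": [
            {"name": "rx-antenna-a"},  # hypothetical per-channel metadata
            {"name": "rx-antenna-b"},
        ],
    }
}

def datafiles(recording_name, metadata):
    """Return the list of datafile names implied by the metadata."""
    channels = metadata["global"].get("core:channels")
    if channels is None:
        return [f"{recording_name}.sigmfdata"]
    return [f"{recording_name}.{i}.sigmfdata" for i in range(len(channels))]

print(datafiles("foo", meta))
# -> ['foo.0.sigmfdata', 'foo.1.sigmfdata']
```

Because the filenames are fully determined by the recording name and the channel count, a reader never has to search the filesystem for candidate datafiles.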

I note that some recordings may have imprecise sample alignment and timing (e.g. someone has a bucketful of RTL-SDRs with independent clocks and command timing), so that the same capture metadata cannot be applied to all of them because the sample indexes do not match. If supporting that is in scope, then perhaps the right answer is that in a multifile dataset, the global object does not have core:capture at all, but the "channel metadata object" does.

…but supporting that meaningfully implies having a notion of a shared timeline which is wall-time-based rather than sample-index-based, which the format so far just doesn't support. Should this be declared out of scope, so that if you have a non-aligned dataset you store them as separate recordings?

bhilburn commented 7 years ago

Excellent points, @kpreid.

Okay, so based on the discussion above, it sounds like the best path forward is for separate streams to exist as separate recordings, and for reader applications to simply read in both recordings. We have the option to distribute the separate recordings together (discussion in #50), but they are fundamentally two distinct metadata/dataset pairs.

Let me know if you agree / disagree with this approach, all. If so, it may be worth adding a note somehow in the spec stating this.

@jgaeddert @smunaut @kpreid @miek @bastibl

bhilburn commented 7 years ago

Okay, seems like this issue is settled. Multiple streams will be recorded in different files, but be part of the same SigMF Recording per the development in #50. Closing this discussion to continue the work, there.