sigmf / SigMF

The Signal Metadata Format Specification
Creative Commons Attribution Share Alike 4.0 International

PSD support #221

Open nsbruce opened 2 years ago

nsbruce commented 2 years ago

Hello from the radio astronomy / RFI monitoring world.

Is (integrated) PSD support on the SigMF roadmap? Is there a way for me/my group to help push it along?

Cheers!

gmabey commented 2 years ago

Hi @nsbruce

Just trying to restate your question here to make sure I understand. Are you asking whether there are plans to modify the SigMF file format to store frequency-domain data?

Or, are you trying to generate and display PSD computations from SigMF-formatted data?

Assuming your question is the first one, I'm going to draw an analogy. The VITA 49.0 standard is another way to describe time-series samples, and had no (integrated) support for describing frequency domain data. It was many years and a much longer standard before frequency domain data was included in VITA 49.2. I suspect that's similar to where the SigMF project is at, IMHO. (that trailing "IMHO" prevented me from ending a sentence with a preposition BTW)

One could argue that frequency domain data could be described in a completely separate standard, although the momentum of this project would be a boon to those with an interest in describing frequency domain data.

As for me and my company, we're sticking with our horrid, ancient, proprietary file formats for frequency domain data, and adopting SigMF for time-series, for the time being. It just doesn't seem like an easy-enough delta (to me) to invest the effort in extending SigMF to that domain.

What do you think?

jacobagilbert commented 2 years ago

@nsbruce I am also assuming you mean storing PSD information as @gmabey is.

SigMF was originally designed for storing uniformly sampled time domain data, but this is not the first time the subject of storing spectral information has come up, and there are a couple solutions that you might want to consider if using SigMF, and I have played a bit with both.

The first is just to store PSDs as a sequential time series of real (or complex) values and define an extension to add appropriate interpretation metadata. While this is not great, and is a bit of an abuse of what SigMF Datasets are intended to represent, it's totally functional.

The second is to store PSDs as two-dimensional data by using the core:num_channels field to specify the PSD length, effectively storing the data as a 2D array. This gets a little bit strange because your "sample rate" is really now a "PSD rate", which might cause confusion. If you sampled data at 1e6 samples per second, computed 256-point FFTs, and saved these using this format, you should technically use 3906.25 (1e6 / 256) as the rate, which is somewhat strange. At the end of the day the Dataset ends up being exactly the same as in the first example.
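As a rough illustration of the second approach, the sketch below works out the "PSD rate" and builds matching global metadata. The `core:` field names come from the SigMF core namespace; the helper name `psd_rate`, the `rf32_le` datatype, and the version string are assumptions made for the example, and the calculation assumes back-to-back FFTs with no overlap.

```python
import json

def psd_rate(sample_rate_hz: float, fft_size: int) -> float:
    """Rate at which PSD rows are produced, assuming back-to-back
    (non-overlapping) FFTs over the input samples."""
    return sample_rate_hz / fft_size

fft_size = 256
rate = psd_rate(1e6, fft_size)  # 1e6 / 256 = 3906.25 PSDs per second

# Global metadata that (ab)uses core:num_channels as the PSD length.
meta = {
    "global": {
        "core:datatype": "rf32_le",     # assumed: real float32 bin values
        "core:version": "1.0.0",        # assumed spec version
        "core:sample_rate": rate,       # really a "PSD rate" here
        "core:num_channels": fft_size,  # one "channel" per frequency bin
    },
    "captures": [{"core:sample_start": 0}],
    "annotations": [],
}

print(json.dumps(meta, indent=2))
```

A reader of such a file would still need out-of-band knowledge that the "channels" are frequency bins, which is the confusion described above.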

I think overall SigMF needs a better way to store PSD data, and I also think there is sufficient support to enable this by the specification. This could be as simple as adding one or two global metadata fields to indicate that the data is fundamentally temporally or spectrally sampled, and probably some information on the representation (is the magnitude lin/mag^2/log scaled, etc). I am very much interested in hearing thoughts either here or on the chat.gnuradio.org #SigMF channel.
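To give a feel for what "one or two global metadata fields" might look like, here is a small sketch. The `psd:` namespace and both field names are hypothetical and do not exist in SigMF; the sketch also assumes that "log" means values already in dB, "mag2" means linear power, and "lin" means linear magnitude.

```python
import math

# Hypothetical global fields; neither exists in the SigMF core namespace.
spectral_meta = {
    "psd:domain": "frequency",  # hypothetical: "time" or "frequency"
    "psd:scaling": "mag2",      # hypothetical: "lin", "mag2", or "log"
}

def to_db(value: float, scaling: str) -> float:
    """Convert one stored bin value to dB according to the declared scaling.

    Assumes value > 0 for the linear scalings.
    """
    if scaling == "log":
        return value                     # assumed to be in dB already
    if scaling == "mag2":
        return 10.0 * math.log10(value)  # power -> dB
    if scaling == "lin":
        return 20.0 * math.log10(value)  # magnitude -> dB
    raise ValueError(f"unknown scaling: {scaling}")

print(to_db(100.0, spectral_meta["psd:scaling"]))
```

With a declared scaling, a reader can normalize values from different producers before comparing them.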

Teque5 commented 2 years ago

We've discussed the possibility of supporting frequency-domain data several times and I think the main contention is that it's quite complex and out-of-scope for SigMF as a whole.

Having said that, I process a lot of data in the frequency domain and wouldn't mind if someone hashed out the idea. Currently when I store files I typically generate the sigmf pair and also create a whatever.webp that actually contains a rendered lossy 500x500px spectrogram of the data since it's nice to scrub through lots of files like that. If we started including frequency domain data, would we also support freq magnitude?

jacobagilbert commented 2 years ago

@Teque5 adding some sort of thumbnail/preview image is definitely of interest. It's something I believe is useful for more typical time series data also. I built an extension at one point that allowed for inline base64 or file specification of this though I did not use it very much.

I think the on-disk representation of PSD data is fairly straightforward. There seems to be one primary way to accomplish this (n bins, m PSDs in time: B0T0, B1T0 ... BnT0, B0T1, B1T1 ... BnT1, ... B0Tm, B1Tm ... BnTm), but metadata is a broader question. I do think we can support PSD data of any supported type without much consequence, which is nice since that will be independent of time/freq data.
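The ordering described above amounts to a row-major flattening with the bin index varying fastest. A minimal sketch, using only the standard library; the sizes, the helper name `flat_index`, and the choice of 32-bit floats are illustrative assumptions:

```python
from array import array

n_bins, n_psds = 4, 3  # small illustrative sizes

def flat_index(bin_idx: int, time_idx: int, n_bins: int) -> int:
    """Offset (in values) of bin `bin_idx` of PSD `time_idx` in file order."""
    return time_idx * n_bins + bin_idx

# Fake PSD rows: each value encodes (time, bin) as 10*t + b so the
# ordering is easy to see when inspecting the flattened data.
rows = [[10.0 * t + b for b in range(n_bins)] for t in range(n_psds)]

flat = array("f")     # 32-bit floats, native byte order
for row in rows:
    flat.extend(row)  # B0T0, B1T0, ..., then B0T1, B1T1, ...

raw = flat.tobytes()  # bytes that would go into the data file

print(flat[flat_index(2, 1, n_bins)])  # bin 2 of the second PSD
```

Because every PSD has the same length, a reader can seek directly to any (bin, time) pair without parsing the rest of the file.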

The simplest way to do this would simply be to have a meta flag that marks the data as spectral vs temporal. I addressed a few complications with this and sample rates above. As meta definitions get more complicated my ideas start to see complications, though I haven't put a ton of thought into this.

I'm going to open a separate issue for thumbnail/image support so we can keep that from getting conflated into this discussion.

Teque5 commented 2 years ago

For the freq domain datastore it's possible/likely that for the computed bins a window function or overlap may have been applied which would mean the data is no longer just different in domain representation.
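One concrete consequence of overlap is that the PSD rate no longer equals the sample rate divided by the FFT size. The sketch below shows the arithmetic; the `psd:` field names are hypothetical and are only meant to suggest the kind of processing details that would need recording.

```python
def psd_rate(sample_rate_hz: float, fft_size: int, overlap: int = 0) -> float:
    """PSD rows per second when consecutive FFT frames share `overlap` samples."""
    hop = fft_size - overlap  # new samples consumed per frame
    if hop <= 0:
        raise ValueError("overlap must be smaller than fft_size")
    return sample_rate_hz / hop

# Hypothetical processing-description fields (not part of any SigMF namespace).
processing_meta = {
    "psd:fft_size": 256,
    "psd:overlap": 128,    # 50% overlap
    "psd:window": "hann",  # name of the window applied before each FFT
}

rate = psd_rate(
    1e6,
    processing_meta["psd:fft_size"],
    processing_meta["psd:overlap"],
)
print(rate)  # 7812.5
```

With 50% overlap the rate doubles relative to the non-overlapping case (3906.25), so the stored data cannot be mapped back to time without knowing the overlap.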

777arc commented 2 years ago

Remember that as soon as you produce a PSD or spectrogram, you have done analysis. SigMF recordings store the raw data, so that you can go off and do analysis and store the results however you prefer. In my opinion, there are so many params associated with generating spectrograms and PSDs that it feels funny to have it be part of SigMF. Everyone has their own preferences for how to do that stuff, like how much averaging to use, windowing, overlap, scaling, and it will depend on the application/field. Now if you are a radio astronomy person using a system that only outputs PSDs and maybe there's no notion of time domain samples, then I can see why this is important to you: SigMF is not usable without it. With SigMF we are already running into dozens of edge cases, so it seems best to use a KISS approach and keep the base simple (i.e. just uniformly sampled time domain) so that we can work through the edge cases associated with that, instead of piling on even more.

gmabey commented 2 years ago

I mostly agree with @777arc but I think that there are not too too many params related to the analysis of time-series data such that it's impossible/impractical to describe enough of them in the metadata.

There's a lot that goes on (er, can go on) between the antenna and the output of the ADC in an RF sensing world that's not perfectly described by what the spec allows, although don't get me wrong, there's a lot that can.

I also agree that the project isn't quite ready to take on this kind of extension, but that it could and should at some later date.

jacobagilbert commented 2 years ago

Its exclusion won't be decided because we don't have time for it. It may just be something intentionally not included as it does represent a deviation from the intention of the spec, which is not to explain analysis.

The hardware related acquisition details you mention are distinct in that they are not analysis.

@777arc @bhilburn thoughts on creating an extension for this? I do agree with your "this is analysis" statement though I feel this will be something people will ultimately do and it may still be beneficial to provide a standard way of doing this.

777arc commented 2 years ago

Yeah as long as we're clear about the level of support and corner case discussion that goes along with extensions

jacobagilbert commented 2 years ago

Yeah, generally our extensions don't come with the implication of tooling support. We can be more specific in the actual extension.

aromanielloNTIA commented 2 years ago

I've just come across this issue, and wanted to point folks towards a solution which may work for some use cases: NTIA's ntia-algorithm extension of the SigMF namespace. View the specification here and the overall repository, which contains additional extensions, here. These were developed for NTIA/ITS's spectrum monitoring applications, but may be useful in other fields such as radio astronomy as well. The ntia-algorithm extension provides support for annotating frequency domain detections, and may be satisfactory for @nsbruce's initial use case.

dkozel commented 2 years ago

Quoting @777arc: "if you are a radio astronomy person using a system that only outputs PSDs and maybe there's no notion of time domain samples"

From my experience with Radio Astronomy systems, this is the case for the majority of radio telescopes. The processing pipelines heavily trend towards at least creating vectors of channelized time domain streams as the raw output and often include an integration and/or PSD step.

Several radio astronomers will be at GRCon this year and I'd recommend this as a topic to discuss in the breakout session.