Open alexrockhill opened 2 years ago
Support for a frequency dimension within STCs is already on the roadmap: https://mne.tools/dev/overview/roadmap.html#time-frequency-classes and work is underway on that goal in #10184 (so far only for sensor-space containers, though)
...but that doesn't cover adding a dimension for trials.
Thanks for connecting that document and PR @drammock! What do you think about the idea that it might be assembled more modularly than a Spectrum class, for instance? The analogy is that if the data were too large, you might need to compute the PSD per channel and then have the user assemble them all into a spectrum class via whatever method best fits their RAM. Obviously, that's not the case for PSDs, so the API might be different. Curious to hear what you think the best solution to this is.
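To make the "assemble it yourself" idea concrete, here is a minimal sketch (not an existing MNE API) that computes the PSD one channel at a time and stacks the results afterwards; `raw` is assumed to be an already-loaded `mne.io.Raw`, and the helper name is hypothetical:

```python
import numpy as np
from mne.time_frequency import psd_array_welch

def psd_per_channel(raw, fmin=1.0, fmax=80.0, n_fft=2048):
    """Hypothetical helper: bound RAM by computing one channel's PSD at a time."""
    psds = []
    for name in raw.ch_names:
        data = raw.get_data(picks=[name])  # shape (1, n_times)
        psd, freqs = psd_array_welch(data, sfreq=raw.info['sfreq'],
                                     fmin=fmin, fmax=fmax, n_fft=n_fft)
        psds.append(psd[0])
    # the user would then wrap this array in whatever spectrum container fits
    return np.vstack(psds), freqs  # (n_channels, n_freqs)
```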
> if the data were too large, you might need to compute the PSD per channel and then have the user assemble them all into a spectrum class via whatever method best fits their RAM
I guess my gut reaction is that if the data is too large to generate the spectrum/spectrogram in-memory, then users should consider either (1) a lower-density source space, or (2) fewer timepoints/frequency bins (AKA downsample / decimate). If neither of those are possible, maybe the pipeline shouldn't be running on a laptop? I know it sounds a bit harsh, but some datasets/analyses are big enough to start to look like "big data" and I'm uncertain how far we should go to accommodate running such jobs on under-resourced hardware. The more magic we do under the hood, the harder that code becomes to maintain.
(side note: FFT and an MNE or dSPM inverse are all linear operations, so you can do FFT on sensors before applying inverse and get the same answer as doing inverse first and then FFT, but FFT first is much more efficient because in typical cases you're doing FFT on ~300 channels instead of ~20k vertices.)
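A quick numpy sketch of that commutativity claim (random stand-in matrices, not real MNE objects): the FFT acts along the time axis and a linear inverse kernel acts along the channel axis, so the order doesn't change the result, only the number of FFTs you have to compute.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_channels, n_times = 50, 10, 256
K = rng.standard_normal((n_vertices, n_channels))  # stand-in for a linear inverse kernel
Y = rng.standard_normal((n_channels, n_times))     # stand-in for sensor data

inverse_then_fft = np.fft.rfft(K @ Y, axis=-1)     # n_vertices FFTs
fft_then_inverse = K @ np.fft.rfft(Y, axis=-1)     # only n_channels FFTs
assert np.allclose(inverse_then_fft, fft_then_inverse)
```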
As far as across trials, we have some support for a `return_generator` kwarg (e.g., STCs from Epochs); my instinct would be to keep supporting something like that.
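For reference, the existing generator pattern looks roughly like this (`epochs` and `inv` are assumed to already exist); each per-trial STC is produced lazily, so only one needs to be in memory at a time:

```python
from mne.minimum_norm import apply_inverse_epochs

stcs = apply_inverse_epochs(epochs, inv, lambda2=1.0 / 9.0, method='dSPM',
                            return_generator=True)
for stc in stcs:  # each SourceEstimate is computed on demand
    peak_vertex, peak_time = stc.get_peak()
```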
I disagree; you could have use cases where you want time-frequency decomposed data over 72 hours of continuous recording sampled at 1 kHz for 100+ channels, and it just isn't practically feasible yet to hold that kind of data in RAM even with a large hardware budget. Practically, I've found when working with large datasets that it's much more efficient to loop over something like channels, where each one is independent or the dependence can be handled in one step, such as solving the leadfield.
As a very practical example, say you have a typical 2 GB file: a TFR with 100 frequencies at trial resolution is 200 GB of RAM, which is just impractical compared to looping over channels and keeping peak RAM usage around 2 or 4 GB, with the tradeoff of more reads/writes to disk. In practice you'll often decimate to keep less than 200 GB on disk, but you could also use the results right away rather than storing them, which is practical when the analysis can be done for each channel independently.
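As a rough sketch of that channel loop (assumed inputs: `epochs_data` shaped (n_epochs, n_channels, n_times), `sfreq`, and `freqs`; the on-disk handling is left out), something like this keeps only one channel's trial-resolved TFR in memory at a time:

```python
from mne.time_frequency import tfr_array_morlet

def tfr_by_channel(epochs_data, sfreq, freqs, n_cycles=7.0):
    """Hypothetical generator: one channel's trial-resolved TFR at a time."""
    n_epochs, n_channels, n_times = epochs_data.shape
    for ch in range(n_channels):
        one_ch = epochs_data[:, [ch], :]  # keep the 3D shape tfr_array_morlet expects
        power = tfr_array_morlet(one_ch, sfreq=sfreq, freqs=freqs,
                                 n_cycles=n_cycles, output='power')
        # peak RAM is ~n_freqs copies of one channel, not of the whole recording
        yield ch, power[:, 0]  # (n_epochs, n_freqs, n_times)
```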
Let me chime in as well with a few assorted thoughts:
Background: We touched upon this subject coming from the visualization side (e.g., visualizing space, frequency, and time interactively for source projections), and then extended our musings towards other use-cases (e.g. trials).
Regarding visualization, an array of data would make more sense than a list/generator. One possibility could thus be to assemble that array immediately before or during the call to the viz GUI from the list/generator. This would leave alone any methods that return such objects (e.g. `apply_dics`).
However, so we thought, it could also make sense to change the class such methods return - from my experience, the lists are not great to work with, need more memory than necessary (due to redundancy), and usually have to be manually transferred into an array for further usage (e.g., decoding or so) anyway.
Now, I tend to agree with @drammock that (at least for now) we could assume that the user has enough memory to do such operations. Alternatively, we could leave the methods that work across trials alone - and let them return a generator (or have the option to let them return a generator), which would fix part of this issue?
From the (GSOC) visualization perspective, we should probably just focus on making sure that we use future-proof input and I think for that @drammock can give us useful hints - so that we can also make sure that sensor and source space objects behave similarly.
> However, so we thought, it could also make sense to change the class such methods return - from my experience, the lists are not great to work with, need more memory than necessary (due to redundancy), and usually have to be manually transferred into an array for further usage (e.g., decoding or so) anyway.
all good points. For the time-frequency visualization use case, there is no question in my mind that a new class is needed (it's already on the roadmap). But I'm -0.5 on also trying to tackle the across-trials interactive visualization problem at the same time.
@drammock Yes, I see your concerns and I think it'd be great to focus the effort as much as possible. It would still be great to write the visualization while already anticipating coming changes to the input (i.e. `stc`) though - to make upcoming adaptations as smooth as possible.
@alexrockhill maybe the version where we have a wrapper from the current-state `stc` to an array could work - right now we only take this input, but once the new class is there, we can extend that to work with a higher-dimensional `stc` as well.
> It would still be great to write the visualization while already anticipating coming changes to the input (i.e. `stc`) though - to make upcoming adaptations as smooth as possible.
yes, definitely. I haven't finalized a source space TFR object's methods/properties yet, but based on existing sensor-space TFR objects you can expect `.times`, `.freqs`, either `.data` or (more likely) `.get_data(times, freqs)`, `.vertices`, `.shape`, `.subject`, `.method` (e.g. `'morlet'`), `.crop(tmin, tmax, fmin, fmax)` and `.save(fname)`. Data shape will likely be vertices x frequencies x times. All this is open to discussion of course, if changing something will make the viz much easier.
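A minimal mock of such a container (purely illustrative, not an MNE class; the attribute set follows the list above and the `get_data` signature is a guess) could be enough to start wiring up the GUI:

```python
import numpy as np

class MockSourceTFR:
    """Hypothetical stand-in: data shaped (n_vertices, n_freqs, n_times)."""

    def __init__(self, data, vertices, times, freqs, subject=None, method='morlet'):
        self.data = np.asarray(data)
        self.vertices = vertices  # e.g. [lh_vertno, rh_vertno], as in SourceEstimate
        self.times = np.asarray(times)
        self.freqs = np.asarray(freqs)
        self.subject = subject
        self.method = method

    @property
    def shape(self):
        return self.data.shape

    def get_data(self, tmin=None, tmax=None, fmin=None, fmax=None):
        """Return a (vertices, freqs, times) slab restricted to the given ranges."""
        def _mask(x, lo, hi):
            lo = -np.inf if lo is None else lo
            hi = np.inf if hi is None else hi
            return (x >= lo) & (x <= hi)
        return self.data[:, _mask(self.freqs, fmin, fmax)][:, :, _mask(self.times, tmin, tmax)]
```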
> I guess my gut reaction is that if the data is too large to generate the spectrum/spectrogram in-memory, then users should consider either (1) a lower-density source space, or (2) fewer timepoints/frequency bins (AKA downsample / decimate). [...] As far as across trials, we have some support for a `return_generator` kwarg (e.g., STCs from Epochs); my instinct would be to keep supporting something like that.
I thought about what you said more here @drammock, and perhaps you are right that this is a more reasonable place to start, given that we put in the examples and tutorials how to concatenate over time. Whether you want to loop over channels or over chunks of data in time, I think both should be supported, as I can see use cases for either (e.g. looping over channels might suit something like sEEG, where stats might be contact-level because it's difficult to compare across groups with different montages, whereas an MEG/EEG analysis would probably use the same sensor set/montage, so with lots of data the time chunks would be more appropriate). Ideally, both approaches would be reasonable ways to produce a source-space result; I think we're a ways away, but maybe we can get there this summer. Thanks for the feedback, I'll try to get started on this implementation probably early next week once https://github.com/mne-tools/mne-python/pull/10777 and a followup to add the basic GUI merge.
Per conversation on Discord, it sounds like everyone is on board with making an stc class that would be returned by applying a DICS beamformer to epochs, iterating over time windows (but computing the covariance using the whole epoch, per @britta-wstnr's point), with times at the center of each window. I think it would be better to make it 5D (3D vertices x frequencies x times) rather than a list of `stc`s (which, as @britta-wstnr suggested, would be the minimal possible change), because it would be compatible with other methods, e.g. LCMV- and MNE-type methods, and because it's more elegant. The one thing is that it equates a binned time window with a single central time, but I don't think that's really all that misleading since there are all sorts of operations that do similar things.
> I think it would be better to make it 5D (3D vertices x frequencies x times)
This would not be consistent with the current STC class, where "space" is really "vertex number" and only uses 1D.
Also maybe I'm just misunderstanding your text above, but it seems different from how I remember the conversation. My memory is "mock up a 3D STC class (i.e. make a one-off with whatever init signature is convenient, and give it only the methods needed for the GUI testing), then instantiate it and use it to test your WIP GUI. No need to make the mocked class actually be returned by any of our existing functions yet." Please correct me if I'm mis-remembering @britta-wstnr @agramfort
Yes. A 1D axis for space, with vertno for indexing into the source space, is what is consistent.
And yes, mock an object just to make the GUI doable. We will see next how to adjust the public functions to return such an object. You can use a function that converts the list of lists of stc to this new object for testing on real data.
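A hedged sketch of that converter (the nesting order is an assumption: outer list over frequencies, inner list over time windows with one sample each; swap the loops if the real output is nested the other way around):

```python
import numpy as np

def stc_lists_to_array(stcs_per_freq, freqs):
    """Hypothetical helper: stack a list (freqs) of lists (windows) of STCs."""
    first = stcs_per_freq[0][0]
    vertices = first.vertices                                 # [lh_vertno, rh_vertno]
    times = np.array([stc.tmin for stc in stcs_per_freq[0]])  # window times
    data = np.stack(
        [np.concatenate([stc.data for stc in stcs_one_freq], axis=1)
         for stcs_one_freq in stcs_per_freq],
        axis=1,
    )                                                         # (n_vertices, n_freqs, n_times)
    return data, vertices, times, np.asarray(freqs)
```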
Yup, I agree with @drammock and @agramfort. Also that it is only in principle 5D, but a 3D object.
Talking to @britta-wstnr about the options for time-frequency resolved source estimates (and also trial-resolved source estimation ideally for forward extensibility) here is what we came up with:
Here is the problem:

- `stc` attributes all match every time
- `stc` data object would be the easiest but would break backward compatibility for people using the public attribute `stc.data`
Here is the proposed solution:
Avoiding potential issues:
We couldn't think of any other future extension dimensions that would be good to include, but since this proposal is to make new classes, if there are candidates we could consider including them now. Connectivity comes to mind, but I think that should stay outside the main MNE-Python repository. TFR seems squarely in the main MNE-Python repository's wheelhouse, so I think this would be nice to incorporate so that the visualization GUI can take in an object with all the necessary info and the user doesn't have to cobble it together. What do people think?
cc @agramfort @larsoner @jasmainak related to GSoC