Yeah, that would be nice to have.
So here is the action plan as discussed with @larsoner :

- `make_ad_hoc_cov` will be modified to accept a dict of standard deviations for each type of sensor (mag, grad, eeg) to allow for custom creation of a noise covariance matrix.
- `simulate_raw` will be modified to accept info passed as raw (requiring a value for the new `n_times=None` kwarg). (#5178)
- make `simulate_raw` accept a `Forward` object as input. (#5178)
- add `mne.simulation.add_noise(inst, cov, iir_filter)` to add noise to raw/epochs/evoked. (#5859)
- raw-as-info support, with an extra `duration` option. (#5869)
- make `simulate_raw` accept a generator (EDIT: or iterable) for `stc`. (#6037)
- a `SourceSimulator` class will be added that takes a source space and a set of vertices (or set of labels), and simulates signals for those vertices/labels with commonly used characteristics/patterns (e.g., N100-like, P300-like, etc.). It would provide an `iter` method that returns STCs corresponding to epochs, and thus an instance can be passed to `simulate_raw(..., stc=my_source_estimator)`. This will be discussed in a separate issue. (#5924)

I will update with info and reference this issue along the way! Feel free to share your thoughts about this issue!
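For orientation, a minimal sketch of how the planned pieces might be used once merged (the exact signatures may differ, and the `std` values are arbitrary examples; `raw` is assumed to be an existing `Raw` instance):

```python
import mne

# custom per-channel-type noise standard deviations (arbitrary example values)
cov = mne.make_ad_hoc_cov(raw.info, std=dict(mag=4e-12, grad=8e-11, eeg=2e-5))
# add (optionally IIR-filtered) noise drawn from that covariance
raw_noisy = mne.simulation.add_noise(raw.copy(), cov,
                                     iir_filter=[0.2, -0.2, 0.04])
```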
> A SourceSimulator class will be added that takes a source space and a set of vertices (or set of labels)

Will we provide defaults? I'd like for EEG people to run this stuff without bothering too much with the source code.

> (e.g., N100-like, P300-like, etc.)

How about eye blinks?

> Will we provide defaults? I'd like for EEG people to run this stuff without bothering too much with the source code.

We'll have to see how simple we can get the API.

> How about eye blinks?

`simulate_raw` already does this.
> simulate_raw already does this

Ok, but wouldn't it make sense to unify things?

This plan is already unified. The `SourceSimulator` instance just generates brain source activations, and these will be passed to `simulate_raw` ... you mean you'd like to see `SourceSimulator` also take care of choosing when blinks and ECG should happen?
If it makes sense code- and API-wise, yes.
(That wasn't my original idea, but since you're asking ...)
Removing milestone since this does not seem to be a blocker
@makkostya Just a reminder that we already opened this issue here. Last checkbox is what we discussed (making the Simulation an object).
cc @sdeslauriers @mclerc @papadop
Ok, let's keep this issue as the principal one for discussing the simulation module. Whenever someone creates an issue or PR related to the simulation module, please reference it in this issue. That will let us notify everyone concerned without pinging them individually.
ok
did you start collecting pieces of code to consolidate?
Here is the issue I opened for validation of the inverse problem solution: #5680
thanks. I answered.
Following up on your conversation, can you centralize the code of the different folks to see what can be consolidated and integrated into MNE?
I'll discuss their code in detail with Christos (cc @christospapa, @cgbenar), and after that I'll have an idea of how to start integrating it into MNE.
Sam and Ivana will work on the idea of adding connectivity information to the MNE simulation tool, but there is naturally no code yet.
Nathalie already has a contribution pipeline to follow. I keep working on inverse problem validation.
cc @ikojcic
Here is the toy script that I think we could have for raw/epochs/evoked simulation. Change the file extension to ".ipynb" and open it with Jupyter. Please look at it and give your feedback here. The main points to discuss are:
cool you resurrect this discussion
can you put the .ipynb online so we can see it rendered (not just as a txt file)?
you can drop it on gist.github.com for example
thanks a lot
Here is the link to the rendered file: https://gist.github.com/makkostya/f2a15cf0203ca03e0dab6aa91621a821
I like the overall idea. Here are some specific thoughts:
- The `simulate_stc_data` function will be closely tied to your brain model, i.e. spatial and temporal waveforms, structural connectivity, ARMA models, etc... Can we think of a few building-block functions that can be combined to produce varied outputs, instead of a monolithic function?
- If adding an `stc` with the noise object is supported via `__add__`, `add_noise_to_raw` becomes redundant. This is a more philosophical question on syntactic sugar.
- If summing `stc`s is supported, then complex data can be generated by summing many simple `stc`s. This simplifies the inputs to `simulate_stc` because it only needs to handle one case of [whatever simulation model].

Here are other comments (and comments to comments):
To summarize, I think it would be nice (I suspect it already exists) to have a SignalProperty (or similar) object containing (tmin, tmax, tstep/samplerate) and have it attached to every signal... This would avoid passing tmin, tmax, tstep to several functions and would prevent the user from adding two signals sampled at different rates. This would also naturally be one of the parameters to simulate_noise... I think this would add coherence to the framework...
Having said that, this comment has been written without a close view of what currently exists in mne, or of how much backward compatibility with the existing stuff we want... This is just my (current) view of an ideal "world". ;-)
> this comment has been written without a close view of what currently exists in mne, or of how much backward compatibility with the existing stuff we want

For context, there is an API plan comment above that gives the current plan (from what I understand).

> I think it would be nice (I suspect it already exists) to have a SignalProperty (or similar) object containing (tmin, tmax, tstep/samplerate) and have it attached to every signal

Unfortunately this is quite a departure from how things work everywhere else in MNE, so it probably won't fly. If we were designing things from scratch we might have added such things.
> Allow fwd object as a default input, but keep possibility to compute fwd inside the function

This was already done in #5178, I think.

> Additive noise is generated outside the function.

There is already a way to have additive noise added to the simulated data via the `noise_cov` parameter. This covers 90% of use cases, so other ways of adding noise should, I think, be left to the user to manually add to the `raw` instance that they obtain.
> new functions "simulate_roi", "simulate_stc_data", "simulate_noise".

Take a look at the proposal comment above -- I think the only modification to `simulate_raw` necessary to support these functions is allowing `stc` to be a generator. A separable issue is then what source estimates you provide as inputs (in generator form). As @sdeslauriers points out:

> The simulate_stc_data function will be closely tied to your brain model, i.e. spatial and temporal waveforms, structural connectivity, ARMA models, etc...

We'll have to decide what to actually include in MNE -- some ideas are in the comment I linked to above. We shouldn't include everything, but allowing `simulate_raw` to take a `stc` generator should do it, I think.

@makkostya or @sdeslauriers can you write a usage snippet example to demo the API you have in mind?
To converge on the API I prefer to discuss code snippets, as they make clear what we have in mind.
@makkostya do you have the code snippet we drafted together at Biomag?
Basically we need to have as input:

- an info
- a function to produce brain time series using labels or not and with events or not
- a way to compute forward
- a way to add noise

I like the idea that we simulate raw and then we have standard mne functions to obtain Epochs and Evoked.
> Basically we need to have as input:
>
> - an info
> - a function to produce brain time series using labels or not and with events or not
> - a way to compute forward
> - a way to add noise

We already have all of these in `simulate_raw` except the function to produce brain time series. That is exactly what the proposal to allow `stc` to be a generator would provide. Then the only remaining issue is how to produce the source time series of interest. So we almost already have the necessary API.
then let's make a concrete proposition to adjust the API. What would the diff look like?
```
def simulate_raw(raw, stc, ...)
    ...
    stc : instance of SourceEstimate
```

would become

```
    ...
    stc : instance of SourceEstimate | generator
```

i.e., you can pass in a generator, and `simulate_raw` will iterate over those to produce the outputs, rather than looping through a single `SourceEstimate` (as it does now, which is really only useful for static ERP-style responses).

(This is what is meant in the comment above.)
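For illustration, a hedged sketch of what passing a generator could look like under this proposal (`make_one_stc` is a hypothetical helper returning a `SourceEstimate`; it is not existing MNE API):

```python
import mne

def stc_gen(n_epochs):
    # yield one SourceEstimate per simulated "epoch"
    for _ in range(n_epochs):
        yield make_one_stc()  # hypothetical helper

raw_sim = mne.simulation.simulate_raw(raw, stc_gen(20), trans, src, bem)
```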
This would cope with events and event_id too?
`simulate_raw` (this is in the API doc) outputs events in the stim channel, the same for each event. You can recode these arbitrarily afterward, knowing your STC input. Or overwrite the stim channel yourself.

We could add an `event_id_func` or something if we don't just want the event IDs to work the way they do now.
To clarify, it's only the same (1) if there is no `head_pos`; the default behavior is to set the event number to the head position (number) that was used:

> Events coded with the position number (starting at 1) will be stored in the trigger channel (if available) at times corresponding to t=0 in the stc.

Again, this can be changed easily afterward (esp. after `find_events`), but if this seems too burdensome we can add an `event_id_func` to the API with some suitable args, and make it backward compatible.
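A minimal sketch of the recode-afterward workflow (the relabeling rule here is just an arbitrary example):

```python
import mne

events = mne.find_events(raw_sim)  # deltas found in the stim channel
events[1::2, 2] = 2                # e.g., relabel every other trial as condition 2
```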
Maybe the thing to do is have people read about `simulate_raw`, try using it, and see if the API we proposed some months ago still makes sense?
@makkostya and @sdeslauriers what is it you cannot do currently with simulate_raw or with what @larsoner proposes using a generator?
@ikojcic, @makkostya and I discussed this. It is not clear to us how the generator (or just an iterable, for that matter) would improve the situation. We are guessing every iteration would return a source time course? If that is the case, just create an `stc` from the output and call `simulate_raw` normally.
In our minds, the simulation process has 3 steps:

1. Generation and summation of different source space signals with events and the usual meta data.
2. Projection of the source space data into sensor space data (with head movement if desired).
3. Generation and addition of noise to the measurement.

This has a direct link to the formulation of the forward problem, which we think will be familiar to most users.
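A minimal sketch of this three-step split, assuming hypothetical function names (none of these exist in MNE; this only illustrates the decomposition being argued for):

```python
# 1. source space signals (with events)
stc = simulate_source_signals(src, labels, events)
# 2. projection to sensor space via the forward operator
raw_clean = project_to_sensors(stc, fwd, info)
# 3. additive measurement noise
raw_sim = add_measurement_noise(raw_clean, cov)
```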
The current version of `simulate_raw` covers part of 2 and 3, which makes the function more complicated than it needs to be. In addition, from what we got from reading the code, it is also very strict in how it handles the timing of events (the `stc` is repeated until the time runs out).
Here is an updated snippet of @makkostya that separates the steps above. https://gist.github.com/sdeslauriers/9f856925797108a8ee57d33861a8d39e
> We are guessing every iteration would return a source time course? If that is the case, just create an stc from the output and call simulate_raw normally.

Yes, in principle you could do this, but if your recording is long it's not going to be feasible to do so due to memory constraints if you have a lot of sources.

The other way it's conceptually different is that events in the STIM channel are tied to the STC onsets.
> it is also very strict in how it handles the timing of events (the stc is repeated until the time runs out).

I don't see it as being too difficult to overwrite the STIM channel, or add metadata afterward.
> makes the function more complicated than it needs to be.

Is this a problem for you in using it? I'd rather not add new ways of doing the same things without a good justification. The internals being complicated -- something a user shouldn't have to deal with -- is not a good reason to write an entirely new user-facing API.
> it is also very strict in how it handles the timing of events (the stc is repeated until the time runs out).

This is what would be relaxed by allowing `stc` to be a generator, so I don't see this as a limitation.
I'm sure it's possible to separate the steps, but I'm not convinced we should completely rewrite the API at this point.
What do you actually need to do in practice that you can't do with the current API?
... in case it's not clear, basically I agree with you that there is a better potential split between these steps. If we were designing something from scratch it would make sense to do it. But we have an existing function to take care of these things, and it's better not to rewrite it -- or worse, add a second way of accomplishing the same thing -- unless really necessary. So I'd like to see if it's possible to improve what we have to get what people need, rather than start over from scratch.
> Yes, in principle you could do this, but if your recording is long it's not going to be feasible to do so due to memory constraints if you have a lot of sources.

> The other way it's conceptually different is that events in the STIM channel are tied to the STC onsets.

Then I would suggest using the more general case. If they fit in memory, any iterable will work. If they don't, use a generator, which is iterable.
> I don't see it as being too difficult to overwrite the STIM channel, or add metadata afterwards.

That is because you are familiar with the internals. I think for most users it would be non-trivial and error prone.
> Is this a problem for you in using it? I'd rather not add new ways of doing the same things without a good justification. The internals being complicated -- something a user shouldn't have to deal with -- is not a good reason to write an entirely new user-facing API.

I disagree. You currently have many people who are willing to work on this and improve the API; it may be a good time to consider breaking changes.
> What do you actually need to do in practice that you can't do with the current API?

I do not think that is a fair question. Using generators you are calling arbitrary code, so of course we could simulate anything. I think a better question is whether simple simulations are simple to implement or require the use of generators. I believe our suggestions simplify many common use cases and will make things easier for new users.

For example, it is not trivial to simulate events that are not distributed uniformly, or to use multiple waveforms. It would require using generators and hacking the stim channel.
If we could implement our suggestion while preserving the current interface, i.e. make the new interface backwards compatible, would that be better?
> Then I would suggest using the more general case. If they fit in memory, any iterable will work. If they don't, use a generator, which is iterable.

Agreed, `stc` should be any iterable object, and not restricted specifically to a generator.
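For instance, under the iterable-support proposal a plain list of stcs would work when memory allows (a sketch, not current API; the condition stcs are assumed to exist):

```python
import mne

# three "epochs": condition 1, condition 2, condition 1 again
stcs = [stc_cond1, stc_cond2, stc_cond1]
raw_sim = mne.simulation.simulate_raw(raw, stcs, trans, src, bem)
```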
> I think a better question is whether simple simulations are simple to implement or require the use of generators. I believe our suggestions simplify many common use cases and will make things easier for new users.

Sure, but it's not the only question we need to consider. For example, I work with people who don't use `simulate_raw` precisely because it does not have generator/iterable-like support. If it did, they would. Fortunately, in this case I think we can have our cake and eat it too...
> For example, it is not trivial to simulate events that are not distributed uniformly, or to use multiple waveforms. It would require using generators and hacking the stim channel.

...because the `SourceSimulator` class mentioned in the API proposal above sounds like what you are really asking for here. It would integrate with the stc-allowed-to-be-generator case just fine. The simple-use-case code would be:

```python
stc_gen = SourceSimulator(...<simple interface here>...)
raw = mne.simulation.simulate_raw(raw, stc_gen, ...)
```

This seems like a straightforward way to provide a simple interface to the user, while still allowing for advanced use cases.
> > I don't see it as being too difficult to overwrite the STIM channel, or add metadata afterwards.
>
> That is because you are familiar with the internals. I think for most users it would be non-trivial and error prone.

Let's find a way to extend the existing API somehow to allow for events to be supplied, too. For example we could have a new `event_id_func` kwarg, or allow `stc` to be a tuple of `(SourceEstimate, ndarray of shape (n_times,))`, where the latter is what should be put in the STIM channel. Something like this is quite general, and again is something we can build into what gets returned by the use-case-simplifying `SourceSimulator` class.
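A sketch of what such a tuple might look like under this proposal (the event code 3 is an arbitrary example; `stc` is assumed to exist):

```python
import numpy as np

stim_array = np.zeros(len(stc.times), int)  # one sample per stc time point
stim_array[0] = 3                           # event code 3 at the stc onset
stc_with_events = (stc, stim_array)
```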
> If we could implement our suggestion while preserving the current interface, i.e. make the new interface backwards compatible, would that be better?

Yes, preserving existing functionality and API would be best; it's basically what I'm pushing for here.
If we agree to keep the API backward compatible, the question becomes whether or not we should completely rewrite the internals of `simulate_raw`. My answer would be sure, refactoring is fine as long as you are able to preserve the current advantages of the existing system. But that's a challenge -- some of these advantages might not be obvious from looking at the code. For example, much of the complication comes from 1) being memory-conscious, 2) allowing for multiple head positions, 3) smoothly interpolating between head positions, and 4) allowing multiple types of noise / needing many forward operators. If you can keep these things and make the internals simpler, that's a welcome change! But it sounds like a lot of work for little gain.
FWIW, the stc-as-generator is something that I am happy to implement in the next week or two. That way people who are motivated and eager to contribute could work on things that are more interesting (diversity of simulated signals in the `SourceSimulator` class!) than refactoring a bunch of code.
The question is: will the following MNE API get you what you need as users?
would something like this fly?

```python
def stc_generator(event_id, src=src, labels=labels):
    if event_id == 1:
        # this may need the knowledge of sfreq from raw
        ...
        return stc_data_evoked_response1
    elif event_id == 2:
        ...
        return stc_data_evoked_response2
    else:
        raise ValueError('boom!')

events1 = mne.make_fixed_length_events(...)  # to fix as it needs raw :(
events2 = mne.make_fixed_length_events(...)  # to fix as it needs raw :(
events = np.r_[events1, events2]

# Generate raw with evoked activity for each condition (1 or 2)
raw_sim = simulate_raw(raw, stc_generator, trans_fname, src, bem_fname,
                       events, ecg=True, blink=True, n_jobs=1, verbose=True)

# then add sensor noise
raw_sim = mne.simulation.add_noise(raw_sim, iir_filter=[0.2, -0.2, 0.04],
                                   cov='simple')
```
we pass a callable as the stc parameter that can produce stcs on the fly, specifically for each event_id value.
I also see a point in allowing noise to be added in a second stage, for more flexibility.
It would work. However, the API is more complicated than having `SourceSimulator`, or something else, keep track of what source space signals should be created. In your code, you pass `events` to `simulate_raw`, which in turn controls what `stc_generator` does via some inter-function API. Answering the question "what sources will be simulated" requires thinking about both `stc_generator` and how it interacts with `simulate_raw`, since now it's a two-way interaction.

It seems better to keep control of source signals outside of `simulate_raw`, keeping `stc` as an "input only" to `simulate_raw`. Your code could be easily refactored to do this given the current API proposal:
```python
def my_stc_generator():
    for event in events:
        event_id = event[2]  # the event code is the third column
        if event_id == 1:
            ...
            yield SourceEstimate(...), stim_channel_array
        elif event_id == 2:
            ...

raw_sim = simulate_raw(raw, my_stc_generator(), ...)
```
This also allows for control over where the event is inserted in the `stim_channel_array`, etc.
And if we have a good `SourceSimulator` class, it could be as simple as something like the following (subject to many API decisions about `SourceSimulator`, but no additional ones about `simulate_raw`):

```python
source_sim = SourceSimulator(...)
stc_gen = source_sim.stcs_from_events(events)
raw_sim = simulate_raw(raw, stc_gen, ...)
```
> I also see a point in allowing noise to be added in a second stage, for more flexibility.

I guess you could want different SNR values, and it's backward compatible, so why not.
ok we're making progress I think. Now how does the generator know about the duration/length of the stim_channel array it needs to return?
rather than having raw as input, would having info and duration in seconds be better?
what would the parameters of SourceSimulator be?
> Now how does the generator know about the duration/length of the stim_channel array it needs to return?

It's returning a `SourceEstimate` on each iteration, so the array is of shape `(len(stc.times),)`, and for simplicity we could also permit a single `int`, which would be equivalent to:

```python
stim_channel_array = np.zeros(len(stc.times), int)
stim_channel_array[0] = int_value
```
> rather than having raw as input, would having info and duration in seconds be better?

We can, but it adds one more parameter to the function. The advantage I guess is that it saves the user from having to do this:

```python
data = np.zeros((len(info['ch_names']), int(round(info['sfreq'] * duration))))
raw = mne.io.RawArray(data, info)
```

That does not seem like a lot to ask of the user, but I'm okay with it if people think it's worthwhile, because we can just add these lines in a conditional in `simulate_raw`. (If the user supplies `raw` as an instance of `Raw`, then `duration` should be `None`; if `raw` is an instance of `Info`, then `duration` should not be `None`.)
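A hedged sketch of what that conditional might look like inside `simulate_raw` (assuming `raw` may be either a `Raw` or an `Info`; this is a proposal sketch, not existing code):

```python
import numpy as np
import mne

if isinstance(raw, mne.io.BaseRaw):
    if duration is not None:
        raise ValueError('duration must be None when raw is a Raw instance')
else:  # raw is assumed to be an Info
    n_times = int(round(raw['sfreq'] * duration))
    raw = mne.io.RawArray(np.zeros((len(raw['ch_names']), n_times)), raw)
```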
> what would the parameters of SourceSimulator be?

That is the "make things simple" part that I think other people actually want to work on (right?). I have no idea what this API should look like, so I'm happy to let others work it out :)
I think this is where my confusion is. For me, the generator generates a signal as long as an epoch, not the full duration of the raw. In the former case the generator does not need to know about events, just the event_id; in the latter it needs to know the events and the full duration of the data.
does it help?
> For me, the generator generates a signal as long as an epoch, not the full duration of the raw.

Yes, this is probably how most people will use it, and what `SourceSimulator` will do.
> In the former case the generator does not need to know about events, just the event_id

Something has to tell `simulate_raw` what to put in the stim channel to match what it's putting in as sources/activations. I think it should be the job of the `SourceSimulator` to provide this information (or live with `simulate_raw` putting an event at the onset of the "epoch", as it does now / is the default behavior).
> For me, the generator generates a signal as long as an epoch, not the full duration of the raw.

The question is what "epoch" means for you? Maybe I misunderstand it, but for me it's related to events, i.e. based on the event_id the raw data can be decomposed into epochs in many different ways. Thus I think that the generator should provide a source space signal of the full duration of the raw. In my mind simulate_raw should only apply the forward operator in a proper way and create the stim_channel from input events, but not change the signal.
> The question is what "epoch" means for you?

The great thing about the existing API proposal is that it allows the user to decide ...
> Thus I think that the generator should provide a source space signal of the full duration of the raw. In my mind simulate_raw should only apply the forward operator in a proper way and create the stim_channel from input events, but not change the signal.

Either the (short-)"epoch"-based or full-duration interpretation can be easily implemented, as preferred by the user or designer of `SourceSimulator`, if the API allows `stc` to be an iterable/generator of either `stc`, or `(stc, event_array)` tuples.
> In my mind simulate_raw should only apply the forward operator in a proper way and create the stim_channel from input events, but not change the signal.

Agreed, there is no proposal here to change the (source) signal.
We do have to decide at some point what `SourceSimulator` should output. To me it makes sense for it to output one stc per event as a generator. Conceptually it's the same as outputting a single STC (i.e., equivalent to some concatenation of the stcs in `stc_gen`), but it has advantages under the hood memory-wise. And implementation-wise it's as simple as the `def my_stc_generator()` above.

But in the meantime, can we agree that allowing `stc` in `simulate_raw` to be an iterable of either `stc`, `(stc, int)`, or `(stc, ndarray)` tuples will let us do everything we want? If so then I can get working on the stc-as-iterable implementation, and the discussion can shift to what `SourceSimulator` should do.

i.e. can we move forward with the plan in https://github.com/mne-tools/mne-python/issues/5058#issuecomment-377500991 ?
> I think that the generator should provide a source space signal of the full duration of the raw.

I would be inclined to say the opposite. A full stc for a long recording can be a huge array, and for me you simulate a brain response based on a condition, i.e. an event_id. "Tell me what the event_id is and I will tell you how the brain will respond" is, for me, what a SourceSimulator would do.
> But in the meantime, can we agree that allowing stc in simulate_raw to be an iterable of either stc, (stc, int), or (stc, ndarray) tuples will let us do everything we want? If so then I can get working on the stc-as-iterable implementation, and the discussion can shift to what SourceSimulator should do.

I agree that stc in simulate_raw can be allowed to be an iterable.
So for each event_id:

1. SourceSimulator provides (stc[event_id], event_array[event_id]),
2. simulate_raw applies forward to stc[event_id].data, then puts it into the raw data at places defined by stim_array.

The whole raw signal is thus the sum of the raws of each stc.

If it is supposed to work like this, for the moment I see only one problem - it forces the signal of interest to be exactly the same over the epochs. I suppose that in practice we would also like to have some variability in the SOI's amplitude or shift, but I don't really see how that can be done with this API.
> SourceSimulator provides (stc[event_id], event_array[event_id]),

`SourceSimulator` outputs what `simulate_raw` wants, which is one of:

1. `stc` alone: `simulate_raw` will insert a `1` in the stim channel at the start of each segment
2. `(stc, my_int)`: `simulate_raw` will insert the `my_int` in the stim channel at the start of each segment
3. `(stc, stim_array)`: `simulate_raw` will insert the `stim_array` in the stim channel during the same time span that the `fwd @ stc` is inserted in the M/EEG channels.

I'm calling it `stim_array` to make it clear that we're talking about deltas in a stim channel, rather than `event_id` or `events` arrays (though you'd typically eventually do `events = find_events(raw)`, which looks for deltas in the `stim_array`). Note that (1) and (2) are just convenience aliases for (3) where:

```python
event_array = np.zeros(len(stc.times), int)
event_array[0] = my_int
```

This third option allows `SourceSimulator` to set the event time relative to the waveform times however it wants, to achieve the desired result.
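For example, a sketch of shifting the event delta relative to the waveform peak (the 50 ms offset and event code 1 are arbitrary; `stc` and `sfreq` are assumed known):

```python
import numpy as np

stim_array = np.zeros(len(stc.times), int)
peak = int(np.argmax(np.abs(stc.data).max(axis=0)))  # sample of the waveform peak
onset = max(peak - int(round(0.05 * sfreq)), 0)      # 50 ms before the peak
stim_array[onset] = 1
```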
> simulate_raw applies forward to stc[event_id].data, then puts it into the raw data at places defined by stim_array.

Not quite. The result of the forward application will land exactly at sample numbers `0:len(stc.times)` on the first iteration; and the `stim_array` will similarly show up in the stim channel at the same sample numbers. You'll end up with `raw.get_data()` that looks like:
```
     ------------------------------------------------
MEG  | fwd @ stc_0.data | fwd @ stc_1.data | ...
     ------------------------------------------------
STIM | stim_array_0     | stim_array_1     | ...
     ------------------------------------------------
       time ->
```
i.e., you supply the stim channel data that matches the source data however you want, such that eventually doing `find_events(raw)` will give you the `events` you want in your analysis.
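Downstream analysis then stays completely standard; a sketch (the event names/codes are arbitrary examples):

```python
import mne

events = mne.find_events(raw_sim)
epochs = mne.Epochs(raw_sim, events, event_id=dict(cond1=1, cond2=2))
```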
For example, let's say we designed the interface to look like this (and this is not a good design!) just for the sake of discussing the `SourceSimulator` <-> `simulate_raw` interaction:

```python
stc_gen = SourceSimulator(region='right auditory', response='n100', jitter=0.01)
```

i.e., we want a right auditory N100 with 10 ms jitter (relative to our event in the stim channel) for each epoch.
The `stc_gen` could either:

1. yield `stc` (or `(stc, 1)`) where on each iteration the waveform in `stc` has been jittered/shifted in time, or
2. yield `(stc, stim_array)`, where the stc is identical on every iteration, but the delta that appears in the `stim_array` was jittered in time.

Either way, the resulting `raw = simulate_raw(...)` instance would have auditory N100s jittered by 10 ms: when you go to analyze, the source waveforms are jittered relative to the events you get with `find_events`, which is what we want.

In this scheme you can arbitrarily change the relationships between event timing, event number, source waveforms, which sources are active, etc. And all of this complexity can be determined/encapsulated at the level of `SourceSimulator`.
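A hedged sketch of option (1), jittering the source waveform on each iteration (a crude circular time shift, just to illustrate the generator shape; `jittered_stcs` is a hypothetical helper, not proposed API):

```python
import numpy as np
import mne

def jittered_stcs(stc, n_epochs, sfreq, jitter=0.01, seed=0):
    rng = np.random.RandomState(seed)
    for _ in range(n_epochs):
        # shift by up to +/- jitter seconds, in samples
        shift = int(round(rng.uniform(-jitter, jitter) * sfreq))
        data = np.roll(stc.data, shift, axis=1)  # crude circular shift
        yield mne.SourceEstimate(data, stc.vertices, stc.tmin, stc.tstep)
```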
I am opening this issue so that we can discuss setting up EEG/MEG simulations within MNE. My understanding is that there is a need for a more sophisticated way of simulating EEG and MEG data. @makkostya and I work in the same team, and he informed me about the discussions with @massich @mathurinm regarding that subject during his visit in Paris.
From my side, I have an immediate need for this functionality, so I already added code in my own branch of mne that will allow me to perform my simulations. Namely, I need to simulate the EEG signal that would be generated during a BCI P300 speller experiment. In particular, I was interested in being able to (a) simulate specific waveforms that occur according to some events (stimuli), (b) control the noise level, and (c) specify the location of the sources that generate the activity.
Here are 3 figures of some data I simulated using this code (figures omitted): the raw signal of a P300 experiment, the evoked response to the target stimulus, and the evoked response to the nontarget stimulus.
I have made a pull request where I explain the code in more detail. N.