Yeah, that would be nice to have.
So here is the action plan as discussed with @larsoner :

- `make_ad_hoc_cov` will be modified to accept a dict of standard deviations for each type of sensor (mag, grad, eeg) to allow for custom creation of a noise covariance matrix.
- `simulate_raw` will be modified to accept info passed as raw (requiring a value for the new `n_times=None` kwarg). (#5178)
- make `simulate_raw` accept a `Forward` object as input. (#5178)
- add `mne.simulation.add_noise(inst, cov, iir_filter)` to add noise to raw/epochs/evoked. (#5859)
- raw-as-info support, with an extra `duration` option. (#5869)
- make `simulate_raw` accept a generator (EDIT: or iterable) for `stc`. (#6037)
- a `SourceSimulator` class will be added that takes a source space and a set of vertices (or set of labels), and simulates signals for those vertices/labels with commonly used characteristics/patterns (e.g., N100-like, P300-like, etc.). It would provide an `iter` method that returns STCs corresponding to epochs, and thus an instance can be passed to `simulate_raw(..., stc=my_source_estimator)`. This will be discussed in a separate issue. (#5924)

I will update with info and reference this issue along the way! Feel free to share your thoughts about this issue!
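For orientation, a minimal sketch of how the planned pieces might be used once merged (the exact signatures may differ, and the `std` values are arbitrary examples; `raw` is assumed to be an existing `Raw` instance):

```python
import mne

# custom per-channel-type noise standard deviations (arbitrary example values)
cov = mne.make_ad_hoc_cov(raw.info, std=dict(mag=4e-12, grad=8e-11, eeg=2e-5))
# add (optionally IIR-filtered) noise drawn from that covariance
raw_noisy = mne.simulation.add_noise(raw.copy(), cov,
                                     iir_filter=[0.2, -0.2, 0.04])
```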
> A SourceSimulator class will be added that takes a source space and a set of vertices (or set of labels)

Will we provide defaults? I'd like for EEG people to run this stuff without bothering too much with the source code.

> (e.g., N100-like, P300-like, etc.)

How about eye blinks?

> Will we provide defaults? I'd like for EEG people to run this stuff without bothering too much with the source code.

We'll have to see how simple we can get the API.

> How about eye blinks?

`simulate_raw` already does this.
> simulate_raw already does this

Ok, but wouldn't it make sense to unify things?

This plan is already unified. The `SourceSimulator` instance just generates brain source activations, and these will be passed to `simulate_raw` ... you mean you'd like to see `SourceSimulator` also take care of choosing when blinks and ECG should happen?
If it makes sense code- and API-wise, yes.
(That wasn't my original idea, but since you're asking ...)
Removing milestone since this does not seem to be a blocker
@makkostya Just a reminder that we already opened this issue here. Last checkbox is what we discussed (making the Simulation an object).
cc @sdeslauriers @mclerc @papadop
Ok, let's keep this issue as the principal one for discussing the simulation module. Whenever someone creates an issue or PR related to the simulation module, please reference it in this issue. That will let us notify everyone concerned without pinging them individually.
ok
did you start collecting pieces of code to consolidate?
Here is the issue I opened for validation of the inverse problem solution: #5680
thanks. I answered.
Following up on your conversation, can you centralize the code of the different folks to see what can be consolidated and integrated into MNE?
I'll discuss their code in detail with Christos (cc @christospapa, @cgbenar), and after that I'll have an idea of how to start integrating it into MNE.
Sam and Ivana will work on the idea of adding connectivity information to the MNE simulation tool, but there is naturally no code yet.
Nathalie already has a contribution pipeline to follow. I keep working on inverse problem validation.
cc @ikojcic
Here is the toy script that I think we could have for raw/epochs/evoked simulation. Change the file extension to ".ipynb" and open it with Jupyter. Please look at it and give your feedback here. The main points to discuss are:
cool you resurrect this discussion
can you put the .ipynb online so we can see it rendered (not just as a txt file)?
you can drop it on gist.github.com for example
thanks a lot
Here is the link to the rendered file: https://gist.github.com/makkostya/f2a15cf0203ca03e0dab6aa91621a821
I like the overall idea. Here are some specific thoughts:
- The `simulate_stc_data` function will be closely tied to your brain model, i.e. spatial and temporal waveforms, structural connectivity, ARMA models, etc... Can we think of a few building-block functions that can be combined to produce varied outputs, instead of a monolithic function?
- If adding an `stc` with the noise object is supported via `__add__`, `add_noise_to_raw` becomes redundant. This is a more philosophical question on syntactic sugar.
- If summing `stc`s is supported, then complex data can be generated by summing many simple `stc`s. This simplifies the inputs to `simulate_stc` because it only needs to handle one case of [whatever simulation model].

Here are other comments (and comments to comments):
To summarize, I think it would be nice (I suspect it already exists) to have a SignalProperty (or similar) object containing (tmin, tmax, tstep/samplerate) and have it attached to every signal... This would avoid passing tmin, tmax, tstep to several functions and would prevent the user from adding two signals sampled at different rates. This would also naturally be one of the parameters to simulate_noise... I think this would add coherence to the framework...
Having said that, this comment has been written without a close view of what currently exists in mne, or of how much backward compatibility with the existing stuff we want... This is just my (current) view of an ideal "world". ;-)
> this comment has been written without a close view of what currently exists in mne, or of how much backward compatibility with the existing stuff we want

For context, there is an API plan comment above that gives the current plan (from what I understand).

> I think it would be nice (I suspect it already exists) to have a SignalProperty (or similar) object containing (tmin, tmax, tstep/samplerate) and have it attached to every signal

Unfortunately this is quite a departure from how things work everywhere else in MNE, so it probably won't fly. If we were designing things from scratch we might have added such things.
> Allow fwd object as a default input, but keep possibility to compute fwd inside the function

This was already done in #5178, I think.

> Additive noise is generated outside the function.

There is already a way to have additive noise added to the simulated data via the `noise_cov` parameter. This covers 90% of use cases, so other ways of adding noise should, I think, be left to the user to manually add to the `raw` instance that they obtain.
> new functions "simulate_roi", "simulate_stc_data", "simulate_noise".

Take a look at the proposal comment above -- I think the only modification to `simulate_raw` necessary to support these functions is allowing `stc` to be a generator. A separable issue is then what source estimates you provide as inputs (in generator form). As @sdeslauriers points out:

> The simulate_stc_data function will be closely tied to your brain model, i.e. spatial and temporal waveforms, structural connectivity, ARMA models, etc...

We'll have to decide what to actually include in MNE -- some ideas are in the comment I linked to above. We shouldn't include everything, but allowing `simulate_raw` to take a `stc` generator should do it, I think.

@makkostya or @sdeslauriers can you write a usage snippet example to demo the API you have in mind?
To converge on the API I prefer to discuss code snippets, as they make clear what we have in mind.
@makkostya do you have the code snippet we drafted together at Biomag?
Basically we need to have as input:

- an info
- a function to produce brain time series using labels or not and with events or not
- a way to compute forward
- a way to add noise

I like the idea that we simulate raw and then we have standard mne functions to obtain Epochs and Evoked.
> Basically we need to have as input:
>
> - an info
> - a function to produce brain time series using labels or not and with events or not
> - a way to compute forward
> - a way to add noise

We already have all of these in `simulate_raw` except the function to produce brain time series. That is exactly what the proposal to allow `stc` to be a generator would provide. Then the only remaining issue is how to produce the source time series of interest. So we almost already have the necessary API.
then let's make a concrete proposition to adjust the API. What would the diff look like?
```
def simulate_raw(raw, stc, ...)
    ...
    stc : instance of SourceEstimate
```

would become

```
    ...
    stc : instance of SourceEstimate | generator
```

i.e., you can pass in a generator, and `simulate_raw` will iterate over those to produce the outputs, rather than looping through a single `SourceEstimate` (as it does now, which is really only useful for static ERP-style responses).

(This is what is meant in the comment above.)
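For illustration, a hedged sketch of what passing a generator could look like under this proposal (`make_one_stc` is a hypothetical helper returning a `SourceEstimate`; it is not existing MNE API):

```python
import mne

def stc_gen(n_epochs):
    # yield one SourceEstimate per simulated "epoch"
    for _ in range(n_epochs):
        yield make_one_stc()  # hypothetical helper

raw_sim = mne.simulation.simulate_raw(raw, stc_gen(20), trans, src, bem)
```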
This would cope with events and event_id too?
`simulate_raw` (this is in the API doc) outputs events in the stim channel, the same for each event. You can recode these arbitrarily afterward, knowing your STC input. Or overwrite the stim channel yourself.

We could add an `event_id_func` or something if we don't just want the event IDs to work the way they do now.
To clarify, it's only the same (1) if there is no `head_pos`; the default behavior is to set the event number to the head position (number) that was used:

> Events coded with the position number (starting at 1) will be stored in the trigger channel (if available) at times corresponding to t=0 in the stc.

Again, this can be changed easily afterward (esp. after `find_events`), but if this seems too burdensome we can add an `event_id_func` to the API with some suitable args, and make it backward compatible.
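A minimal sketch of the recode-afterward workflow (the relabeling rule here is just an arbitrary example):

```python
import mne

events = mne.find_events(raw_sim)  # deltas found in the stim channel
events[1::2, 2] = 2                # e.g., relabel every other trial as condition 2
```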
Maybe the thing to do is have people read about `simulate_raw`, try using it, and see if the API we proposed some months ago still makes sense?
@makkostya and @sdeslauriers what is it you cannot do currently with simulate_raw or with what @larsoner proposes using a generator?
@ikojcic, @makkostya and I discussed this. It is not clear to us how the generator (or just an iterable, for that matter) would improve the situation. We are guessing every iteration would return a source time course? If that is the case, just create an `stc` from the output and call `simulate_raw` normally.
In our minds, the simulation process has 3 steps:

1. Generation and summation of different source space signals with events and the usual meta data.
2. Projection of the source space data into sensor space data (with head movement if desired).
3. Generation and addition of noise to the measurement.

This has a direct link to the formulation of the forward problem, which we think will be familiar to most users.
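A minimal sketch of this three-step split, assuming hypothetical function names (none of these exist in MNE; this only illustrates the decomposition being argued for):

```python
# 1. source space signals (with events)
stc = simulate_source_signals(src, labels, events)
# 2. projection to sensor space via the forward operator
raw_clean = project_to_sensors(stc, fwd, info)
# 3. additive measurement noise
raw_sim = add_measurement_noise(raw_clean, cov)
```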
The current version of `simulate_raw` covers part of 2 and 3, which makes the function more complicated than it needs to be. In addition, from what we got from reading the code, it is also very strict in how it handles the timing of events (the `stc` is repeated until the time runs out).
Here is an updated snippet of @makkostya that separates the steps above. https://gist.github.com/sdeslauriers/9f856925797108a8ee57d33861a8d39e
> We are guessing every iteration would return a source time course? If that is the case, just create an stc from the output and call simulate_raw normally.

Yes, in principle you could do this, but if your recording is long it's not going to be feasible to do so due to memory constraints if you have a lot of sources.

The other way it's conceptually different is that events in the STIM channel are tied to the STC onsets.
> it is also very strict in how it handles the timing of events (the stc is repeated until the time runs out).

I don't see it as being too difficult to overwrite the STIM channel, or add metadata afterward.
> makes the function more complicated than it needs to be.

Is this a problem for you in using it? I'd rather not add new ways of doing the same things without a good justification. The internals being complicated -- something a user shouldn't have to deal with -- is not a good reason to write an entirely new user-facing API.
> it is also very strict in how it handles the timing of events (the stc is repeated until the time runs out).

This is what would be relaxed by allowing `stc` to be a generator, so I don't see this as a limitation.
I'm sure it's possible to separate the steps, but I'm not convinced we should completely rewrite the API at this point.
What do you actually need to do in practice that you can't do with the current API?
... in case it's not clear, basically I agree with you that there is a better potential split between these steps. If we were designing something from scratch it would make sense to do it. But we have an existing function to take care of these things, and it's better not to rewrite it -- or worse, add a second way of accomplishing the same thing -- unless really necessary. So I'd like to see if it's possible to improve what we have to get what people need, rather than start over from scratch.
> Yes, in principle you could do this, but if your recording is long it's not going to be feasible to do so due to memory constraints if you have a lot of sources.

> The other way it's conceptually different is that events in the STIM channel are tied to the STC onsets.

Then I would suggest using the more general case. If they fit in memory, any iterable will work. If they don't, use a generator, which is iterable.
> I don't see it as being too difficult to overwrite the STIM channel, or add metadata afterwards.

That is because you are familiar with the internals. I think for most users it would be non-trivial and error prone.
> Is this a problem for you in using it? I'd rather not add new ways of doing the same things without a good justification. The internals being complicated -- something a user shouldn't have to deal with -- is not a good reason to write an entirely new user-facing API.

I disagree. You currently have many people who are willing to work on this and improve the API; it may be a good time to consider breaking changes.
> What do you actually need to do in practice that you can't do with the current API?

I do not think that is a fair question. Using generators you are calling arbitrary code, so of course we could simulate anything. I think a better question is whether simple simulations are simple to implement or require the use of generators. I believe our suggestions simplify many common use cases and will make things easier for new users.

For example, it is not trivial to simulate events that are not distributed uniformly, or to use multiple waveforms. It would require using generators and hacking the stim channel.
If we could implement our suggestion while preserving the current interface, i.e. make the new interface backwards compatible, would that be better?
> Then I would suggest using the more general case. If they fit in memory, any iterable will work. If they don't, use a generator, which is iterable.

Agreed, `stc` should be any iterable object, and not restricted specifically to a generator.
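For instance, under the iterable-support proposal a plain list of stcs would work when memory allows (a sketch, not current API; the condition stcs are assumed to exist):

```python
import mne

# three "epochs": condition 1, condition 2, condition 1 again
stcs = [stc_cond1, stc_cond2, stc_cond1]
raw_sim = mne.simulation.simulate_raw(raw, stcs, trans, src, bem)
```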
> I think a better question is whether simple simulations are simple to implement or require the use of generators. I believe our suggestions simplify many common use cases and will make things easier for new users.

Sure, but it's not the only question we need to consider. For example, I work with people who don't use `simulate_raw` precisely because it does not have generator/iterable-like support. If it did, they would. Fortunately, in this case I think we can have our cake and eat it too...
> For example, it is not trivial to simulate events that are not distributed uniformly, or to use multiple waveforms. It would require using generators and hacking the stim channel.

...because the `SourceSimulator` class mentioned in the API proposal above sounds like what you are really asking for here. It would integrate with the stc-allowed-to-be-generator case just fine. The simple-use-case code would be:

```python
stc_gen = SourceSimulator(...<simple interface here>...)
raw = mne.simulation.simulate_raw(raw, stc_gen, ...)
```

This seems like a straightforward way to provide a simple interface to the user, while still allowing for advanced use cases.
> > I don't see it as being too difficult to overwrite the STIM channel, or add metadata afterwards.
>
> That is because you are familiar with the internals. I think for most users it would be non-trivial and error prone.

Let's find a way to extend the existing API somehow to allow for events to be supplied, too. For example we could have a new `event_id_func` kwarg, or allow `stc` to be a tuple of `(SourceEstimate, ndarray of shape (n_times,))`, where the latter is what should be put in the STIM channel. Something like this is quite general, and again is something we can build into what gets returned by the use-case-simplifying `SourceSimulator` class.
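A sketch of what such a tuple might look like under this proposal (the event code 3 is an arbitrary example; `stc` is assumed to exist):

```python
import numpy as np

stim_array = np.zeros(len(stc.times), int)  # one sample per stc time point
stim_array[0] = 3                           # event code 3 at the stc onset
stc_with_events = (stc, stim_array)
```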
> If we could implement our suggestion while preserving the current interface, i.e. make the new interface backwards compatible, would that be better?

Yes, preserving existing functionality and API would be best; it's basically what I'm pushing for here.
If we agree to keep the API backward compatible, the question becomes whether or not we should completely rewrite the internals of `simulate_raw`. My answer would be sure, refactoring is fine as long as you are able to preserve the current advantages of the existing system. But that's a challenge -- some of these advantages might not be obvious from looking at the code. For example, much of the complication comes from 1) being memory-conscious, 2) allowing for multiple head positions, 3) smoothly interpolating between head positions, and 4) allowing multiple types of noise / needing many forward operators. If you can keep these things and make the internals simpler, that's a welcome change! But it sounds like a lot of work for little gain.
FWIW, the stc-as-generator is something that I am happy to implement in the next week or two. That way people who are motivated and eager to contribute could work on things that are more interesting (diversity of simulated signals in the `SourceSimulator` class!) than refactoring a bunch of code.
The question is: will the following MNE API get you what you need as users?
would something like this fly?

```python
def stc_generator(event_id, src=src, labels=labels):
    if event_id == 1:
        # this may need the knowledge of sfreq from raw
        ...
        return stc_data_evoked_response1
    elif event_id == 2:
        ...
        return stc_data_evoked_response2
    else:
        raise ValueError('boom!')

events1 = mne.make_fixed_length_events(...)  # to fix as it needs raw :(
events2 = mne.make_fixed_length_events(...)  # to fix as it needs raw :(
events = np.r_[events1, events2]

# Generate raw with evoked activity for each condition (1 or 2)
raw_sim = simulate_raw(raw, stc_generator, trans_fname, src, bem_fname,
                       events, ecg=True, blink=True, n_jobs=1, verbose=True)

# then add sensor noise
raw_sim = mne.simulation.add_noise(raw_sim, iir_filter=[0.2, -0.2, 0.04],
                                   cov='simple')
```
we pass a callable as the stc parameter that can produce stcs on the fly, specifically for each event_id value.
I also see a point in allowing noise to be added in a second stage, for more flexibility.
It would work. However, the API is more complicated than having `SourceSimulator`, or something else, keep track of what source space signals should be created. In your code, you pass `events` to `simulate_raw`, which in turn controls what `stc_generator` does via some inter-function API. Answering the question "what sources will be simulated" requires thinking about both `stc_generator` and how it interacts with `simulate_raw`, since now it's a two-way interaction.

It seems better to keep control of source signals outside of `simulate_raw`, keeping `stc` as an "input only" to `simulate_raw`. Your code could be easily refactored to do this given the current API proposal:
```python
def my_stc_generator():
    for event in events:
        event_id = event[2]  # the event code is the third column
        if event_id == 1:
            ...
            yield SourceEstimate(...), stim_channel_array
        elif event_id == 2:
            ...

raw_sim = simulate_raw(raw, my_stc_generator(), ...)
```
This also allows for control over where the event is inserted in the `stim_channel_array`, etc.
And if we have a good `SourceSimulator` class, it could be as simple as something like the following (subject to many API decisions about `SourceSimulator`, but no additional ones about `simulate_raw`):

```python
source_sim = SourceSimulator(...)
stc_gen = source_sim.stcs_from_events(events)
raw_sim = simulate_raw(raw, stc_gen, ...)
```
> I also see a point in allowing noise to be added in a second stage, for more flexibility.

I guess you could want different SNR values, and it's backward compatible, so why not.
ok we're making progress I think. Now how does the generator know about the duration/length of the stim_channel array it needs to return?
rather than having raw as input, would having info and duration in seconds be better?
what would the parameters of SourceSimulator be?
> Now how does the generator know about the duration/length of the stim_channel array it needs to return?

It's returning a `SourceEstimate` on each iteration, so the array is of shape `(len(stc.times),)`, and for simplicity we could also permit a single `int`, which would be equivalent to:

```python
stim_channel_array = np.zeros(len(stc.times), int)
stim_channel_array[0] = int_value
```
> rather than having raw as input, would having info and duration in seconds be better?

We can, but it adds one more parameter to the function. The advantage I guess is that it saves the user from having to do this:

```python
data = np.zeros((len(info['ch_names']), int(round(info['sfreq'] * duration))))
raw = mne.io.RawArray(data, info)
```

That does not seem like a lot to ask of the user, but I'm okay with it if people think it's worthwhile, because we can just add these lines in a conditional in `simulate_raw`. (If the user supplies `raw` as an instance of `Raw`, then `duration` should be `None`; if `raw` is an instance of `Info`, then `duration` should not be `None`.)
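A hedged sketch of what that conditional might look like inside `simulate_raw` (assuming `raw` may be either a `Raw` or an `Info`; this is a proposal sketch, not existing code):

```python
import numpy as np
import mne

if isinstance(raw, mne.io.BaseRaw):
    if duration is not None:
        raise ValueError('duration must be None when raw is a Raw instance')
else:  # raw is assumed to be an Info
    n_times = int(round(raw['sfreq'] * duration))
    raw = mne.io.RawArray(np.zeros((len(raw['ch_names']), n_times)), raw)
```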
> what would the parameters of SourceSimulator be?

That is the "make things simple" part that I think other people actually want to work on (right?). I have no idea what this API should look like, so I'm happy to let others work it out :)
I think this is where my confusion is. For me, the generator generates a signal as long as an epoch, not the full duration of the raw. In the former case the generator does not need to know about events, just the event_id; in the latter it needs to know the events and the full duration of the data.
does it help?
> For me, the generator generates a signal as long as an epoch, not the full duration of the raw.

Yes, this is probably how most people will use it, and what `SourceSimulator` will do.
> In the former case the generator does not need to know about events, just the event_id

Something has to tell `simulate_raw` what to put in the stim channel to match what it's putting in as sources/activations. I think it should be the job of the `SourceSimulator` to provide this information (or live with `simulate_raw` putting an event at the onset of the "epoch", as it does now / is the default behavior).
> For me, the generator generates a signal as long as an epoch, not the full duration of the raw.

The question is what "epoch" means for you? Maybe I misunderstand it, but for me it's related to events, i.e. based on the event_id the raw data can be decomposed into epochs in many different ways. Thus I think that the generator should provide a source space signal of the full duration of the raw. In my mind simulate_raw should only apply the forward operator in a proper way and create the stim_channel from input events, but not change the signal.
> The question is what "epoch" means for you?

The great thing about the existing API proposal is that it allows the user to decide ...
> Thus I think that the generator should provide a source space signal of the full duration of the raw. In my mind simulate_raw should only apply the forward operator in a proper way and create the stim_channel from input events, but not change the signal.

Either the (short-)"epoch"-based or full-duration interpretation can be easily implemented, as preferred by the user or designer of `SourceSimulator`, if the API allows `stc` to be an iterable/generator of either `stc`, or `(stc, event_array)` tuples.
> In my mind simulate_raw should only apply the forward operator in a proper way and create the stim_channel from input events, but not change the signal.

Agreed, there is no proposal here to change the (source) signal.
We do have to decide at some point what `SourceSimulator` should output. To me it makes sense for it to output one stc per event as a generator. Conceptually it's the same as outputting a single STC (i.e., equivalent to some concatenation of the stcs in `stc_gen`), but it has advantages under the hood memory-wise. And implementation-wise it's as simple as the `def my_stc_generator()` above.

But in the meantime, can we agree that allowing `stc` in `simulate_raw` to be an iterable of either `stc`, `(stc, int)`, or `(stc, ndarray)` tuples will let us do everything we want? If so then I can get working on the stc-as-iterable implementation, and the discussion can shift to what `SourceSimulator` should do.

i.e. can we move forward with the plan in https://github.com/mne-tools/mne-python/issues/5058#issuecomment-377500991 ?
> I think that the generator should provide a source space signal of the full duration of the raw.

I would be inclined to say the opposite. A full stc for a long recording can be a huge array, and for me you simulate a brain response based on a condition, i.e. an event_id. "Tell me what the event_id is and I will tell you how the brain will respond" is, for me, what a SourceSimulator would do.
> But in the meantime, can we agree that allowing stc in simulate_raw to be an iterable of either stc, (stc, int), or (stc, ndarray) tuples will let us do everything we want? If so then I can get working on the stc-as-iterable implementation, and the discussion can shift to what SourceSimulator should do.

I agree that stc in simulate_raw can be allowed to be an iterable.
So for each event_id:

1. SourceSimulator provides (stc[event_id], event_array[event_id]),
2. simulate_raw applies forward to stc[event_id].data, then puts it into the raw data at places defined by stim_array.

The whole raw signal is thus the sum of the raws of each stc.

If it is supposed to work like this, for the moment I see only one problem - it forces the signal of interest to be exactly the same over the epochs. I suppose that in practice we would also like to have some variability in the SOI's amplitude or shift, but I don't really see how that can be done with this API.
> SourceSimulator provides (stc[event_id], event_array[event_id]),

`SourceSimulator` outputs what `simulate_raw` wants, which is one of:

1. `stc` alone: `simulate_raw` will insert a `1` in the stim channel at the start of each segment
2. `(stc, my_int)`: `simulate_raw` will insert the `my_int` in the stim channel at the start of each segment
3. `(stc, stim_array)`: `simulate_raw` will insert the `stim_array` in the stim channel during the same time span that the `fwd @ stc` is inserted in the M/EEG channels.

I'm calling it `stim_array` to make it clear that we're talking about deltas in a stim channel, rather than `event_id` or `events` arrays (though you'd typically eventually do `events = find_events(raw)`, which looks for deltas in the `stim_array`). Note that (1) and (2) are just convenience aliases for (3) where:

```python
event_array = np.zeros(len(stc.times), int)
event_array[0] = my_int
```

This third option allows `SourceSimulator` to set the event time relative to the waveform times however it wants, to achieve the desired result.
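For example, a sketch of shifting the event delta relative to the waveform peak (the 50 ms offset and event code 1 are arbitrary; `stc` and `sfreq` are assumed known):

```python
import numpy as np

stim_array = np.zeros(len(stc.times), int)
peak = int(np.argmax(np.abs(stc.data).max(axis=0)))  # sample of the waveform peak
onset = max(peak - int(round(0.05 * sfreq)), 0)      # 50 ms before the peak
stim_array[onset] = 1
```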
> simulate_raw applies forward to stc[event_id].data, then puts it into the raw data at places defined by stim_array.

Not quite. The result of the forward application will land exactly at sample numbers `0:len(stc.times)` on the first iteration; and the `stim_array` will similarly show up in the stim channel at the same sample numbers. You'll end up with `raw.get_data()` that looks like:
```
     ------------------------------------------------
MEG  | fwd @ stc_0.data | fwd @ stc_1.data | ...
     ------------------------------------------------
STIM | stim_array_0     | stim_array_1     | ...
     ------------------------------------------------
       time ->
```
i.e., you supply the stim channel data that matches the source data however you want, such that eventually doing `find_events(raw)` will give you the `events` you want in your analysis.
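Downstream analysis then stays completely standard; a sketch (the event names/codes are arbitrary examples):

```python
import mne

events = mne.find_events(raw_sim)
epochs = mne.Epochs(raw_sim, events, event_id=dict(cond1=1, cond2=2))
```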
For example, let's say we designed the interface to look like this (and this is not a good design!) just for the sake of discussing the `SourceSimulator` <-> `simulate_raw` interaction:

```python
stc_gen = SourceSimulator(region='right auditory', response='n100', jitter=0.01)
```

i.e., we want a right auditory N100 with 10 ms jitter (relative to our event in the stim channel) for each epoch.
The `stc_gen` could either:

1. yield `stc` (or `(stc, 1)`) where on each iteration the waveform in `stc` has been jittered/shifted in time, or
2. yield `(stc, stim_array)`, where the stc is identical on every iteration, but the delta that appears in the `stim_array` was jittered in time.

Either way, the resulting `raw = simulate_raw(...)` instance would have auditory N100s jittered by 10 ms: when you go to analyze, the source waveforms are jittered relative to the events you get with `find_events`, which is what we want.

In this scheme you can arbitrarily change the relationships between event timing, event number, source waveforms, which sources are active, etc. And all of this complexity can be determined/encapsulated at the level of `SourceSimulator`.
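A hedged sketch of option (1), jittering the source waveform on each iteration (a crude circular time shift, just to illustrate the generator shape; `jittered_stcs` is a hypothetical helper, not proposed API):

```python
import numpy as np
import mne

def jittered_stcs(stc, n_epochs, sfreq, jitter=0.01, seed=0):
    rng = np.random.RandomState(seed)
    for _ in range(n_epochs):
        # shift by up to +/- jitter seconds, in samples
        shift = int(round(rng.uniform(-jitter, jitter) * sfreq))
        data = np.roll(stc.data, shift, axis=1)  # crude circular shift
        yield mne.SourceEstimate(data, stc.vertices, stc.tmin, stc.tstep)
```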
I am opening this issue so that we can discuss setting up EEG/MEG simulations within MNE. My understanding is that there is a need for a more sophisticated way of simulating EEG and MEG data. @makkostya and I work in the same team, and he informed me about the discussions with @massich @mathurinm regarding that subject during his visit in Paris.
From my side, I have an immediate need for this functionality, so I already added code in my own branch of mne that will allow me to perform my simulations. Namely, I need to simulate the EEG signal that would be generated during a BCI P300 speller experiment. In particular, I was interested in being able to (a) simulate specific waveforms that occur according to some events (stimuli), (b) control the noise level, and (c) specify the location of the sources that generate the activity.
Here are 3 figures of some data I simulated using this code (figures omitted): the raw signal of a P300 experiment, the evoked response to the target stimulus, and the evoked response to the nontarget stimulus.
I have made a pull request where I explain the code in more detail. N.