mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.74k stars 1.33k forks source link

Are projections applied when fitting ICA? #4302

Closed cbrnr closed 7 years ago

cbrnr commented 7 years ago

I'm computing ICA on continous raw EEG data. In my pipeline, I set

eeg.set_eeg_reference()

before doing

ica = ICA(method="extended-infomax", random_state=1)
ica.fit(raw)

Does this imply that I fit ICA on average referenced raw data? Or do I need to explicitly invoke raw.apply_proj() before fitting ICA?

Similarly, if I only have the three lines described above, when I call raw.get_data(), is the average reference projection applied automatically?

From what I've seen in the code, it seems that the average reference projection is NOT applied automatically except for in raw.plot (where it can be switched on and off).

agramfort commented 7 years ago

I'm computing ICA on continous raw EEG data. In my pipeline, I set

eeg.set_eeg_reference()

before doing

ica = ICA(method="extended-infomax", random_state=1) ica.fit(raw)

Does this imply that I fit ICA on average referenced raw data? Or do I need to explicitly invoke raw.apply_proj() before fitting ICA?

you need to call apply_proj AFAIK

Similarly, if I only have the three lines described above, when I call raw.get_data(), is the average reference projection applied automatically?

look at raw.info['projs'] it will tell you if proj is active (ie applied) or not

From what I've seen in the code, it seems that the average reference projection is NOT applied automatically except for in raw.plot (where it can be switched on and off).

yes

cbrnr commented 7 years ago

Thanks, that's good to know. There's a note in the documentation of set_eeg_reference that average reference is actually the default if nothing else is specified, and that apply_proj needs to be called on preloaded data. However, this makes me wonder which functions actually use the average reference projector with preloaded data. None (i.e. do I always have to apply the projector)? Would ICA use the average reference projection if I didn't preload the data?

agramfort commented 7 years ago

I have no strong feeling here. Please make suggestions of improvements (code and/or documentation)

cbrnr commented 7 years ago

Ideally, the docs should state when the average reference projector is automatically applied. Since I don't know if this is the case I can't suggest specific changes. If it is the case that the projector is never applied for preloaded data, the last point in the notes section could be reformulated as "Be aware that on preloaded data, SSP projectors are never automatically applied. Use the apply_proj() method to apply them." (replace "not" with "never").

cbrnr commented 7 years ago

Hm, just found https://github.com/mne-tools/mne-python/issues/3998 and https://github.com/mne-tools/mne-python/issues/4004. It might be just me, but re-referencing to average reference is still not very intuitive. @hoechenberger do you think your concerns have been addressed?

hoechenberger commented 7 years ago

Hi @cbrnr, while #4004 has certainly improved things, personally I still would not call the current behavior intuitive. Average-referencing behaves different than other referencing schemes. I believe that is the main issue, and all of my colleagues who have tried out MNE had trouble with this. We all come from an EEG-only background (ok, some of us have done fMRI too; but there's no MEG people here), and without exception we all found this bit confusing. #4004 has greatly clarified the docs, yet the term "SSP projector" alone is unknown to all EEG folks I've asked. I don't know how to solve this issue really. It's certainly a problem for many users though.

cc @lorrandal

cbrnr commented 7 years ago

Thanks @hoechenberger, same here! The single most confusing behavior for me is that raw.set_eeg_reference() does not actually re-reference the data to average reference. What's more, this might only be true if the data is preloaded. I actually wanted to compute ICA on average referenced data, and later noticed that the data was never re-referenced.

Since there still seems to be room for improvement, how could we make re-referencing more intuitive especially for non-MEG folks? The simplest solution seems to be making average reference a true re-referencing operation instead of a projector. What are the arguments in favor of keeping it a projector?

hoechenberger commented 7 years ago

x-ref to related mailing list posting: https://mail.nmr.mgh.harvard.edu/pipermail//mne_analysis/2017-June/004094.html

cbrnr commented 7 years ago

It's weird that bad channels were causing issues with ICA, because ICA.fit ignores/excludes all bad channels, independent of the reference (the default picks=None used only good data channels). I'll respond to this message on the list once I'm subscribed 😄 .

larsoner commented 7 years ago

without exception we all found this bit confusing.

Okay so we clearly still have a problem that we need to fix.

One thing we could consider is adding set_eeg_reference(..., apply=True):

  1. Deprecation cycle in 0.15 where default is apply=None. This means True for non-avg and False for avg, with a warning either way. MEG people will probably want to do apply=False and EEG people will probably want to do apply=True for a cycle to avoid the warning.
  2. In 0.16 the default is apply=True. I think everything will still work okay, even for MEG people if they don't remember to do apply=False.

@agramfort WDYT?

cbrnr commented 7 years ago

If you apply a reference, including average reference, is it possible to undo this (i.e. re-reference to a different ref)?

larsoner commented 7 years ago

Yes you should be able to re-reference. I think our API allows it even if you've applied an EEG reference as a projector, but if it doesn't, we can probably make it support it.

Also cc @wmvanvliet about the above proposal for apply=True

hoechenberger commented 7 years ago

If you apply a reference, including average reference, is it possible to undo this (i.e. re-reference to a different ref)?

Yes, this certainly works. Or at least it used to -- haven't tested with the latest master :)

hoechenberger commented 7 years ago

@Eric89GXL I like your proposal with apply=True, seems much more sensible to me!

wmvanvliet commented 7 years ago

What would set_eeg_ref(raw, ['EEG1', 'EEG2'], apply=False) actually do?

mmagnuski commented 7 years ago

Yes you should be able to re-reference. I think our API allows it even if you've applied an EEG reference as a projector, but if it doesn't, we can probably make it support it.

@Eric89GXL - if I'm not mistaken I had a situation recently when I first re-referenced to avg and applied proj, then marked bad channels and interpolated them and wanted to re-ref to avg again and mne told me I cannot because the avg ref is already there... But I could have preformed epoching in the meantime which has proj=True by default (which BTW adds to the confusion). I can check it later today.

larsoner commented 7 years ago

What would set_eeg_ref(raw, ['EEG1', 'EEG2'], apply=False) actually do?

This would be an error. apply=False would only be allowed in avg mode.

I could have preformed epoching in the meantime which has proj=True by default (which BTW adds to the confusion).

Yes this is related to #2310, which I started addressing a long time ago but it's a pain

cbrnr commented 7 years ago

But this still doesn't get rid of treating references based on a list of channels and the average reference differently. After all, we could express the average reference with a list of all channels. I'd vote for treating all references the same under the hood (but shorthand notations can of course remain, such as None for average reference - although this is another rather unintuitive choice and should probably be changed to []).

larsoner commented 7 years ago

But this still doesn't get rid of treating references based on a list of channels and the average reference differently.

To the user, or under the hood? Hopefully to the user it's pretty transparent / they won't notice. Under the hood it needs to be different, because the inverse modeling code handles (and requires actually) an average reference projection for EEG data.

In other words, I agree it's not as simple as what you propose, but what you propose unfortunately breaks our inverse code.

cbrnr commented 7 years ago

OK, if existing code relies on the method/function set_eeg_reference to behave exactly like this, we could introduce a new argument value "average" to mean an average reference which gets applied by default. Just like all other references consisting of a list of channels.

If the behavior can be changed, would a separate function that computes/sets an average reference projection be an option?

larsoner commented 7 years ago

If the behavior can be changed, would a separate function that computes/sets an average reference projection be an option?

I'd rather just add a kwarg to set_eeg_reference. I guess this would mean an alternative proposal like set_eeg_reference(..., projection=False) where inverse imaging folks would want to use projection=True (which would also imply it wouldn't be applied immediately), and projection=True would only be allowed with None or whatever we change it to if we do change it (e.g., 'average'). I'm okay with that, too, I suppose.

cbrnr commented 7 years ago

A projection kwarg makes more sense to me - good idea! This way, all references would behave the same to the user (i.e. they get applied by default). Under the hood, 'average' (or whatever it will be called) in combination with project=False could be implemented by creating a reference with a list of all channels (excluding bads). When project=True, the current implementation would be used. Let's just make sure that all re-referencing operations are reversible (i.e. I want to be able to switch to different references in my pipeline based on the same raw object), but as many of you have already stated this is already possible.

larsoner commented 7 years ago

Let's just make sure that all re-referencing operations are reversible

Yes agreed. Although technically this is (hopefully?) an orthogonal issue to the proposed API changes anyway, so I'm happy to punt on that for now :)

So the proposal for set_eeg_reference seems to have converged to:

  1. Add projection=False default (via deprecation+warning, 0.15 None->True). By 0.16 this makes things easier for EEG people, but harder for source localization / inverse imaging people. It unifies how set_eeg_reference works. projection=True only works in avg mode.

  2. Change ref_channels=None to ref_channels='average' to be explicit.

@wmvanvliet @hoechenberger @mmagnuski @agramfort WDYT?

hoechenberger commented 7 years ago

@Eric89GXL This actually sounds very good. The API would become more explicit and more intuitive at the same time. I really like that ;)

@lorrandal What do you think?

agramfort commented 7 years ago

can you write down explicitly the new API code for

thx

larsoner commented 7 years ago

set avg ref but don't apply a proj (just add it to info['projs'])

set_eeg_reference(inst, projection=True) (docs would specify projection=True is special because it does not immediately apply it, and uses projection math during projection application stages instead)

set avg ref and apply it (is it always done with a proj?)

set custom reference

set_eeg_reference(inst, ref_list)

undo any of the referencing above

This is unchanged, same as in master. You can call set_eeg_reference with whatever arguments you want. We don't currently provide a way to un-reference EEG-referenced data, even though in the case of an unapplied projection vector it is possible (well get it back to the nose ref at least), using this function. But I guess you can do inst.del_proj(...) with the EEG projection vector selected.

agramfort commented 7 years ago

ok

what does set_eeg_reference(inst) does? can I still use inst for inverse modeling?

mmagnuski commented 7 years ago

@Eric89GXL I like what you're proposing, but wanted to ask a related question related to avg proj and inverse modeling (@agramfort too): I understand that it's better to have avg ref for inverse modelling but why does mne require avg ref to be a projection? And a related question: if I have some data with average ref already applied would mne's inverse accept if I added a fake avg ref proj (fake because the average would be already zero) or would I have to re-ref to some other channel and add avg ref proj?

larsoner commented 7 years ago

what does set_eeg_reference(inst) does? can I still use inst for inverse modeling?

Nope, you need set_eeg_reference(inst, projection=True).

This is the unfortunate tradeoff currently for making things consistent with all other set_eeg_reference use cases and making the package more EEG- (and ECoG- and ...) friendly. FWIW to the extent that it makes it less M/EEG / NeuroMag-friendly, it is also making us M/EEG people be more explicit, which I don't mind. The frequency with which this issue of projections comes up is more troubling to me.

why does mne require avg ref to be a projection

From the practical side, all of the inverse code accounts for things using projections (and compensation matrices). Actually come to think of it we could hack together EEG references as a form of compensation matrices I think -- but this would be even uglier than what we have now.

From the theoretical side, I seem to remember someone saying avg ref leads to less error but I can't remember offhand. @agramfort?

If it is possible to make the inverse modeling code handle non-average projections, this would also take care of @agramfort's point about set_eeg_reference(inst) not working, as we could make the code handle this case properly... but it would be a lot of work in practice (we would need to make very sure we did it correctly).

if I have some data with average ref already applied would mne's inverse accept if I added a fake avg ref proj (fake because the average would be already zero)

Well this wouldn't really be "fake" per se -- it would be a real projection operator you'd be adding. Projection operations can be repeated over and over with no effect (beyond small amplification of numerical noise). So although it wouldn't really change the data, it would provide proper accounting for what actually happened to the data during inversion.

cbrnr commented 7 years ago

Coming back to the question of undoing referencing, EEG data are always measured with respect to some reference. However, if we load a data set this information might not be present in the file, so we should have a way to specify the reference manually. For example, BioSemi records all channels with respect to the CMS electrode (and it uses an additional DRL electrode to improve the CMRR). Other systems might record channels with a different (more conventional) reference such as Fz. Therefore, we should be able to let MNE know which reference is associated with the data. By default, this value could be "unknown" or something. I think currently set_eeg_reference([]) means "don't change anything", which is what we want, but we might want to add some value to specify what this current reference is.

hoechenberger commented 7 years ago

@cbrnr I just took a look at our latest experimental analysis code -- "experimental" meaning I didn't have time to thoroughly check if everything is actually working as intended. But basically, our situation is the following:

We use an ActiChamp system, usually with Oz set as reference during recording. When we import the data into MNE, the Oz electrode is missing. We would filter and epoch the data, run ICA, and only then try to recover the reference channel and reference to average:

# Recover Oz electrode, which was used as a reference during recording.
mne.add_reference_channels(epochs, 'Oz', copy=False)
epochs.set_montage(montage=montage)
epochs.info['custom_ref_applied'] = False

# Switch to average reference.
epochs.set_eeg_reference(ref_channels=None)
epochs.apply_proj()
cbrnr commented 7 years ago

Thanks @hoechenberger - adding a reference channel means adding a zero channel called Oz in your case, right? I'm not sure if you want to add a zero channel before calculating the average reference though...

hoechenberger commented 7 years ago

Thanks @hoechenberger - adding a reference channel means adding a zero channel called Oz in your case, right?

Yes.

I'm not sure if you want to add a zero channel before calculating the average reference though...

The reference is always "zero" by definition; dropping it as a reference (i.e., adding it back as an ordinary channel -- epochs.info['custom_ref_applied'] = False) and then applying the avg ref essentially "recovers" that channel's information, if you will. At least that's my understanding :) @lorrandal, any thoughts / comments on this?

cbrnr commented 7 years ago

Hm, is this procedure compatible with what is described here?

hoechenberger commented 7 years ago

Hm, is this procedure compatible with what is described here?

I think it's essentially identical to what is described in the section starting with Re-referencing data can be more complicated.

cbrnr commented 7 years ago

😄 yes, this is really true! So essentially you have to decide if you want to put the current reference electrode back in the data or not - in your case, you want Oz to be part of the data, and in my case (Biosemi), I don't want the current reference(s) CMS (and DRL) in the data, so I'd not add any reference channels. Correct?

What does this mean for the proposed API change? Are we still good with introducing a projection argument in set_eeg_reference?

hoechenberger commented 7 years ago

yes, this is really true!

... but also really hard to understand at first, because the EEGLAB docs are ... errm... kind of confusing in this regard. Will respond to your other points later today.

agramfort commented 7 years ago

from a user perspective, if I now want to do inverse modeling and I did set_eeg_reference(inst, projection=False) I will get an error telling me to set projection=True? or can we automatically add the eeg avg ref in inverse code if custom_ref is False? that would make it easier.

and yes avg ref is the standard way for inverse modeling. The reason it's done with projection is that you need to apply the proj to forward solution and noise cov in the inverse code.

cbrnr commented 7 years ago

Agreed. Ideally this should be handled transparently in the inverse code if it always needs to project to average ref anyway. No need to "pollute" the user-facing method to set a reference IMO (that's also how it is done in EEGLAB with DIPFIT BTW).

Where is the information on the currently used (applied) reference (including average reference) stored?

agramfort commented 7 years ago

avg proj is on if you have in info['projs'] a Projection instance called "EEG average reference" marked as "active"

cbrnr commented 7 years ago

And other refs? Are these stored in raw.info['custom_ref_applied']? Should there be one field containing the currently set reference instead? I think it would be nice if this was not spread across at least two different fields.

wmvanvliet commented 7 years ago

We don't have to break any existing code. When doing set_eeg_reference(inst) or set_eeg_reference(inst, projection=False), we need to do the right thing and keep info.custom_ref_applied=False.

The "custom ref" represents something other then an average ref. When doing inverse computations, _needs_eeg_average_ref_proj first checks the custom_ref_applied flag and then proceeds to check for the presence of an average ref proj. Since one is not present, an average ref projection will be added, which is exactly what we want.

lorrandal commented 7 years ago

@cbrnr @hoechenberger

in your case, you want Oz to be part of the data

This was exactly our situation, since we wanted to compare this dataset with another that was acquired keeping all 64 electrodes -- Oz included.

identical to what is described in the section starting with Re-referencing data can be more complicated.

Should be this, in my understanding information regarding Oz is not lost but simply spreaded across all other electrodes. So adding Oz with zero values and re-referencing would make us able to ricover that data.

mmagnuski commented 7 years ago

@Eric89GXL @agramfort Thanks for clarifying!

agramfort commented 7 years ago

seems like a plan? @cbrnr do you give it a try?

cbrnr commented 7 years ago

I will need to wrap my head around the existing API first, and I can't do it immediately because I'm a bit swamped with other stuff, but yes, I will try to push a PR to address this (if there are not other volunteers who want to do it right now :-)).

larsoner commented 7 years ago

Coming back to the question of undoing referencing, EEG data are always measured with respect to some reference. However, if we load a data set this information might not be present in the file, so we should have a way to specify the reference manually

FYI this currently lives in info['chs'][n]['loc'][3:6]. For Neuromag EEG it's on the nose.

We don't have to break any existing code.

It's good that makes the projection=False change will make things easier for MEG people. The disadvantage is that we've lost any record-keeping that the data are no longer referenced to the nose. Perhaps we need to do something with info['chs'][n]['loc'][3:6] to fix this. Even setting it to zero might be okay. I'd need to check whether it messes with the forward code -- I don't think it should, because the ref is subtracted during forward computation, but then effectively added back in (cancelling this out) during avg ref.

yes, I will try to push a PR to address this

That would be great. I might have time to try at some point, too. We'll see who has time first :)

larsoner commented 7 years ago

Related: #4352.

cbrnr commented 7 years ago

Quick reminder, is the currently applied reference stored somewhere? raw.info["custom_ref_applied"] contains only a boolean value but not the actual channels. For example, if I set raw.set_eeg_reference(["Cz"]), where is this information stored?

wmvanvliet commented 7 years ago

For example, if I set raw.set_eeg_reference(["Cz"]), where is this information stored?

Nowhere.