mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

Channel type auto scalings in plot #2253

Closed choldgraf closed 8 years ago

choldgraf commented 9 years ago

Maybe it's just an ECoG thing, but my data are often at a very different scale than the MNE defaults during calls to plot. Usually I just do something like scalings={'eeg': data._data.max() / 2}. However, would it be useful to allow a string to be passed to scalings that does this automatically? E.g.:

if scalings == 'auto':
    ch_types = [mne.io.pick.channel_type(data.info, i)
                for i in range(len(data.ch_names))]
    scalings = {}
    for ch_type in set(ch_types):
        # pick_types doesn't take a dict, so index by type directly
        # (this also avoids needing to set meg=False for non-MEG channels)
        picks = [i for i, t in enumerate(ch_types) if t == ch_type]
        scalings[ch_type] = data._data[picks].max() / 2

This could either be allowed as a function to plot commands, or perhaps a more convenient way of implementing this without changing API stuff too much is to just implement a create_channel_scalings function. e.g.:

import numpy as np

def create_channel_scalings(data, scale_func=np.max, scale_factor=1):
    ch_types = [mne.io.pick.channel_type(data.info, i)
                for i in range(len(data.ch_names))]
    scalings = {}
    for ch_type in set(ch_types):
        # Is there a better way to pick channels of one type?
        picks = [i for i, t in enumerate(ch_types) if t == ch_type]
        scalings[ch_type] = scale_func(data._data[picks]) / scale_factor
    return scalings

Thoughts?

agramfort commented 9 years ago

my feeling is that we should have a proper ECoG channel type rather than hacking EEG with ECoG data.

choldgraf commented 9 years ago

I'd be +1 on this for sure, but I've been holding off on suggesting it just because I can't tell if anybody else is using MNE with ECoG or if I'm a small use case. Instead I just mark all channels as 'eeg' and try to remember to set add_eeg_ref to false any time I load in data. I think implementing an 'ecog' channel type could pose some extra challenges (e.g., do you allow people to load in all of their data types, including grid, strip, and depth electrodes?). The upside is that (to my knowledge) there isn't a good ecog analysis pipeline out there in python right now (I think Fieldtrip does some of this in matlab though).

aestrivex commented 9 years ago

I have been setting the channel type for ECoG data to the existing sEEG channel type in my projects, but very few mne-python functions seem to look at this information. At some point I wrote my own scaling parameters, but the visualization I wanted to end up with was also somewhat different from what mne-python does.

choldgraf commented 9 years ago

One tricky thing with ECoG scalings is that (IME) ECoG comes in many different flavors. I don't know that a single scaling parameter works for all ECoG datasets, which is why I usually forego type-specific scaling and auto-scale to 1/2 the maximum value. It's also a reason why it'd be great to incorporate the interactive +/- scaling that was implemented in the epochs plotting.

agramfort commented 9 years ago

let's talk during the sprint.

choldgraf commented 9 years ago

sounds good - will touch down in Paris on Tuesday morning!


choldgraf commented 8 years ago

I'm closing this since there's an ecog / seeg data type now...that should be able to handle those data types. I'm still a fan of some kind of "auto" scaling (e.g., check to see if the plot output is going to be total gibberish because the scale is way too big). But maybe it's not worth adding the extra cruft. Let me know if people think it's worth implementing.

kingjr commented 8 years ago

Let me know if people think it's worth implementing.

I think we discussed this elsewhere, but yeah, I'm +1 on this, although it means loading the whole data into memory. Because intracranial electrode impedance and SNR can vary quite a lot, it's difficult to know in advance what scale we should use.

choldgraf commented 8 years ago

Ah, that's a good point - do you think it'll be a problem? If it's an optional parameter, then hopefully it'll be easy for people to just not use it if their data takes a long time to load.

larsoner commented 8 years ago

As long as it's not the default, and documented clearly in the docstring it's okay with me. I suspect the implementation for Raw and Epochs, which can be non-preloaded, will be non-trivial from a memory-saving standpoint.

choldgraf commented 8 years ago

Yeah, I'm a bit worried that my data are atypical in terms of memory size since it's ECoG (fewer channels and shorter recording sessions). You don't think it'd be enough to just read in the first 10 or 20 s of data and calculate the descriptive statistics on those, assuming they're representative of the whole dataset? It would probably get it wrong now and then, especially if there are noisy bits in the data, but in that case the plot would just look off in the same way it does now when the scales are wrong.

kingjr commented 8 years ago

@choldgraf you can read the file in a decimated way, and compute the std on that.
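For concreteness, here's a minimal NumPy-only sketch of that idea (`decimated_std` is a hypothetical helper name; on a real non-preloaded Raw you'd pull strided or chunked samples from disk rather than index an in-memory array):

```python
import numpy as np

def decimated_std(data, decim=100):
    """Per-channel std estimated from every decim-th sample only.

    `data` is an (n_channels, n_times) array; the strided slice
    stands in for reading the file "in a decimated way" instead of
    loading every sample.
    """
    return data[:, ::decim].std(axis=1)

rng = np.random.RandomState(0)
fake = rng.randn(4, 100000)  # fake recording: 4 channels, unit std
est = decimated_std(fake, decim=100)
```

With only 1 in 100 samples the estimate already lands very close to the full-data std for well-behaved data.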

choldgraf commented 8 years ago

Is that a parameter we can give? Or do you just mean read in the file in chunks and update a mean/std iteratively?

kingjr commented 8 years ago

Mmmh, no, I don't think this param exists

choldgraf commented 8 years ago

ok, so you just imagine iterating through chunks of ~5s or something like this and updating a mean/std each time? I still feel like we could get away with a small-ish chunk of data for estimating those values, and assume that they're representative of the whole thing.
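The "iterate through chunks and update a mean/std" version might look like this (a toy sketch on an in-memory array; `chunked_mean_std` is a made-up name, and a real version would read each chunk from disk):

```python
import numpy as np

def chunked_mean_std(data, chunk_size=5000):
    """One pass over chunk_size-sample blocks, accumulating only the
    count, sum, and sum of squares, so the full array never needs to
    be reduced in one go."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for start in range(0, data.shape[1], chunk_size):
        chunk = data[:, start:start + chunk_size]
        n += chunk.size
        total += chunk.sum()
        total_sq += (chunk ** 2).sum()
    mean = total / n
    std = np.sqrt(total_sq / n - mean ** 2)
    return mean, std

rng = np.random.RandomState(1)
fake = rng.randn(8, 60000)
mean, std = chunked_mean_std(fake)
```

(The sum-of-squares trick can lose precision for data with a huge mean relative to its spread; Welford's online algorithm is the numerically safer variant.)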

kingjr commented 8 years ago

ECoG datasets often contain chunks of high noise, especially early on, so I think this is likely to be ineffective

choldgraf commented 8 years ago

yeah for sure - I was imagining choosing a random 20-second block in the middle of the data. I wonder if it's better to give that a shot first, since it'd be quick and dirty, and if it seems insufficient I'm happy to make something more complicated. But WDYT?

kingjr commented 8 years ago

Ok, let's start easy.

choldgraf commented 8 years ago

I can throw something together pretty quickly then, I'm basically thinking:

if scalings == 'auto':
    # if data is not preloaded, preload 20s of data right in the middle
    # Iterate through channel types
        # For each channel type, calculate the mean +/- standard deviation
    # Define a dictionary of {channel_type: mean + 2*sd}
    # Use that dictionary in plotting

Does that sound reasonable? Think mean + 2*sd would be too much or is that OK?
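The outline above, as a toy NumPy version (`auto_scalings` and all its arguments are hypothetical, and this indexes an in-memory array instead of preloading a 20 s window from disk):

```python
import numpy as np

def auto_scalings(data, ch_types, sfreq, n_sec=20.0):
    """Grab ~n_sec seconds from the middle of the recording and
    return {channel_type: mean + 2 * sd} over the absolute values."""
    n_samp = int(round(n_sec * sfreq))
    start = max((data.shape[1] - n_samp) // 2, 0)
    seg = data[:, start:start + n_samp]
    scalings = {}
    for ch_type in set(ch_types):
        picks = [i for i, t in enumerate(ch_types) if t == ch_type]
        vals = np.abs(seg[picks])
        scalings[ch_type] = vals.mean() + 2 * vals.std()
    return scalings

rng = np.random.RandomState(0)
fake = rng.randn(4, 100000)
fake[:2] *= 10.0  # pretend the first two channels are higher-amplitude
types = ['eeg', 'eeg', 'ecog', 'ecog']
scalings = auto_scalings(fake, types, sfreq=1000.0)
```

Channel types with larger amplitudes come out with proportionally larger scalings, which is the whole point.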

kingjr commented 8 years ago

2 sd is probably a bit small..

Also, I would use the MAD (https://github.com/kingjr/jr-tools/blob/master/jr/stats/base.py#L484), since some channels/time points can go crazy.

Apart from this +1

EDIT: The joke wasn't intended, but I'm happy to copyright it.
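For reference, a self-contained version of a MAD-based robust scale (my own sketch of the standard estimator, not a copy of the linked jr-tools helper):

```python
import numpy as np

def mad_scale(x):
    """Median absolute deviation, times 1.4826 so it matches the
    standard deviation for Gaussian data."""
    med = np.median(x)
    return 1.4826 * np.median(np.abs(x - med))

rng = np.random.RandomState(42)
x = rng.randn(10000)
x[:50] = 1e3  # a noisy chunk sends the std through the roof...
# ...but barely moves the MAD, which stays near the "clean" value of 1
```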

choldgraf commented 8 years ago

ah ya - fair enough, I can try to come up with something that's robust to noise etc. will get a PR soonishfully