mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.67k stars 1.31k forks source link

add_channels() documentation is difficult to understand #11106

Open hoechenberger opened 2 years ago

hoechenberger commented 2 years ago

I just wanted to use Raw.add_channels(), so I looked at the API docs and even though I'm a developer, it actually took me a couple of seconds to even understand what could probably be meant, and still I think I'll have to take a look at the code to get a better understanding:

Screen Shot 2022-08-28 at 17 00 00

I believe we're making bad use of the docdict here, avoiding redundancy but making the docs effectively useless because they're too weird and hard to understand.

Didn't we recently have a somewhat similar case where we ended up with a way to improve things?

agramfort commented 2 years ago

Indeed. Can you send a PR to improve?

hoechenberger commented 2 years ago

Oh wow only after actually looking at the code I realize I totally misunderstood what this method is even supposed to do, based on what I expected from the method name and its docstring.

The docstring says,

Append new channels to the instance.

So what I thought I could do is, just pass an array of data for additional channels (and channel names), and they will be appended to the existing instance.

But actually what this method is supposed to do is in fact a concatenation of multiple objects of the same type, right?

But we already have .append() and concatenate_*() for this. The only issue with these is that they currently only work on the time domain. But if we added a switch to allow for concatenation on the channel domain as an alternative way of operation, we wouldn't even need .add_channels() anymore.

I guess I'm thinking sort of like in terms of pd.concat(), which has an axis parameter.

Any thoughts here?

agramfort commented 2 years ago

it's adding channels so pd.concat(..., axis=channels)

it needs to be more than channel name and array as a channel info["chs"] is a lot more than this.

Message ID: @.***>

hoechenberger commented 2 years ago

it needs to be more than channel name and array as a channel info["chs"] is a lot more than this.

Exactly! That's why naming the method add_channels() is misleading – effectively, you're concatenating multiple "complete" objects.

Look just how much code I had to produce to add two channels to an existing raw:

# Add GSR and temperature channels
gsr_data = np.array([2.1e-6] * len(raw.times))
temperature_data = np.array([36.5] * len(raw.times))

gsr_and_temp_data = np.concatenate([
    np.atleast_2d(gsr_data),
    np.atleast_2d(temperature_data),
])
gsr_and_temp_info = mne.create_info(
    ch_names=["GSR", "Temperature"],
    sfreq=raw.info["sfreq"],
    ch_types=["gsr", "temperature"],
)
gsr_and_temp_info["line_freq"] = raw.info["line_freq"]
gsr_and_temp_info["subject_info"] = raw.info["subject_info"]
with gsr_and_temp_info._unlock():
    gsr_and_temp_info["lowpass"] = raw.info["lowpass"]
    gsr_and_temp_info["highpass"] = raw.info["highpass"]
gsr_and_temp_raw = mne.io.RawArray(
    data=gsr_and_temp_data,
    info=gsr_and_temp_info,
    first_samp=raw.first_samp,
)
raw.add_channels([gsr_and_temp_raw])
del gsr_and

https://github.com/mne-tools/mne-bids/blob/036dc7e08c5053e218f4aeff7c1786ffbdf0a6b4/mne_bids/tests/data/tiny_bids/code/make_tiny_bids_dataset.py#L46-L69

agramfort commented 2 years ago

I am not sure what you suggest here.

Message ID: @.***>

sappelhoff commented 2 years ago

+1 to improve the docs based on the 4 points in the OP

once that is clarified, I think it's easier to live with the fact that add_channels actually concatenates raws (or other instances).

I am not sure what you suggest here.

not sure either, but perhaps that add_channels would be a function that takes params like data, ch_name, ... (and all else that is needed) instead of a raw/... object?

drammock commented 2 years ago

Look just how much code I had to produce to add two channels to an existing raw:

Agree that this is way too difficult. Comments below:

# Add GSR and temperature channels
gsr_data = np.array([2.1e-6] * len(raw.times))
temperature_data = np.array([36.5] * len(raw.times))
gsr_and_temp_data = np.concatenate([
    np.atleast_2d(gsr_data),
    np.atleast_2d(temperature_data),
])

The above could be simplified a bit; still 3 expressions but much shorter:

gsr = np.full((1, len(raw.times)), 2.1e-6)
temp = np.full((1, len(raw.times)), 36.5)
gsr_and_temp_data = np.vstack([gsr, temp])

This next bit seems unaviodable but not so bad:

gsr_and_temp_info = mne.create_info(
    ch_names=["GSR", "Temperature"],
    sfreq=raw.info["sfreq"],
    ch_types=["gsr", "temperature"],
)

Are the below lines necessary when creating RawArray, or only when adding the channels to the existing Raw? (I suspect the latter, but haven't checked). Regardless, we should make it possible to skip line_freq, lowpass, highpass, and subject_info for cases like this --- perhaps by having "wildcard" defaults that will automatically take on the values of the Raw they are concatenated with? This is (I think) what force_update_info does already: uses the existing Raw's info to update/override the info of the concatenated objects. But it would be nice if that were possible without having to first explicitly set values for those info fields.

gsr_and_temp_info["line_freq"] = raw.info["line_freq"]
gsr_and_temp_info["subject_info"] = raw.info["subject_info"]
with gsr_and_temp_info._unlock():
    gsr_and_temp_info["lowpass"] = raw.info["lowpass"]
    gsr_and_temp_info["highpass"] = raw.info["highpass"]

The rest seems unavoidable, but if the two simplifications above were made, I think we're down to something reasonable/acceptable.

gsr_and_temp_raw = mne.io.RawArray(
    data=gsr_and_temp_data,
    info=gsr_and_temp_info,
    first_samp=raw.first_samp,
)
raw.add_channels([gsr_and_temp_raw])

As for the intuitiveness of the API name... I think I would agree that something like raw.add_channels(data, ch_name, ch_type, ...) ought to work too. I can imagine an API where data can be a NumPy array (in which case you must pass other args), or alternatively data can be a list of Raw instances (or epochs or evoked).

hoechenberger commented 2 years ago

I am not sure what you suggest here.

not sure either, but perhaps that add_channels would be a function that takes params like data, ch_name, ... (and all else that is needed) instead of a raw/... object?

I am not sure myself either 😆

But I suppose my feeling was that:

Thanks @drammock for the simplifications, that's helpful!

agramfort commented 2 years ago

I would suggest to follow what @drammock https://github.com/drammock suggests.

the rest well, if someone has time to iterate but for me it's a lot of work for a small return.

this operation was necessary to hack a test in mne-bids not to actually process data so I don't expect people to do these operations all the time.

Message ID: @.***>

larsoner commented 2 years ago

As for the intuitiveness of the API name... I think I would agree that something like raw.add_channels(data, ch_name, ch_type, ...) ought to work too.

I would rather not add this. To me it's replicating RawArray without much gain. I don't think this extra line is worth the API duplication.

Can we just live with a documentation improvement for now?

hoechenberger commented 2 years ago

Can we just live with a documentation improvement for now?

Yep, let's do that!

hoechenberger commented 2 years ago

Hey! who said I'd be volunteering? 😅😅😅😅

larsoner commented 2 years ago

I believe @agramfort calls this being "volun-told" :)

More seriously if you won't have time in the next few days @drammock or I could give it a shot. But I think you're the best equipped having most recently suffered through the doc badness :)

larsoner commented 2 months ago

Milestone on this has been bumped six times, removing milestone until someone champions this issue. But also marking as "easy" so it hopefully gets visibility / higher likelihood of being tackled