mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.68k stars 1.31k forks source link

DOC: show example code for custom reader functions (was: Subclassing BaseRaw is too difficult) #12854

Open larsoner opened 2 weeks ago

larsoner commented 2 weeks ago

I directly set this [filenames] attribute [...] because I find setting up a new reader by adding a new class a bit cumbersome. I mostly do the file-specific stuff and then create a RawArray, so maybe adding a public setter in that class might be an option?

Originally posted by @cbrnr in https://github.com/mne-tools/mne-python/issues/12843#issuecomment-2352441225

larsoner commented 2 weeks ago

I find setting up a new reader by adding a new class a bit cumbersome

@cbrnr I opened this to try to figure out if we could make the process easier, but this already works:

import numpy as np
from mne import create_info
from mne.io import BaseRaw

class MyRaw(BaseRaw):

    def __init__(self):
        data = np.zeros((1, 1000))
        info = create_info(1, 1000., 'eeg')
        filenames = ["test.xyz"]
        super().__init__(preload=data, info=info, filenames=filenames)

x = MyRaw()
x.plot()

image

I don't see this as too onerous, and gives you the advantage of nice repr and such

>>> x
<MyRaw | test.xyz, 1 x 1000 (1.0 s), ~13 kB, data loaded>

So I would rather have this little subclass wrapper than add setters for the filenames (which would likely lead to other problems anyway).

cbrnr commented 2 weeks ago

Perfect, that's really great indeed! I'll switch to this approach then for my readers. Thanks!

mscheltienne commented 2 weeks ago

Maybe it could be turned into an example?

cbrnr commented 2 weeks ago

Maybe it could be turned into an example?

I'll see what it'll look like, I have readers for .npz, .mat, and .xdf. The .npz one could be short enough to serve as an example, but we'll see.

cbrnr commented 2 weeks ago

FWIW, what's really missing in the example is a function read_raw_xxx(), because that's what's user-facing. This generates a bit of an overhead and redundancy (and this was the original reason why I took the shortcut), but it's perfectly OK having to implement both a class and a function (or maybe not perfectly OK, but changing the whole reader implementation would not be worth it I guess 🤷).

drammock commented 2 weeks ago

Maybe it could be turned into an example?

exactly what I was thinking; I'll re-open and re-title this issue as an issue about doc improvement.

FWIW, what's really missing in the example is a function read_raw_xxx(), because that's what's user-facing.

Agreed. I think the new/edited example would be most useful if it were something like:

# If no reader exists yet for the kind of file you're working with,
# here's a template function that should get you up and running:

def read_raw_npz(...):
    # load the data (this part will usually be different for each file type)
    # create fake info (details of how the channel names/types are stored in the file may vary)
    # use RawArray to instantiate
    # return RawArray

# (now show it in action)
foo = read_raw_npz(...)

# If you want to contribute your file reader to MNE-Python, it would need a few more pieces,
# like supporting `preload=False`. Here's an expanded reader for `.npz` to illustrate what a
# full-fledged reader function usually looks like:

def read_raw_npz(...):
    ...

# (now show it in action)
foo = read_raw_npz(..., preload=False)
cbrnr commented 2 weeks ago

I'm working on this here: https://github.com/cbrnr/mnelab/pull/434

I think the .npy reader could be used for the example, but I don't think I will have time to add the "few more pieces". I'm not sure if it is really necessary to show everything, because then we might as well add the reader to our codebase, or no?

cbrnr commented 2 weeks ago

Also, in your example I think subclassing BaseRaw is really essential, otherwise there's no way to set the .filenames attribute. This is what brought us here in the first place.

drammock commented 2 weeks ago

I think the .npy reader could be used for the example, but I don't think I will have time to add the "few more pieces". I'm not sure if it is really necessary to show everything, because then we might as well add the reader to our codebase, or no?

I think .npz (or .hdf5 for that matter) are good candidates for "example in our docs, but not in our API" because they're user-customizable formats that aren't guaranteed to follow a proper "spec" in terms of what the different elements are named, their order, etc. So in that sense even building out a "full" reader function for them in the docs only seems fine to me.

Aside: I think .npy is not a good candidate for an example because it can only contain one array (e.g., only the data, not sfreq, ch_names, ch_types, etc; unless you did an object array I guess).

drammock commented 2 weeks ago

Also, in your example I think subclassing BaseRaw is really essential, otherwise there's no way to set the .filenames attribute. This is what brought us here in the first place.

ah, right you are.

cbrnr commented 2 weeks ago

I think .npz (or .hdf5 for that matter) are good candidates for "example in our docs, but not in our API" because they're user-customizable formats that aren't guaranteed to follow a proper "spec" in terms of what the different elements are named, their order, etc. So in that sense even building out a "full" reader function for them in the docs only seems fine to me.

I agree!

Aside: I think .npy is not a good candidate for an example because it can only contain one array (e.g., only the data, not sfreq, ch_names, ch_types, etc; unless you did an object array I guess).

Right, .npz would be a better candidate for an example. I've still implemented a reader for .npy, because it was requested and I guess some people use it because it's so simple to just dump an array into it. Of course, additional information needs to be provided elsewhere (it's in a dialog in my case).