Open cbrnr opened 2 years ago
I love the idea of adding such a function!
- Where should this function live? I'd put it in mne.misc and export it to mne, but there might be a better place.
IMHO a misc and utils namespace shouldn't even exist; it's only an invitation to dump arbitrary stuff there without ever cleaning it up; and the naming is not explicit, therefore not helping users either.
The mne namespace is overloaded already, I'd avoid exposing it there.
Why not simply add a new module & namespace, mne.toydata? Or even mne.datasets.toydata, idk?
- The recommended method for generating random numbers is numpy.random.default_rng(seed), but the seed is not compatible with how we are dealing with random state (check_random_state()). If the function gets a seed or random_state parameter, how should we handle this?
check_random_state() already handles RandomState instances, doesn't that suffice?
Some thoughts off-the-cuff:
To me this seems very close to what you would get with add_noise(raw, cov) for an empty raw and cov=mne.make_ad_hoc_cov(...) after some suitable call to mne.simulation.simulate_raw.
I'd rather extend these existing functions than add anything new
I'm fine with extending existing functions, and the mne.simulation module seems like a good place for that functionality. Initially I thought because there is mne.create_info(), an mne.create_toy_data() would be the expected/consistent place.
check_random_state() already handles RandomState instances, doesn't that suffice?
But a numpy.random.Generator (which is returned by numpy.random.default_rng()) does not support a RandomState instance unfortunately.
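A minimal sketch of how a seed-like argument could be normalized to a numpy.random.Generator, assuming only NumPy. The helper name to_generator is hypothetical and not part of MNE or NumPy, and seeding a fresh Generator from a draw of the legacy RandomState is just one possible bridge, not an established convention:

```python
import numpy as np


def to_generator(seed=None):
    """Hypothetical helper: turn None, an int, a RandomState, or a Generator
    into a numpy.random.Generator."""
    if isinstance(seed, np.random.Generator):
        # Already the new-style generator, use it as-is
        return seed
    if isinstance(seed, np.random.RandomState):
        # Assumption: draw one integer from the legacy state and use it
        # to seed a new Generator (this advances the RandomState)
        return np.random.default_rng(seed.randint(0, 2**31 - 1))
    # None or an int are handled by default_rng directly
    return np.random.default_rng(seed)


rng = to_generator(np.random.RandomState(0))
noise = rng.standard_normal((3, 10))
```

With this approach the same RandomState seed still gives reproducible output, but the numbers differ from what the RandomState itself would have produced.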
I'm hesitant to make such a thing public because it invites a lot of user bikeshedding based on divergent use cases
That's why I'd intentionally keep it simple.
Re extending mne.simulation.simulate_raw(), I like the idea, but the point of my proposed function is that you can just call it without setting any parameters, and you get some reasonable toy data. Not sure how this could be handled, do you mean we could add my proposed function there and just call already existing functions? Or do you really mean you'd rather not add any new function?
I'd rather not add any new function. What you propose seems 90%+ like an existing function. No need to make something new to do almost the same thing just to save someone having to set one or two parameters of the function
But what you proposed consists of several function calls or no? I agree that mne.simulation.simulate_raw() should be able to do what I want (actually, I don't even care if the data is EEG-like or random, I only need the right data structure with a given length and number of channels). I'm not sure if it is actually easier to adapt mne.simulation.simulate_raw() to generate toy data without needing to specify STCs and whatnot than to add a small new function (maybe mne.simulation.toy_raw()?).
Could we make the simulate_xxx method all have default arguments to do the job ?
I'm not sure if it is a good idea to coerce simulate_raw() into not simulating and instead outputting some random data (with desired shape). I still think a separate function in that module would be the best solution. My latest name idea is generate_raw(). Or create_toy_raw(). The "toy" part is probably important to show that the signal is not EEG or something plausible.
And it would fit in either misc or simulation. Even data or utils would be possible IMO.
Scikit-learn has datasets.make_*() for this purpose BTW.
I like this. This or simulation.
To me I think we should still just make our existing functions better -- so far what you've described @cbrnr is in my mind just a 2- or 3-line wrapper around existing functions. There are lots of potential ways to make our existing functions easier to use.
For example maybe support data=<int> in RawArray to mean "give me an array of zeros for all channels of this many samples". Then your use case is
info = create_info(...) # I think in any API you're going to need this line
raw = mne.simulation.add_noise(RawArray(10000, info), ...)
The bonus of this API is you can do things like
epochs = mne.simulation.add_noise(EvokedArray(10000, info), ...)
etc. immediately because we already have these other classes, and add_noise knows how to deal with them.
I thought you meant extending mne.simulation.simulate_raw(). I like adapting mne.io.RawArray to give an empty array, but how would you handle this with mne.EpochsArray? Pass a 2D array with (n_epochs, n_channels)?
I think it depends on how many people would use this functionality. Many things could be done with existing functions to a certain degree, but at some point it might make sense to put it into a dedicated function.
I'd like to revive this issue. A function to create some toy data (not simulated data) would be extremely useful for me (and probably others), because I need this in almost any MWE. And as @larsoner said, of course it is just a wrapper around existing functions, but not 2 or 3 lines, but 7 lines at least (you need the imports).
I think there was at least some consensus for mne.datasets.make_toy_*()?
would you start to make use of this function in our tests? maybe it would be a concrete opportunity to use this and reduce also our number of lines?
Yes, this would very likely lead to shorter tests. I'll have to investigate a little, but I cannot do it right now. I just wanted to make sure that there is still interest, or if we can close this issue.
I'm interested in seeing something like this happen. My use case is mostly for MWEs though: i.e., when debugging user problems from the forum or demonstrating how to do things. I want to avoid having to write
sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = sample_data_folder / "MEG" / "sample" / "sample_audvis_raw.fif"
raw = mne.io.read_raw_fif(sample_data_raw_file, verbose=False, preload=False)
just to try out what the user says isn't working. Other times random data is good enough too. So I think my ideal would be something like:
mne.simulation.example(
    kind: str = "raw",  # can add "epochs", "evoked", "spectrum", "stc", "tfr" if needed
    data: str = "random",  # or "sample" to use sample dataset, add other dataset wrappers if needed
    info: Info | None = None,  # if data="random" you can provide an info if you don't like built-in defaults
)
name of the function doesn't matter much to me; could be example or make_example or make_example_data or make_toy_data or fake_object or whatever.
FWIW I don't actually expect this to help all that much in our test suite, since nowadays we have fixtures for raw, epochs, evoked, spectrum, and (I think?) stc. There will be some tests that might benefit from this, but then again it might also be possible to update them to use the fixtures instead.
My use case is mostly for MWEs though
This was also my initial motivation for adding such a function, but then the discussion was mainly about extending functionality of available functions.
I often find myself generating toy data (e.g. for educational or testing purposes), so I thought a dedicated function might be useful.
It should be as simple as possible, for example:
It is important that there are sensible defaults for all parameters, which makes it possible to generate toy data very quickly:
If people think this would be useful, I can go ahead and submit a PR.
Of course, this function could have a lot of additional parameters, such as
However, I'd say YAGNI until someone really needs a particular feature.
If there is interest, I have two questions:
- Where should this function live? I'd put it in mne.misc and export it to mne, but there might be a better place.
- The recommended method for generating random numbers is numpy.random.default_rng(seed), but the seed is not compatible with how we are dealing with random state (check_random_state()). If the function gets a seed or random_state parameter, how should we handle this?
So – yay or nay?