hoechenberger opened 2 years ago
and how's this blurry thing we're creating in any way related to the neurophysiological recordings I'm wanting to analyze?
True, I have tripped over that as well.
Could we collapse code like the one you show by default? That way, non-essential code (which is needed to create toy data etc. but not to do statistics) would be hidden and people could focus on the essential things.
I had thought about this too, but I'm not sure it would really help in this particular case: to me, the toy data is in no way related to electrophysiological data, so hiding its generation might make things even worse … I wonder if a first step could be to pick a different set of example data, or a different approach to generating it.
Could we maybe not simply load sample and add some random offsets or something to generate "participants"? Something like that: electrophysiological time series data. And ideally no data that has a square shape, because then I'm always getting confused about which dimension is which, esp. once we extract the data as a NumPy array (we should offer xarray support so dimensions are properly labeled, but this is another discussion).
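The suggestion could be sketched roughly like this — pure NumPy rather than the real sample dataset, with the signal shape, number of subjects, and noise levels all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for a grand-average evoked response:
# a Gaussian bump over 1 s sampled at 100 Hz (not actual MNE sample data).
times = np.linspace(0.0, 1.0, 100)
evoked = np.exp(-((times - 0.4) ** 2) / (2 * 0.05 ** 2))

# Simulate 20 "participants" by adding a per-subject random offset plus noise.
n_subjects = 20
offsets = rng.normal(0.0, 0.2, size=(n_subjects, 1))
noise = rng.normal(0.0, 0.1, size=(n_subjects, times.size))
subject_data = evoked + offsets + noise  # shape: (n_subjects, n_times)
```

The resulting (n_subjects, n_times) array is non-square, so the subject and time axes stay unambiguous, and it at least looks like evoked time series data.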
I actually really like this tutorial. It uses abstract data to make the points for sure, but it allows us to visually show what we're talking about. With real data this would not be that straightforward (if at all visible). Maybe we could add to the main text the correspondence of our toy data to real (EEG) data? That way, people could make the connection from the abstract toy example to their own data?
Maybe we could add to the main text the correspondence of our toy data to real (EEG) data? That way, people could make the connection from the abstract toy example to their own data?
I'm not sure I understand what you mean exactly, could you please give an example?
Maybe something like: the blob visualized in 2D corresponds to EEG channels? I don't even know if that's true TBH, but something along those lines.
And I cannot relate to the data being used at all.
To me this is a narrative problem: we should describe why the data are created the way they are -- what that accomplishes, and how it relates to real data. I think it can be done in a couple of sentences.
I wonder if a first step could be to pick a different set of / different approach to generating the example data.
I would rather not. To me this is actually the simplest example that demonstrates the core ideas. It's also taken (almost?) directly from a paper IIRC, which should be cited in the example already (we should add it if it's not).
Could we maybe not simply load sample and add some random offsets or something to generate "participants"? Something like that. Electrophysiological time series data.
Anything using real data will actually end up comparatively more complicated and less clear I think.
And ideally no data that has a square shape, because then I'm always getting confused about which dimension is which, esp. once we extract the data as a NumPy array (we should offer xarray support so dimensions are properly labeled, but this is another discussion)
The idea of the example is to provide an abstraction -- the X and Y dimensions could really be anything (time, space, frequency, etc.). The principles generalize to real data along any dimensions once you understand the idea. I think we need to convey this part more clearly.
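To make that generalization concrete, here is a minimal sketch (invented numbers, using scipy.ndimage for labeling): the cluster-forming step on a thresholded statistic map is identical whether the axis represents time, space, or frequency:

```python
import numpy as np
from scipy import ndimage

# A 1D "statistic map" -- could be t-values over time, channels, or
# frequencies; the clustering step doesn't care what the axis means.
stats = np.array([0.1, 2.5, 3.0, 2.8, 0.2, 0.3, 2.9, 3.1, 0.4])
threshold = 2.0

# Label contiguous supra-threshold runs ("clusters") and sum the
# statistic within each one (the "cluster mass").
labels, n_clusters = ndimage.label(stats > threshold)
cluster_masses = ndimage.sum(stats, labels, index=np.arange(1, n_clusters + 1))
```

The same two calls work unchanged on a 2D or 3D array, which is exactly why the tutorial can teach the idea on an abstract "image."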
https://mne.tools/stable/auto_tutorials/stats-sensor-space/10_background_stats.html
Don't get me wrong, the content itself is great, but to me, the "important bits" get totally hidden behind a wall of code blocks that do complex visualizations and fake data generation, so anytime I look at this thing, I'm struggling to find the actually relevant parts – those that show me how to do statistics! Meaning, at the end of the day, the tutorial isn't very helpful at all, and this is a pity!
And I cannot relate to the data being used at all.
For example, this is the data:
What does that even mean? I can't remember the last time I used np.convolve manually, and how's this blurry thing we're creating in any way related to the neurophysiological recordings I'm wanting to analyze?

If anybody has any ideas on how to make this tutorial more approachable to ordinary users, it would be greatly appreciated!
cc @sappelhoff