Closed christian-oreilly closed 1 year ago
Agreed. As you know I have started working on documentation with Sphinx, so this is going to be high on my priorities in the coming months. Your input on reasoning about the code (a la #18 ) is very helpful, and as we do this I think we will have an opportunity to make the code base more readable directly as well (more readable function names etc).
Sounds good @scott-huberty, but in the interest of clarity, I think we are talking about two overlapping but different things. I think that what you have in mind is closer to code documentation (proper docstrings and such; and using these docstrings through Sphinx to generate API doc)? What I plan to address here is more of a "conceptual" documentation than a "code" documentation. In the end, Sphinx will certainly be able to digest and integrate this in some ways, but more specifically, the kind of doc I plan to prepare is: 1 - Overall diagram of the pipeline, with distinct naming and numbering of steps; 2 - Break down of each steps with, if useful, visual explanation.
I think that will help, among other thing, solving issues like #18
OK I see - and yes I think this is a good idea, happy to help once you've started a doc that can be shared.
First step would look something like that:
Second step would look something like this...
2 done, ~10 to go... ! ;)
Those look great!
What are you using to generate these graphs?
Inkscape! Two key open source imaging softwares: gimp (eq. to photoshops) for bitmaps, inkscape for vectorials (eq. to illustrator).
Developed in the "xarray" branch.
2.a Computing standard deviation across epochs.
2.b Z^_i is defined as Z^i(X) = \frac{X - Q{50,i}(X)}{Q{70,i}(x) - Q{30,i}(x)}
with Q_{j,i}(X) standing for the j-th percentile of X computed along the dimension i. At this step, this operation is broacasted, such that it does not results in a collapsing of the dimension channel even though the statistics are computed along this dimensions. These values are akin to a non-parametric z-score indicating how much a channel has a large variability (across epochs) compared to the other channels, at a given time point. Note that the use of both the time and epoch dimensions makes sense for event-related epochs, not so much for epochs of resting-state data.
I am somehow unclear that even such documentation helps make this operation more digestible.
I love the graphic
When calculating variability across epochs, A channel's voltage value at a given sample (I.e sample 10) is compared across epochs?
Thanks.
If by "sample 10" you mean the tenth sample along the time dimension, then yes. Think of xarrays... with an xarray X with dimensions (channel, epoch, time), this is equivalent to X.std("epoch")
Yeah exactly. I should have said something like "the 10th index of the time dimension" across epochs.
Well, I think this documentation does help to convey the operation.
Yeah, I guess. This one is an intricate measure so I guess it would be hard to make it more transparent.
@christian-oreilly we could adapt this code to plot a cube, label the axes (x=time,y=epoch, z=channel), and format one row of voxels (across epochs axis) to be a different color (red) than the rest of the cube (gray?), to help convey the concept of comparing the same channel and time index across epochs?
https://matplotlib.org/devdocs/gallery/mplot3d/voxels_numpy_logo.html
Yeah... I don't want to spend forever working on these diagrams though. It seems to me that coding these instead of just drawing them will not make this process faster (on the contrary). I think, personally, I prefer to work on these manually in Inkscape. I can revisit this one down the line to add the coloring you propose, but I'd first finish representing the other steps of the pipeline. Also, the coloring, for now, has been used to indicate flagged values, so it would need to made in a way that does not confuse these two things.
Looks great.
It's been since my masters since I used Inkscape but I think I'll download it again. In the spirit of open source, would you be able to share these Inkscape files with me 🙂
It is a good too to be comfortable using for figures in paper and stuff like that. I create SVG files, which is a very standard format.
Rank and bridge steps will be very easy to sketch (I have them on paper). A bit too lazy to do them tonight, but it should be done shortly. There would just be the gap removal and the large IC step remaining. I don't think ICA are worth depicting... it is as per the ICA algorithm... but I'll see. Anyway, short story, this is almost done. I'll glue them together with markdown text to have a nice complete explanation of the pipeline.
Finale set of schemas.
All schemas are now on the xarray branch. A pipeline.md display them all. The last steps is to add some text to pipeline.md to explain it all.
Great. Once you are ready to merge these schemas into the main branch, I can look into incorporating them to the sphinx documentation.
Any objection on closing this issue? Improving the documentation and filling all the TODOs will be an ongoing task, but I think this issue has been sufficiently addressed to close it instead of keeping it as an issue that we will go stale as the documentation will always be perfectible.
Close it!
We need to have clear conceptual documentation of the pipeline. Multiple operations are conceptually confusing because they involve operations along multiple axes. To avoid errors and confusion, we need these to be documented.