Feature request: Pipeline documentation

christian-oreilly commented 1 year ago

We need to have clear conceptual documentation of the pipeline. Multiple operations are conceptually confusing because they involve operations along multiple axes. To avoid errors and confusion, we need these to be documented.

scott-huberty commented 1 year ago

Agreed. As you know I have started working on documentation with Sphinx, so this is going to be high on my priorities in the coming months. Your input on reasoning about the code (a la #18 ) is very helpful, and as we do this I think we will have an opportunity to make the code base more readable directly as well (more readable function names etc).

christian-oreilly commented 1 year ago

Sounds good @scott-huberty, but in the interest of clarity, I think we are talking about two overlapping but different things. I think that what you have in mind is closer to code documentation (proper docstrings and such; and using these docstrings through Sphinx to generate API doc)? What I plan to address here is more of a "conceptual" documentation than a "code" documentation. In the end, Sphinx will certainly be able to digest and integrate this in some ways, but more specifically, the kind of doc I plan to prepare is: 1 - Overall diagram of the pipeline, with distinct naming and numbering of steps; 2 - Break down of each steps with, if useful, visual explanation.

I think that will help, among other thing, solving issues like #18

scott-huberty commented 1 year ago

OK I see - and yes I think this is a good idea, happy to help once you've started a doc that can be shared.

christian-oreilly commented 1 year ago

First step would look something like that:

christian-oreilly commented 1 year ago

Second step would look something like this...

christian-oreilly commented 1 year ago

2 done, ~10 to go... ! ;)

scott-huberty commented 1 year ago

Those look great!

What are you using to generate these graphs?

christian-oreilly commented 1 year ago

Inkscape! Two key open source imaging softwares: gimp (eq. to photoshops) for bitmaps, inkscape for vectorials (eq. to illustrator).

christian-oreilly commented 1 year ago

Developed in the "xarray" branch.

christian-oreilly commented 1 year ago

2.a Computing standard deviation across epochs.
2.b Z^_i is defined as Z^i(X) = \frac{X - Q{50,i}(X)}{Q{70,i}(x) - Q{30,i}(x)}

with Q_{j,i}(X) standing for the j-th percentile of X computed along the dimension i. At this step, this operation is broacasted, such that it does not results in a collapsing of the dimension channel even though the statistics are computed along this dimensions. These values are akin to a non-parametric z-score indicating how much a channel has a large variability (across epochs) compared to the other channels, at a given time point. Note that the use of both the time and epoch dimensions makes sense for event-related epochs, not so much for epochs of resting-state data.

2.c. Averaging the previously computed z-scores across time.
2.d Flagging bad channels as those with such z-scores larger than the median plus 6 times the difference between the 70th and the 30th percentile.

I am somehow unclear that even such documentation helps make this operation more digestible.

scott-huberty commented 1 year ago

I love the graphic
When calculating variability across epochs, A channel's voltage value at a given sample (I.e sample 10) is compared across epochs?

christian-oreilly commented 1 year ago

Thanks.
If by "sample 10" you mean the tenth sample along the time dimension, then yes. Think of xarrays... with an xarray X with dimensions (channel, epoch, time), this is equivalent to X.std("epoch")

scott-huberty commented 1 year ago

Yeah exactly. I should have said something like "the 10th index of the time dimension" across epochs.

Well, I think this documentation does help to convey the operation.

christian-oreilly commented 1 year ago

Yeah, I guess. This one is an intricate measure so I guess it would be hard to make it more transparent.

scott-huberty commented 1 year ago

@christian-oreilly we could adapt this code to plot a cube, label the axes (x=time,y=epoch, z=channel), and format one row of voxels (across epochs axis) to be a different color (red) than the rest of the cube (gray?), to help convey the concept of comparing the same channel and time index across epochs?

https://matplotlib.org/devdocs/gallery/mplot3d/voxels_numpy_logo.html

christian-oreilly commented 1 year ago

Yeah... I don't want to spend forever working on these diagrams though. It seems to me that coding these instead of just drawing them will not make this process faster (on the contrary). I think, personally, I prefer to work on these manually in Inkscape. I can revisit this one down the line to add the coloring you propose, but I'd first finish representing the other steps of the pipeline. Also, the coloring, for now, has been used to indicate flagged values, so it would need to made in a way that does not confuse these two things.

christian-oreilly commented 1 year ago

Numbers of channels as per the config.
Formula for outlier rejection as per the config.

scott-huberty commented 1 year ago

Looks great.

It's been since my masters since I used Inkscape but I think I'll download it again. In the spirit of open source, would you be able to share these Inkscape files with me 🙂

christian-oreilly commented 1 year ago

It is a good too to be comfortable using for figures in paper and stuff like that. I create SVG files, which is a very standard format.

Rank and bridge steps will be very easy to sketch (I have them on paper). A bit too lazy to do them tonight, but it should be done shortly. There would just be the gap removal and the large IC step remaining. I don't think ICA are worth depicting... it is as per the ICA algorithm... but I'll see. Anyway, short story, this is almost done. I'll glue them together with markdown text to have a nice complete explanation of the pipeline.

christian-oreilly commented 1 year ago

Finale set of schemas.

All schemas are now on the xarray branch. A pipeline.md display them all. The last steps is to add some text to pipeline.md to explain it all.

scott-huberty commented 1 year ago

Great. Once you are ready to merge these schemas into the main branch, I can look into incorporating them to the sphinx documentation.

christian-oreilly commented 1 year ago

Any objection on closing this issue? Improving the documentation and filling all the TODOs will be an ongoing task, but I think this issue has been sufficiently addressed to close it instead of keeping it as an issue that we will go stale as the documentation will always be perfectible.

scott-huberty commented 1 year ago

Close it!

lina-usc / pylossless

Feature request: Pipeline documentation #19