sappelhoff / pyprep

A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
https://pyprep.readthedocs.io/en/latest/
MIT License
128 stars 30 forks source link

[WIP] Split PrepPipeline into separate methods, make final interpolation optional #99

Open a-hurst opened 3 years ago

a-hurst commented 3 years ago

PR Description

(Eventually) closes #73. Currently, this PR:

There's still definitely more cleanup to do, but I figured it'd be good to get the core of this up for review sooner than later!

Merge Checklist

a-hurst commented 3 years ago

Whoops, looks like one of the examples relies on an undocumented attribute I removed for the sake of RAM (self.EEG_new). Will address that tomorrow.

codecov-commenter commented 3 years ago

Codecov Report

Merging #99 (0688a5c) into master (ec44384) will decrease coverage by 0.35%. The diff coverage is 95.34%.

:exclamation: Current head 0688a5c differs from pull request most recent head 6ff2755. Consider uploading reports for the commit 6ff2755 to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master      #99      +/-   ##
==========================================
- Coverage   99.04%   98.68%   -0.36%     
==========================================
  Files           7        7              
  Lines         734      762      +28     
==========================================
+ Hits          727      752      +25     
- Misses          7       10       +3     
Impacted Files Coverage Ξ”
pyprep/reference.py 97.88% <90.00%> (-1.30%) :arrow_down:
pyprep/prep_pipeline.py 98.59% <100.00%> (-1.41%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data Powered by Codecov. Last update ec44384...6ff2755. Read the comment docs.

sappelhoff commented 3 years ago

It is possible that I introduced these conflicts :confounded: it's because I worked on dbe2062d3c7d50a0d87775e1111062ef854c366f before pulling master ... the pulled master, rebased, and force pushed. That was a bad idea - I hope nothing serious got destroyed.

a-hurst commented 3 years ago

It is possible that I introduced these conflicts πŸ˜– it's because I worked on dbe2062 before pulling master ... the pulled master, rebased, and force pushed. That was a bad idea - I hope nothing serious got destroyed.

No worries, I'll rebase before I push any more work! I've set this aside for a few days while tackling some other projects, but I'll try and get this wrapped up within the next week or so :)

a-hurst commented 2 years ago

Okay @sappelhoff, I've gotten the PrepPipeline API rewrite to a point I think I'm happy with (though I'd love to hear your thoughts and @yjmantilla's). Some of the attributes aren't documented yet and much of it is untested, but I figured it'd be best to handle that side of things once I was sure we were all happy with the changes.

My main changes here are:

Let me know what you think!

sappelhoff commented 2 years ago

I didn't take a look at the diff because I am not sure what diff will stay after https://github.com/sappelhoff/pyprep/pull/99#discussion_r683235569 is resolved, but I still have concerns over these two points (I am commenting only based on your summary comment):

returns reference_before_interpolation if interpolation hasn't been done, or the post-interpolation average reference (previously reference_after_interpolation) if it has

and

(will retrieve post-interpolation values if interpolation was used, otherwise returns the pre-interpolation values

Could we not just return a dict in both cases, with the keys pre_interpolation and post_interpolation ... and the values for these keys being None or empty dicts if these values have not yet been computed? I think that'd be clearer. An alternative would be to send a log message upon the getter call that says something like: "you are getting the post_interpolation values". The minimum would be to have an attribute that can clearly tell you whether the prep object is "pre" or "post" interpolation, but we might already have something like that, not sure right now.

What do you think?

a-hurst commented 2 years ago

Could we not just return a dict in both cases, with the keys pre_interpolation and post_interpolation ... and the values for these keys being None or empty dicts if these values have not yet been computed? I think that'd be clearer. An alternative would be to send a log message upon the getter call that says something like: "you are getting the post_interpolation values". The minimum would be to have an attribute that can clearly tell you whether the prep object is "pre" or "post" interpolation, but we might already have something like that, not sure right now.

That could work, though I still think it'd be useful to have an attribute like current_reference_signal so there's a consistent way to get the final reference signal regardless if interpolation was enabled or disabled. For the last project we did with PyPREP we tried the analysis pipeline both with and without final bad channel interpolation, so I think that having an API where you can toggle interpolation on and off without needing to change any attributes you access afterwards would be handy.

Instead of attributes, maybe methods in the style of the new get_raw function would be clearer? For example, a get_noisy_info method that returns the most recent noisy info available by default (like current_noisy_info does now), but also has an argument that lets you specify explicitly whether you want the original, post-reference, or post-interpolation noisy info.

Keep in mind that for the PrepPipeline API I decided against exposing interpolation as a separate method and instead made it an optional flag to robust_reference(), so it should hopefully be clear to users based on the settings they chose (and the documentation) whether the "current reference signal" reflects interpolated data or not.

a-hurst commented 2 years ago

yes! That sounds good to me, I like the new get_raw πŸ‘ But I think the default should be "current", not "None", to be more explicit.

Great! I'll get to work on this, as well as proper tests and docs for everything new. In that case, for the sake of simplicity I'm going to leave the internal noisy info and reference signal attributes undocumented so that the new get_x methods are the one clear official way for users to get at that data.

I approve of the remaining changes, but I am slowly losing my grasp on the potential workflows we can have with our different classes and their methods πŸ˜‡ it's been too long that I actually worked with this code, I think.

Hopefully cleaning up the example scripts for 0.4 will refresh us all on that front!