lina-usc / pylossless

🧠 EEG Processing pipeline that annotates continuous data
https://pylossless.readthedocs.io/en/latest/
MIT License
15 stars 8 forks source link

FIX: Make sure annotations dont overlap with adjacent epochs #132

Closed scott-huberty closed 1 year ago

scott-huberty commented 1 year ago

Fixes #124

I remember something about how mne epochs include the end sample when they shouldn't but I want to make sure this issue was only in how the onset and duration in the annotations were defined and not also due to how we create epochs.

codecov[bot] commented 1 year ago

Codecov Report

Merging #132 (6f586f9) into main (4db5348) will increase coverage by 55.03%. The diff coverage is 96.96%.

@@             Coverage Diff             @@
##             main     #132       +/-   ##
===========================================
+ Coverage   22.65%   77.69%   +55.03%     
===========================================
  Files           7       12        +5     
  Lines         565      919      +354     
===========================================
+ Hits          128      714      +586     
+ Misses        437      205      -232     
Impacted Files Coverage Ξ”
pylossless/dash/topo_viz.py 83.69% <91.17%> (ΓΈ)
pylossless/_utils.py 100.00% <100.00%> (+73.33%) :arrow_up:
pylossless/dash/tests/conftest.py 100.00% <100.00%> (ΓΈ)
pylossless/dash/tests/test_topo_viz.py 100.00% <100.00%> (ΓΈ)
pylossless/flagging.py 64.28% <100.00%> (+33.73%) :arrow_up:
pylossless/pipeline.py 72.95% <100.00%> (+54.37%) :arrow_up:

... and 4 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

scott-huberty commented 1 year ago

Merging #132 (91ca7c2) into main (4db5348) will increase coverage by 54.84%.

πŸ˜‚ it's like codecov wasn't capturing some of our tests until now?

scott-huberty commented 1 year ago

@christian-oreilly i never thought I'd be the one to say this but I'm starting to wonder if we should be handling epoch exclusion "in house" (i.e not relying on mne to drop "bad" annotation epochs).

I don't generally like re-creating the wheel where MNE already has a convenience function for what we need to do, but this issue where the bad annotations are hitting adjacent epochs is really bothering me.

Maybe we can just always pass all the epochs to epochs_to_xr and then each of our pipeline methods can do something like xr_epochs.sel(~flagged_epoch_inds).

The nice thing is that this way the indices of the epochs are always the same throughout the pipeline, I.e epoch indice 10 is always 10-11sec (I'm not sure that this is true when epochs are dropped by bad but I could be wrong).

Of course this doesn't solve the root issue, I.e if the user decides to convert pylossless annotations to bad annotations, then they will face the same issue we are now, where a single bad_pylossless annotatjon drops 3 epochs when it should only drop one.

christian-oreilly commented 1 year ago

πŸ˜‚ it's like codecov wasn't capturing some of our tests until now?

Look at the files that caused the diff. This is due to new tests added in PR #117.

I don't think the modifications of adding/removing an arbitrary number of samples (2, 4) should be adopted, and I would be reticent regarding the idea of implementing a custom mechanism to discard epochs. If we were too, I think it would require first documenting the limit of the MNE approach and why it is not better to fix it. I think this thing requires more thorough investigation. Particularly, we should build a (or a set of) unit test(s) that check for the limit cases. We can have a look at that on Friday if you'd like.

christian-oreilly commented 1 year ago

OK... I gave it some more thought. My previous response was more or less guided by the gut feeling that we should not be reinventing the wheel. Then I thought that there might be something to the idea of a clean xarray with preserved indices and all... then I thought, yes but if we do that, then we are preprocessing epochs. One of the key arguments for the lossless pipeline is that it does not require epoching the data, but rather operates on the original raw. We use the "Epochs" class for preprocessing in intermediate steps, but this is just for convenience... we are not epoching in the same sense as would ERP. We are operating on windows (like you would do with a rolling mean) and the length of these windows is more or less arbitrary (as opposed to an event-related paradigm where it depends on ISI). Further, these are only a temporary state meant to create annotations in that continuous (raw) data. The output of the pipeline is a raw file, not epochs. And decisions to keep or reject are on the raw (as you alluded to). So, I think this idea is fraught. I think that something closer to a solution (which I am not sure I advise at this time!) would be reimplementing the rejection of epochs based on annotations (i.e., when coding PyLossless methods to apply the cleaning policy, there could be something like a tolerance factor (let's say in seconds... with a default like 0.01) that specify how much a bad annotation can overlap with an epoch before it causes its rejection). A similar idea is to do more or less that, but instead of reimplementing it in PyLossless, it can be done as a PR to MNE proposing a new argument (for that rejection tolerance factor) to Epochs.

scott-huberty commented 1 year ago

πŸ˜‚ it's like codecov wasn't capturing some of our tests until now?

Look at the files that caused the diff. This is due to new tests added in PR #117.

I don't think the modifications of adding/removing an arbitrary number of samples (2, 4) should be adopted, and I would be reticent regarding the idea of implementing a custom mechanism to discard epochs. If we were too, I think it would require first documenting the limit of the MNE approach and why it is not better to fix it. I think this thing requires more thorough investigation. Particularly, we should build a (or a set of) unit test(s) that check for the limit cases. We can have a look at that on Friday if you'd like.

Sure, both james and tyler are using this. I consider this to be high priority.

I also have hesitations about manually managing bad epochs but I also think we are in-between a standard (playing both sides of the fence), where we apparently take a strong stance against letting MNE store flagged channels and flagged ICs but only want MNE to handle storing/handling bad epochs.

I understand the reasons why we made those decisions but It makes things unclear for end-users IMO. via visualization they get to see the pipeline's decisions on epochs by default (but not channels or ICs).

christian-oreilly commented 1 year ago

Didn't we had a discussion where we considered removing the "bad" prefix to these annotations for this exact reason?

On Tue, Jun 13, 2023, 08:05 Scott Huberty @.***> wrote:

πŸ˜‚ it's like codecov wasn't capturing some of our tests until now?

Look at the files that caused the diff. This is due to new tests added in PR #117 https://github.com/lina-usc/pylossless/pull/117.

I don't think the modifications of adding/removing an arbitrary number of samples (2, 4) should be adopted, and I would be reticent regarding the idea of implementing a custom mechanism to discard epochs. If we were too, I think it would require first documenting the limit of the MNE approach and why it is not better to fix it. I think this thing requires more thorough investigation. Particularly, we should build a (or a set of) unit test(s) that check for the limit cases. We can have a look at that on Friday if you'd like.

Sure, both james and tyler are using this are using this now I consider this to be high priority.

I also have hesitations about manually managing bad epochs but I also think we are in-between a standard (playing both sides of the fence), where we apparently take a strong stance against letting MNE store flagged channels and flagged ICs but only want MNE to handle storing/handling bad epochs.

I understand the reasons why we made those decisions but It makes things unclear for end-users IMO. via visualization they get to see the pipeline's decisions on epochs by default (but not channels or ICs).

β€” Reply to this email directly, view it on GitHub https://github.com/lina-usc/pylossless/pull/132#issuecomment-1589171352, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNIPGXOH5WIJJXFT4HX5RLXLBJQTANCNFSM6AAAAAAY6L6IOA . You are receiving this because you were mentioned.Message ID: @.***>

scott-huberty commented 1 year ago

Didn't we had a discussion where we considered removing the "bad" prefix to these annotations for this exact reason?

Yes - If I recall correctly, at that time it was you that had an incliniation to not prepend "_bad" and to handle bad epoch handling in house πŸ˜ƒ

christian-oreilly commented 1 year ago

I may remember it wrong, but I think to remember I was ok with the idea of removing bad and managing it with the rejection policy... It is not a tough call, I am always in favor of this policy thing.

On Tue, Jun 13, 2023, 08:12 Scott Huberty @.***> wrote:

Didn't we had a discussion where we considered removing the "bad" prefix to these annotations for this exact reason?

Yes - If I recall correctly, at that time it was you that had an incliniation to not prepend "_bad" and to handle bad epoch handling in house πŸ˜ƒ

β€” Reply to this email directly, view it on GitHub https://github.com/lina-usc/pylossless/pull/132#issuecomment-1589182752, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNIPGTVISVRLXU5DQBTFZ3XLBKLTANCNFSM6AAAAAAY6L6IOA . You are receiving this because you were mentioned.Message ID: @.***>

scott-huberty commented 1 year ago

I may remember it wrong, but I think to remember I was ok with the idea of

removing bad and managing it with the rejection policy... It is not a tough

call, I am always in favor of this policy thing.

On Tue, Jun 13, 2023, 08:12 Scott Huberty @.***> wrote:

@christian-oreilly , okay, then just to clarify, you are open to handle excluding epochs "in house"? (Removing 'bad_' would require that we only pass good epochs to pipeline steps).

If so, I can push a commit before Friday if I have time and we can review it, and make a decision on whether it's ultimately a good idea or not.

Didn't we had a discussion where we considered removing the "bad" prefix

to these annotations for this exact reason?

Yes - If I recall correctly, at that time it was you that had an

incliniation to not prepend "_bad" and to handle bad epoch handling in

house πŸ˜ƒ

β€”

Reply to this email directly, view it on GitHub

https://github.com/lina-usc/pylossless/pull/132#issuecomment-1589182752,

or unsubscribe

https://github.com/notifications/unsubscribe-auth/ABNIPGTVISVRLXU5DQBTFZ3XLBKLTANCNFSM6AAAAAAY6L6IOA

.

You are receiving this because you were mentioned.Message ID:

@.***>

@c

scott-huberty commented 1 year ago

Sorry for weirdly formatted message. I am on my phone.

christian-oreilly commented 1 year ago

Yeah, a made a few answers by phone too... Now that I am in my office, and for the sake of clarity, I may explain a bit more:

Does this seem reasonable?

scott-huberty commented 1 year ago
  • 'd make sure to always have a "pylossless" or "ll" or "pll_" prefix to distinguish these annotations from others.

agreed.

  • mostly like the rejection of "bads" annotations in MNE but that would apply this only on "pll_" annotations

hmmm but we also have to allow rejection on "bad_" in the case that the user added bad annotations BEFORE pylossless for epochs they don't want to have included.

Does this seem reasonable?

hmm im not sure. I was thinking of handling exclusion via epoch indices since we are already tracking epoch indices that are flagged and I think I've already figured it out. re-implementing the rejection by annotation boilerplate might be more far off for me (0.3).

In that case maybe we go with your original proposoal and take a closer look at the boundaries of the current bad_ annotations to see if we can find an immediate solution to the current problem.

christian-oreilly commented 1 year ago

hmmm but we also have to allow rejection on "bad_" in the case that the user added bad annotations BEFORE pylossless for epochs they don't want to have included.

One does not exclude the other. I see no problem in doing both. These are "orthogonal" issues.

hmm im not sure. I was thinking of handling exclusion via epoch indices since we are already tracking epoch indices that are flagged and I think I've already figured it out.

Well, if you want to do this, you can always use epoching stuff inside the pylossless processing, but since the output of the lossless approach is a continuous file (not an epoched one), these epochs needs to be transformed back to continuous time annotation at the end...

In that case maybe we go with your original proposoal and take a closer look at the boundaries of the current bad_ annotations to see if we can find an immediate solution to the current problem.

Sure. We can have a look at this on Friday.

scott-huberty commented 1 year ago

@christian-oreilly once CI's are green Let me know if we are okay to merge or feel free to merge yourself!

scott-huberty commented 1 year ago

Also @Andesha and @jadesjardins this is an important bug fix and may result in slight differences in pipeline output between prior runs and runs after this is shipped.

Andesha commented 1 year ago

thanks for the headsup :+1:

scott-huberty commented 1 year ago

NP.

@christian-oreilly let me know if you want more time to review this otherwise I will merge at the end of the week.

christian-oreilly commented 1 year ago

@scott-huberty Sorry for not giving timely feedback. It's been a hectic week. Nice codecov... at 77.69%, it is getting close to the targetted 90%.

scott-huberty commented 1 year ago

Indeed! I think we need to change the configuration of our tests to run on the main branch, so that our code cov is updated.