Add preprocessing report feature for quality control

DominiqueMakowski commented 2 years ago

I thought there was already an issue for this but couldnt find it.

That's a rather long-term goal. The idea is to add an option; report=True or report="myreport.pdf", to processing routines. This would be useful for quality control to check how the preprocessing performed on each individual or sample.

These would contain:

Figures with: raw signal, filtered signal, peaks detection etc (for visual assessment)
- We need to find a way to deal with super long signals, do we display it all? do we display a bit to keep the scale reasonable?
A table containing the mean, SD of for instance heart rate, average signal quality etc.
- This table could also be (only) printed in the console when report=True
Specific references of the methods used in the preprocessing (changing depending on the pipeline used)
Specific text like "The raw ECG signal (sampled at 1000 Hw) was filtered with a bandpass filter ([0.1, 2], butterworth 5th order). The peak detection was carried using authors (2010) method."

danibene commented 2 years ago

We need to find a way to deal with super long signals, do we display it all? do we display a bit to keep the scale reasonable?

@DominiqueMakowski if you're fine with HTML rather than PDF output you could embed an interactive plot? E.g. with plotly? Alternatively you could show a static plot with the first N seconds (e.g. 0 - 30) and then have a plot for each window of N seconds afterwards (30 - 60, 60 - 90...), maybe in an appendix

DominiqueMakowski commented 2 years ago

Yes I think you're right, mne-style interactive html reports would probably be the most flexible option.

I see this as a big feature (that could be the marker of the next big release, like 3.0), but it feels way beyond my scope ^^ We'd need to start with something small that has only a handful of methods / figures, and then once we're satisfied implement it for the rest of signals

danibene commented 2 years ago

@DominiqueMakowski

We'd need to start with something small that has only a handful of methods / figures

How about we start with ppg_process(), with raw/cleaned/peaks + heart rate plots (essentially this but interactive https://neuropsychology.github.io/NeuroKit/_images/p_ppg_process1.png), and then try to expand to ecg_process()?

(I'm biased but) I think even just a tool to check the peak detection for heartbeat signals would be useful

DominiqueMakowski commented 2 years ago

sounds good!

danibene commented 2 years ago

Is this along the lines of what you had in mind? https://danibene.github.io/neurokit2-ppg-report-example/myreport.html

DominiqueMakowski commented 2 years ago

Yeah, that looks awesome! perhaps we should create functions called ppg_report(method_cleaning="something", method_peak="something") in which we create the corresponding text programmatically (if method_cleaning=="this": "Data cleaning was performed using ..."), so that we can basically call this function and put its output inside the HTML

(this way, users can also get this info in the console / in a re-useable programmatical form)

danibene commented 2 years ago

this way, users can also get this info in the console / in a re-useable programmatical form

That would be good!

It's currently only possible to use the elgendi method with ppg_process() even though nabian2018 is also available in ppg_clean() - do you think we should add new arguments to ppg_process() to allow customization e.g. ppg_findpeaks_kwargs and ppg_clean_kwargs? These keyword arguments could then be stored in the info dict returned by ppg_process(), which could be passed to ppg_report()?

DominiqueMakowski commented 2 years ago

I think we can extend ppg_process() indeed to allow more customization.

Maybe we could like that:

1) add implement *_report():

function ppg_report(method="elgendi", method_cleaning="default", method_peaks="default"):
    # This function first sanitizes the input, i.e., if the specific methods are "default" then it adjusts based on the "general" default
    # And then it creates the pieces of text for each element 
    return {"method_cleaning": method_cleaning, "method_peaks": method_peaks, "text_cleaning": text_cleaning, "text_peaks": text_peaks}

We allow kwargs in `ppg_report(ppg, method, kwargs)` and we call the report() function directly after. This allows us to have the sanitized inputs for specific methods for the cleaning peaks etc.

In other words, we introduce a _report() first step that returns all the submethods that we use to specific steps below, and we allow _process' kwargs to be passed to it for maximum flexibility

do you see what i mean?

danibene commented 2 years ago

I'm not sure how **kwargs could be used in case you want to customize both ppg_clean() and ppg_findpeaks() (without some hack: e.g. using inspect https://stackoverflow.com/a/13131673)

How about using 2 separate dictionaries for the kwargs, and sanitizing these at the beginning of ppg_process()?

https://github.com/neuropsychology/NeuroKit/blob/beb7afdf6d6b538e4e0a1f2e986f6ddc7c8789cf/neurokit2/ppg/ppg_process.py#L12

https://github.com/neuropsychology/NeuroKit/blob/beb7afdf6d6b538e4e0a1f2e986f6ddc7c8789cf/neurokit2/ppg/ppg_process.py#L57-L65

DominiqueMakowski commented 2 years ago

no need for hacks, I thought of something like that:

def ppg_report(method="elgendi", method_cleaning="default", method_peaks="default"):
    if method_cleaning == "default":
        if method == "elgendi":
             method_cleaning = "elgendi"
        elif method == "somethingelse":
             ....
    if method_peaks == "default":
        if method == "elgendi":
             method_peaks = "elgendi"
        elif method == "somethingelse":
             ....
    return {"method_cleaning": method_cleaning, "method_peaks": method_peaks, "text_cleaning": text_cleaning, "text_peaks": text_peaks}

def ppg_process(signal, method="elgendi", **kwargs):
    # Sanitize input
    report_info = ppg_report(method=method, **kwargs)

    # Use it 
    ppg_clean(signal, method=report_info["method_cleaning"])
    ppg_peaks(signal, method=report_info["method_peaks"])

here we capture the kwargs by passing then to ppg_info, which then outputs a dict with all the submethods specified that we pass to the rest

stale[bot] commented 2 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

danibene commented 1 year ago

Hi @DominiqueMakowski , I'd be interested in your thoughts on what to prioritize here. Right now this feature is only implemented for some of the signals and encompasses several smaller features:

Plot for quality control
Customization of methods/keyword arguments for those methods
Plain-text explanation of the various methods and keyword arguments with associated references

I find that #3 takes a while, especially if the keyword arguments depend on the method (see e.g. EMG https://github.com/danibene/NeuroKit/blob/feature/report_methods_emg/neurokit2/emg/emg_methods.py) and I'm wondering if we should prioritize having a consistent API for the various signals for #1 and #2 rather than aspiring to fully document the methods in #3? And if so, should we remove the descriptions written so far or should we leave them as a work in progress?

DominiqueMakowski commented 1 year ago

I agree with you and prioritze 1 and 2. That said, I would rather not remove the descriptions for now (assuming it's maintainable): I think this feature can be a super strong selling point to showcase and demo (with the disclaimer that it's work in progress and incomplete). But it's one of these thing that has a "wow" potential and gives neurokit a strong edge.

So let's keep it as-is where it's already there, and prioritize the rest. Once I have some time (I pray for that to happen someday🙏) I can give 3) some love :)

neuropsychology / NeuroKit

Add preprocessing report feature for quality control #665