
NeuroKit2: The Python Toolbox for Neurophysiological Signal Processing
https://neuropsychology.github.io/NeuroKit

The ultimate method for R-peak detection? #222

Closed: DominiqueMakowski closed this issue 4 years ago

DominiqueMakowski commented 4 years ago

There are many R-peak detection methods and algorithms, with no clear guidelines on which one to use or which one is best. It is likely that they all have some strengths and weaknesses.

For some reason, I woke up with this idea in mind: what if we found a way of combining all these R-peak detection methods? The tricky part is that each method returns peak indices, whose combination into a probabilistic statement is not straightforward.

So I thought: why not treat each peak detected by a method as the peak of a (probability) distribution, like a normal curve whose width approximately covers a QRS segment. If we convolve each peak with this normal distribution, we get a sort of continuous pseudo-probabilistic signal. We can then combine the results from all the methods by summing them.
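To make the idea concrete, here is a minimal NumPy sketch of the convolve-and-sum step (an illustrative toy, not NeuroKit's actual implementation; the function name, `sigma` value, and peak indices are all made up):

```python
import numpy as np

def combine_peak_detections(peaks_per_method, signal_length, sigma=25):
    """Convolve each method's peak indices with a Gaussian and sum the
    results into one pseudo-probabilistic signal (illustrative sketch)."""
    t = np.arange(-4 * sigma, 4 * sigma + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)  # width ~ a QRS segment
    combined = np.zeros(signal_length)
    for peaks in peaks_per_method:
        spikes = np.zeros(signal_length)
        spikes[np.asarray(peaks)] = 1.0  # one "spike" per detected peak
        combined += np.convolve(spikes, kernel, mode="same")
    return combined / combined.max()  # scale to [0, 1]

# Three hypothetical detectors agree around sample 500; one outlier at 800
proba = combine_peak_detections([[500], [502], [498], [800]], signal_length=1000)
```

Selecting the peaks of `proba` (e.g., above some probability threshold) then gives the "most agreed-upon" R-peak locations: the region where several detectors agree accumulates mass, while an isolated detection stays low.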

Once we have the combination of these convolved peaks, we can select its peaks as the most probable R-peak locations. I added a first draft here. Here's how it looks:

import neurokit2 as nk

ecg = nk.ecg_simulate(duration=10, sampling_rate=500)
ecg = nk.signal_distort(ecg,
                        sampling_rate=500,
                        noise_amplitude=0.2, noise_frequency=[25, 50],
                        artifacts_amplitude=0.2, artifacts_frequency=50)
# Note: sampling_rate must match the simulated signal (500 Hz, not 1000)
nk.ecg_findpeaks(ecg, sampling_rate=500, method="promac", show=True)

(Figure_1: plot produced by the example above)

Does that make sense to you? Could it be improved? Or is it a bad idea?

@JanCBrammer @TiagoTostas

JanCBrammer commented 4 years ago

Interesting. This is conceptually similar to ensemble learning in predictive modelling.

It would be interesting to see if an ensemble detector performs significantly better than the individual detectors (on a benchmark dataset that is as diverse and large as possible).

Benchmarking all the detectors (with their default parameters) would be interesting anyway. I think we had a similar idea (#78) a while ago.

DominiqueMakowski commented 4 years ago

Benchmarking all the detectors (with their default parameters) would be interesting anyway. I think we had a similar idea (#78) a while ago.

Yeah, this could be a super useful little study. The hard part is finding/creating the benchmark dataset, but with the simulators and signal_distort I'm sure we could do something great. In fact, I wonder if we could first create and publish an open-source dataset for benchmarking, containing the original (clean) signals, the true peak locations, and, for each signal, different levels of degradation. People could then try to retrieve the true signal/peak locations from the degraded versions. That'd be a nice contribution to the field.
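As a toy sketch of what such a benchmark dataset could look like (pure NumPy; the 1 Hz spike train is a hypothetical stand-in for a simulated ECG, which in practice would come from something like nk.ecg_simulate):

```python
import numpy as np

rng = np.random.default_rng(42)
fs = 500  # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)

# Toy "clean ECG": a spike train at 1 Hz stands in for the R-peaks
true_peaks = np.arange(fs // 2, len(t), fs)
clean = np.zeros_like(t)
clean[true_peaks] = 1.0

# Bundle the clean signal, the ground-truth peak locations, and
# graded degradation levels of the same signal
dataset = {"clean": clean, "true_peaks": true_peaks}
for level in (0.05, 0.1, 0.2):
    dataset[f"noise_{level}"] = clean + level * rng.standard_normal(len(t))
```

Each degraded copy shares the same ground-truth peaks as the clean signal, so a detector's output on any degradation level can be scored against a single annotation.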

TiagoTostas commented 4 years ago

Wow 😃 Conceptually, I believe that's a good idea; let me just point out some concerns I have with that approach:

DominiqueMakowski commented 4 years ago

@TiagoTostas, as my understanding of the specificities of the different methods is limited, could you maybe help remove from this meta-method the methods that are, in your opinion, useless (i.e., that would do more harm than good)?

https://github.com/neuropsychology/NeuroKit/blob/7a96f37d9010dc28c2c422cd20a8e8c2620eba1f/neurokit2/ecg/ecg_findpeaks.py#L145-L154

As for the preprocessing study, you might be right. I assumed it would be simpler to generate the data ourselves, since we could control all of the parameters (sampling rate, heart rate, and distortion) to get a virtually unlimited set. But we might indeed be reinventing the wheel here (and it's true that one of the main limitations would be the absence of biological artifacts like ectopic beats, as our simulator only generates a "healthy" signal).

If the databases on which we could test different algorithms are accessible in OA, maybe it would then be useful to facilitate their usage as a testing framework. For instance, we could either directly store/sanitize/format/combine them in some repo, or create a function that downloads them. Then, we could have a function to which we pass a preprocessing function (ecg_test_preprocessing(mypipeline)), that runs it on the databases and returns some quality metrics. With that, it would be super easy to generate preprocessing functions with different parameters and compare them. Does something like this exist already, or does it seem like overkill with regard to the current state of things?
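A harness along the lines of the ecg_test_preprocessing idea could look roughly like this (a hypothetical sketch; benchmark_detector, the tolerance value, and the toy threshold detector are all made up for illustration):

```python
import numpy as np

def benchmark_detector(detector, records, tolerance=25):
    """Run `detector` on each (signal, true_peaks) record and return the
    mean sensitivity, counting a true peak as found if any detected peak
    lies within `tolerance` samples of it."""
    scores = []
    for signal, true_peaks in records:
        found = np.asarray(detector(signal))
        hits = sum(np.any(np.abs(found - p) <= tolerance) for p in true_peaks)
        scores.append(hits / len(true_peaks))
    return float(np.mean(scores))

# Toy usage: a simple threshold "detector" on a spike-train signal
signal = np.zeros(1000)
true_peaks = [100, 400, 700]
signal[true_peaks] = 1.0
score = benchmark_detector(lambda s: np.flatnonzero(s > 0.5),
                           [(signal, true_peaks)])
```

Swapping the lambda for any real pipeline (with its own parameters) would give directly comparable scores across pipelines, which is the point of the proposal.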

TiagoTostas commented 4 years ago

(-) The methods based on moving averages and thresholding (Pan-Tompkins, Christov, and Hamilton) are usually not very specific. I verified this when doing some tests previously, but it is better explained in the reference above if you are curious. (+) Engzee is very specific, since it implements a local search for the maximum around the detected peak. Kalidas is stated to have good specificity in that report (visually it has good results, but I have never tested it). The method that I proposed is an adaptation of https://www.researchgate.net/publication/281488007_Novel_Real-Time_Low-Complexity_QRS_Complex_Detector_Based_on_Adaptive_Thresholding, with the preprocessing stage adapted to increase its specificity (some bias here ahah).
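The "local search for the maximum around the detected peak" mentioned above can be illustrated with a tiny sketch (a hypothetical helper, not the actual Engzee implementation; the window size is arbitrary):

```python
import numpy as np

def refine_peak(signal, rough_idx, window=10):
    """Snap a roughly detected peak to the local maximum of the signal
    within +/- `window` samples (illustration of the refinement idea)."""
    lo = max(rough_idx - window, 0)
    hi = min(rough_idx + window + 1, len(signal))
    return lo + int(np.argmax(signal[lo:hi]))
```

This kind of refinement is what makes a detector specific: even if the coarse detection stage fires a few samples off, the reported index lands on the true local maximum.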

But we might indeed be reinventing the wheel here (and it's true that one of the main limitations would be the absence of biological artifacts like ectopic beats, as our simulator only generates a "healthy" signal).

Yeah, I think that's the case :( There was an attempt to do something similar here, with MIT-BIH and GUDB databases: https://github.com/berndporr/py-ecg-detectors

But usually this is done by each researcher individually (I believe), and that's why it leads to these inconsistencies. PhysioNet is well documented in terms of importing the annotations into Python, and we can also get some inspiration from the link above, so I think that adding this validation block to the pipeline would be a great/useful addition.

JanCBrammer commented 4 years ago

If the databases on which we could test different algorithms are accessible in OA, maybe it would then be useful to facilitate their usage as a testing framework. For instance, we could either directly store/sanitize/format/combine them in some repo, or create a function that downloads them.

@DominiqueMakowski, this is essentially what the wfdb package does with the PhysioNet databases. However, the problem is that some of these are poorly annotated, as pointed out by @berndporr in this paper. I believe this is what led his team to create the Glasgow University Database (GUDB), which is quite nice (I use it to benchmark the biopeaks/NeuroKit R-peak detector).

I feel like we should start by benchmarking our own detectors to give users some clear indication of (relative) performance. Making general statements about detector performance is quite tricky, as pointed out by @TiagoTostas, since performance is heavily influenced by idiosyncratic pre-processing steps, implementation details of the detector, and, importantly, also the evaluation criteria (e.g., the tolerance for a match between manually annotated peaks and peaks identified by the detectors). We could use the GUDB for that and further distort the ECG (although the database already includes "running" and "handbike" conditions).

JanCBrammer commented 4 years ago

I've looked into setting up a pipeline for benchmarking our detectors. The problem with the GUDB is that the data need to be requested and downloaded manually. This means that we'd have to host the database ourselves somewhere. In contrast, with PhysioNet the data can be fetched from their servers using the wfdb API. For now, we can set up a benchmark using wfdb and PhysioNet and then see how we can improve it.

berndporr commented 4 years ago

I have the database on a Git server at the university here as well, but it's not public. The reason the data needs to be requested is that central IT haven't got enough space, as far as I know, but I see the appeal of having the dataset available via an API. I'm setting up an HTTP server at the moment anyway for another project and could then point the API to it. I need to talk to IT regarding this. We have exam season and can look into it next week.

DominiqueMakowski commented 4 years ago

Hi @berndporr, thanks for chiming in ☺️ looking forward to hearing more!

JanCBrammer commented 4 years ago

The reason the data needs to be requested is that central IT haven't got enough space

That makes sense. Would it be an option to host the dataset without the MP4 files (or host them separately)? I believe those create most of the bulk.

I need to talk to IT regarding this. We have exam season and can look into it next week.

That would be amazing!

berndporr commented 4 years ago

Hi all, here is the API to access the ECG data: https://github.com/berndporr/ECG-GUDB. It should install with

pip3 install ecg_gudb_database

and then you can have a play with the usage example. The datasets are simply hosted on this GitHub repo as gh-pages; the API does an HTTP request and parses them on the fly.

Datchthana1 commented 1 month ago

I want to know how each method differs and which one I should choose to find the ECG R-peaks.