paulvangentcom / heartrate_analysis_python

Python Heart Rate Analysis Package, for both PPG and ECG signals
MIT License

Access to continuous bpm? #7

Closed KrisITB closed 5 years ago

KrisITB commented 5 years ago

Hi!

First: congratulations! Your PPG-to-HR conversion seems to handle my data better than the application that came with the sensor %) and many thanks for the tutorial on your website!

Quick question: is there a way to access BPM values as they change over time (ideally averaged over a specified window)? From what I understand, measures['bpm'] only returns the overall average. Or am I missing something?

Am I correct in understanding that I need to use the indexes from working_data['peaklist_cor'] to calculate this myself? If so, please consider this a feature request ;) I'm sure I don't have to stress how valuable such a feature is in research practice, and you already have an awesome tool here, so that little feature would certainly make this the hottest stuff in town IMHO... at least for those of us cursed with working with PPG ;)

P.s.: any tips on how large the window should be?

Best wishes!

paulvangentcom commented 5 years ago

Thanks, and glad heartpy is of use to you! :)

I have some time to implement this later today, very good idea.

Cheers

paulvangentcom commented 5 years ago

Hi Kris,

I've implemented a few functions. Pull the latest version to get them. I need to reflect a bit on how to tie it all together into an automated function call, in a way that also makes sense from a data analysis perspective.

I've at least published the methods now so you can utilize them. For now you can use the functionality as such:

import heartpy as hp
from heartpy import exceptions #needed for the BadSignalWarning catch below

data = hp.get_data('datafile.csv') #get your data

#windowsize in seconds, overlap in fraction (0.5 = 50%, needs to be 0 or greater and less than 1.0)
#slices now contains a list of index pairs corresponding to window segments. 
slices = hp.make_windows(data=data, sample_rate=100.0, windowsize=120, overlap=0.5)

continuous = {} #set dictionary for continuous measures

for slice in slices: #iterate over windows
    try:
        working_data, measures = hp.process(data[slice[0]:slice[1]], sample_rate=100.0)
        for k in measures.keys(): #go over all keys
            continuous = hp.append_dict(continuous, k, measures[k]) #append to dict
    except exceptions.BadSignalWarning: #if analysis goes wrong on a section, pass
        pass

print(continuous['bpm']) #access continuous bpm, all other measures are in there as well 

Now when sections of the signal, for whatever reason, produce incorrect peak fitting results, outliers occur in the continuous measures. I've implemented two functions to identify and remove them: one based on the modified Z-score approach, and one based on the interquartile range approach (H-spread). Use them as such:


bpm = continuous['bpm'] #assuming the output dict from the last snippet

bpm_no_outliers = hp.outliers_modified_z(bpm) #for the Z-score approach
bpm_no_outliers = hp.outliers_iqr_method(bpm) #for the interquartile range approach

Both functions replace outliers with the signal median, which is robust against outliers.
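For reference, the core idea behind the modified Z-score variant looks roughly like this (a minimal sketch of the approach, not necessarily the exact heartpy implementation; the 0.6745 constant and 3.5 cutoff follow the standard Iglewicz & Hoaglin formulation):

import numpy as np

def outliers_modified_z_sketch(values, threshold=3.5):
    values = np.asarray(values, dtype=float)
    med = np.median(values) #the median is robust against outliers
    mad = np.median(np.abs(values - med)) #median absolute deviation
    mod_z = 0.6745 * (values - med) / mad #modified Z-score per sample
    values[np.abs(mod_z) > threshold] = med #replace outliers with the median
    return values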

I'll keep this issue open for now and will probably wrap it all up in a useful single function call later in the week.

Does this help? Stay tuned for the further implementation.

Warm regards, Paul

renatosc commented 5 years ago

Wow. Simply amazing. I just found this repo and I was also looking for support for continuous data. Thanks Paul. I'm looking forward to contributing to your repo as well.

Are you familiar with the NeuroKit repo? It has some ECG peak detection implementations (and more), and adding yours there could be something nice to do. What do you think?

KrisITB commented 5 years ago

Hi Paul,

Many thanks! And I'm sorry for the delayed response, but I was stuck in data collection so I didn't have a chance to test it straight away.

I'm getting an error on the line: working_data, measures = hp.process(data[slice[0]:slice[1]], sample_rate=100.0)

Here's the full error:

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\core\_methods.py:135: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims)
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\core\_methods.py:105: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\core\_methods.py:127: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
      1 for slice in slices: #iterate over windows
      2     print(all_data["PPG_A13"][slice[0]:slice[1]])
----> 3     working_data, measures = hp.process(all_data["PPG_A13"][slice[0]:slice[1]], sample_rate=128.0)
      4

~\AppData\Roaming\Python\Python36\site-packages\heartpy-1.1.3-py3.6.egg\heartpy\heartpy.py in process(hrdata, sample_rate, windowsize, report_time, calc_freq, freq_method, interp_clipping, clipping_scale, interp_threshold, hampel_correct, bpmmin, bpmmax, reject_segmentwise, measures, working_data)
    814     rol_mean = rolmean(hrdata, windowsize, sample_rate)
    815     working_data = fit_peaks(hrdata, rol_mean, sample_rate, bpmmin=bpmmin,
--> 816                              bpmmax=bpmmax, working_data=working_data)
    817     working_data = calc_rr(working_data['peaklist'], sample_rate, working_data=working_data)
    818     working_data = check_peaks(working_data['RR_list'], working_data['peaklist'], working_data['ybeat'],

~\AppData\Roaming\Python\Python36\site-packages\heartpy-1.1.3-py3.6.egg\heartpy\heartpy.py in fit_peaks(hrdata, rol_mean, sample_rate, bpmmin, bpmmax, working_data)
    523     for ma_perc in ma_perc_list:
    524         working_data = detect_peaks(hrdata, rol_mean, ma_perc, sample_rate,
--> 525                                     update_dict=True, working_data=working_data)
    526         bpm = ((len(working_data['peaklist'])/(len(hrdata)/sample_rate))*60)
    527         rrsd.append([working_data['rrsd'], bpm, ma_perc])

~\AppData\Roaming\Python\Python36\site-packages\heartpy-1.1.3-py3.6.egg\heartpy\heartpy.py in detect_peaks(hrdata, rol_mean, ma_perc, sample_rate, update_dict, working_data)
    485     if update_dict:
    486         working_data['peaklist'] = peaklist
--> 487         working_data['ybeat'] = [hrdata[x] for x in peaklist]
    488         working_data['rolmean'] = rol_mean
    489         working_data = calc_rr(working_data['peaklist'], sample_rate,

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    599         key = com._apply_if_callable(key, self)
    600         try:
--> 601             result = self.index.get_value(self, key)
    602
    603         if not is_scalar(result):

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   2475         try:
   2476             return self._engine.get_value(s, k,
-> 2477                                           tz=getattr(series.dtype, 'tz', None))
   2478         except KeyError as e1:
   2479             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

That's without the try/except you have in there; with the try/except it results in:

NameError: name 'exceptions' is not defined

This is all on Python 3.6.2 with pandas 0.20.3, jupyter notebook 0.27.0, conda 4.3.14.

I suspect it might be an issue with my data or how I pull it: all_data["PPG_A13"][slice[0]:slice[1]] returns a pandas Series that contains all the PPG values as expected, but the hp.process method returns an error.

I'll keep digging and try to fix it, but any cues would be greatly appreciated :D

P.s.: just a suggestion. I gave it some thought, and maybe instead of going with a window average, a delta time between peaks would be more accurate at the current data point? As in sample_rate / deltaTime * 60 (e.g. 100 / deltaTime * 60 when sampling at 100 Hz, with deltaTime in samples) should produce a BPM measure accurate for the last pair of peaks. Just a thought.
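To sketch what I mean (assuming working_data['peaklist'] holds the detected peak positions in samples; purely illustrative, not tested against heartpy's internals):

import numpy as np

sample_rate = 100.0 #Hz
peak_intervals = np.diff(working_data['peaklist']) #samples between adjacent peaks
inst_bpm = (sample_rate / peak_intervals) * 60.0 #instantaneous BPM per peak pair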
paulvangentcom commented 5 years ago

Hi Kris,

Is it possible for you to share your data so I can have a look?

Two thoughts:

- Don't feed it pandas objects, the module expects a numpy array.
- Delta time would be more accurate but nonsensical from a biosignal analysis perspective. Heart rate is way too variable in the short term for a single peak-peak interval to be useful.

I'm updating the docs after I get home in an hour or two. I'll update you here as well.

paulvangentcom commented 5 years ago

One additional remark: if you're on V1.1.2 or V1.1.3, beware that there's an inaccurate method in there that decreases peak detection accuracy in some situations. The fix is in 1.1.4, which I'll push once I get home in at most two hours.

Any critical analysis should wait until then.

KrisITB commented 5 years ago

Is it possible for you to share your data so I can have a look?

Sure, this is one of the samples from the dataset (the PPG signal is in column V). Once I manage to get that continuous HR working, I want to append it as an additional column to that dataframe: merged_data.zip // had to zip it as github doesn't accept .csv files in posts

==================================================

Don't feed it pandas objects, the module expects a numpy array.

I had a feeling a pandas Series and a numpy array are not the same thing ;) Adding .values fixed the issue as you suggested. Sorry, I'm very new to Python. Many thanks!

On to the new issues: If running make_windows() on the data attached with the default values that you have above (windowsize = 120 and sample rate =100) make_windows() return 11 pairs of int values but the last one is 72000 while my sample has 77953 records

With my actual sampling rate of 128 Hz it returns 9 pairs, and the last value is 76800.
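For reference, a quick sketch that reproduces those numbers (my guess at the full-windows-only logic, not the actual make_windows() code):

import numpy as np

def window_pairs(n_samples, sample_rate, windowsize, overlap):
    size = int(windowsize * sample_rate) #samples per window
    step = int(size * (1 - overlap)) #stride between window starts
    starts = np.arange(0, n_samples - size + 1, step)
    return [(int(s), int(s + size)) for s in starts]

print(window_pairs(77953, 100.0, 120, 0.5)) #11 pairs, last one ends at 72000
print(window_pairs(77953, 128.0, 120, 0.5)) #9 pairs, last one ends at 76800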

If I understand correctly, a windowsize of 120 averages HR over a 2-minute window? Below windowsize 4 (i.e. 3, 2 and 1), the pairs returned by make_windows() cover all the records in the sample, but process() returns a 'BadSignalWarning' (for windowsize 2 and 3),

and this:

During handling of the above exception, another exception occurred:

NameError                                 Traceback (most recent call last)
<ipython-input> in <module>()
     13         for k in measures.keys(): #go over all keys
     14             continuous = hp.append_dict(continuous, k, measures[k]) #append to dict
---> 15     except exceptions.BadSignalWarning: #if analysis goes wrong on a section, pass
     16         pass
     17

NameError: name 'exceptions' is not defined

Also, windowsize 1 produces an IndexError whatever the parameters passed to make_windows(). I'm also getting these runtime warnings from numpy:

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\core\_methods.py:135: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims)
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\core\_methods.py:105: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\numpy\core\_methods.py:127: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

delta time would be more accurate but nonsensical from a biosignal analysis perspective. Heart rate is way too variable in the short term for a single peak-peak interval to be useful.

Got you; in my case I'm averaging over a fairly large number of samples, so I believe the within-subject variability should average out between samples. I'm trying to measure tiny variations in response to an intervention, so I'm afraid an average from a 2-minute window isn't going to be sensitive enough for my purposes, but I'll try addressing it with a window size of 10 or less.

Any critical analysis should wait until then.

Thanks for your help and the heads up, I'll implement your update first thing tomorrow and will keep you posted ;)

Huge thank you Paul and best wishes!
renatosc commented 5 years ago

I ran your csv file and it worked fine. Here is the code:

import heartpy as hp
from heartpy import exceptions
import pandas as pd

df = pd.read_csv("merged_data.csv")
data = df['PPG[mV]'].values

#windowsize in seconds, overlap in fraction (0.5 = 50%, needs to be 0 or greater and less than 1.0)
#slices now contains a list of index pairs corresponding to window segments.
slices = hp.make_windows(data=data, sample_rate=128.0, windowsize=10, overlap=0.5)
continuous = {} #set dictionary for continuous measures

for slice in slices: #iterate over windows
    try:
        working_data, measures = hp.process(data[slice[0]:slice[1]], sample_rate=128.0) #same sample rate as make_windows above
        for k in measures.keys(): #go over all keys
            continuous = hp.append_dict(continuous, k, measures[k]) #append to dict
        print(continuous['bpm']) #access continuous bpm, all other measures are in there as well
    except exceptions.BadSignalWarning: #if analysis goes wrong on a section, pass
        pass

print("last BPM=", continuous['bpm'][-1]) #printing last BPM
paulvangentcom commented 5 years ago

@KrisITB I've implemented and pushed the wrapper to the codebase. Please pull and install (through setup.py) the latest version: 1.1.4 as of this writing. The fix (or rather: revert) for the detection accuracy is also there.

Usage should be as simple as possible for your use case now:

import heartpy as hp

segmented_working_data, segmented_measures = hp.process_segmentwise(data, sample_rate=100.0, segment_width=120, segment_overlap=0)

Extra arguments are possible, and you can pass the same arguments as to the process() function (they need to be the last ones you pass).
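For example, to also get the frequency domain measures on each segment (calc_freq is one of the process() arguments), something like this should work:

segmented_working_data, segmented_measures = hp.process_segmentwise(data, sample_rate=100.0, segment_width=120, segment_overlap=0.5, calc_freq=True)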

See the updated docs on the method here on what the function does and how you can use it: https://python-heart-rate-analysis-toolkit.readthedocs.io/en/latest/quickstart.html#getting-heart-rate-over-time

Now on to your questions:

Sorry, I'm very new to Python. Many thanks!

No worries! You're asking questions and that is commendable. I wish I'd done that more when I started out with Python (or programming in general, actually).

If I run make_windows() on the attached data with the default values you have above (windowsize=120 and sample_rate=100), make_windows() returns 11 pairs of int values, but the last one is 72000 while my sample has 77953 records.

This is correct. For now the method computes only on full windows and discards the final (partial) window. I'll flag this as a new issue, add something, and push it tomorrow.

If I understand correctly, a windowsize of 120 averages HR over a 2-minute window? Below windowsize 4 (i.e. 3, 2 and 1), the pairs returned by make_windows() cover all the records in the sample, but process() returns a 'BadSignalWarning' (for windowsize 2 and 3)

Correct, the argument is set in seconds. If you go too low there will be too few peaks (or no peaks!) and the analysis goes off the rails. BadSignalWarning is an exception that indicates no peaks or too few peaks are detected and/or present in the signal. The promised fix to take the last partial window into account as well will resolve this.

in my case I'm averaging over a fairly large number of samples, so I believe the within-subject variability should average out between samples. I'm trying to measure tiny variations in response to an intervention, so I'm afraid an average from a 2-minute window isn't going to be sensitive enough for my purposes, but I'll try addressing it with a window size of 10 or less

There will probably be too much variance related to basic physiological processes when analysed this way (you're not interested in those; at this time interval it's mostly noise!). The first thing I would try is using 30-60 second windows and the heart rate variability measures (start with RMSSD). The HRV measures express the variance over time in the signal much more robustly than the BPM does (especially when computed on few beats), and they are sensitive to small deviations in parts of the signal. This is what you're looking for.
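For intuition, RMSSD is simply the root mean square of successive differences between RR intervals; a minimal sketch (RR intervals in ms; illustrative, not heartpy's exact implementation):

import numpy as np

def rmssd_sketch(rr_intervals_ms):
    successive_diffs = np.diff(rr_intervals_ms) #differences between adjacent RR intervals
    return np.sqrt(np.mean(successive_diffs ** 2)) #root mean square of those differences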

If you want to increase the resolution on the computed measures, set the segment_overlap argument, for example to segment_overlap = 0.5 for a 50% overlap between adjacent windows.

paulvangentcom commented 5 years ago

@KrisITB I've implemented a way to include the tail end of the data as you requested. See Issue #11 for info.

KrisITB commented 5 years ago

Hey Paul!

Many thanks, I'll test it out properly tomorrow.

P.s.: there is a missing " on line 8 of setup.py in 1.1.5; this is probably the shortest commit ever, but it's my first one in a real OS project ;)

I had a quick check: the fast mode returns a single bpm value, but slow mode works great. Although it would be great to also get easy access from that dictionary to the corresponding indexes.

paulvangentcom commented 5 years ago

this is probably the shortest commit ever, but it's my first one in a real OS project ;)

Welcome to the wondrous world of OS :). Thanks for signalling and fixing that.

Cheers Paul

KrisITB commented 5 years ago

Thanks! :)

To quickly elaborate on that 'return indexes' request: what I do at the moment to assign indexes to these continuous values is 1) run make_windows() with the same parameters as used in process_segmentwise() and assign the returned array to a variable, from which I pull the period length at position [1,0]; 2) run a for loop through the length of my PPG data, calculating the index to use for pulling data from the dictionary returned by process_segmentwise(), and assign the value at this position to a dataframe at the index corresponding to the data in the PPG dataset:

#period = slices[1][0], i.e. the stride between window starts from make_windows()
#lastSegmentInd = the number of segments returned by process_segmentwise()
for i in range(0, len(data)):
    ind = int(i / period) #which segment this sample falls into

    if ind >= lastSegmentInd: #clamp the tail end to the last segment
        ind = lastSegmentInd - 1

    hp_data["bpm"].set_value(i, segmented_measures["bpm"][ind])
    #etc. for the other measures

It works for me, but I think it would help others who seek 'plug and play' functionality if there were some return of indexes to map the continuous values to the indexes in the original dataset. I have a bit of 'spare' data at the end of each recording, so I don't mind a larger bin at the last index (hence pulling the period from position [1,0]), but it would be nice to have some control over this through a parameter.
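Something along these lines, for example (hypothetical sketch, assuming no segments were rejected so the slices and measures line up one-to-one):

slices = hp.make_windows(data=data, sample_rate=128.0, windowsize=10, overlap=0.5)
bpm_with_indices = list(zip(slices, segmented_measures['bpm'])) #[((start, end), bpm), ...] per segment, mapping each value back to the raw signal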

Also: when used in full mode (as mentioned, the fast one wasn't returning continuous measures), nn20 and nn50 return lists of values, e.g.:

nn20:

[[46.875, 46.875, 507.8125, 507.8125, 281.25], [507.8125, 507.8125, 531.25], [], [210.9375], [218.75, 23.4375, 406.25, 23.4375, 210.9375, 93.75], [23.4375, 210.9375, 93.75, 70.3125, 31.25], [23.4375, 210.9375, 93.75, 70.3125, 31.25, 31.25, 46.875, 117.1875], [218.75, 23.4375, 406.25, 23.4375, 210.9375, 93.75, 70.3125, 31.25, 31.25, 46.875, 70.3125], [23.4375, 210.9375, 93.75, 70.3125, 31.25, 31.25, 46.875, 70.3125],

nn50:

[507.8125, 507.8125, 281.25], [507.8125, 507.8125, 531.25], [], [210.9375], [218.75, 406.25, 210.9375, 93.75], [210.9375, 93.75, 70.3125], [210.9375, 93.75, 70.3125, 117.1875], [218.75, 406.25, 210.9375, 93.75, 70.3125, 70.3125], [210.9375, 93.75, 70.3125, 70.3125], [210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5], [210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5], [54.6875, 210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5, 85.9375], [210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5, 85.9375], [70.3125, 70.3125, 62.5, 62.5, 85.9375], [70.3125, 62.5, 62.5, 85.9375], [70.3125, 62.5, 62.5, 85.9375], [54.6875, 62.5, 62.5, 85.9375], [62.5, 85.9375, 62.5], [62.5, 85.9375, 62.5, 117.1875], [85.9375, 62.5, 117.1875], [78.125, 62.5, 117.1875],

I'd love to fix that for you but you have quite a lot going on there and my brain said we are not doing this ;)

paulvangentcom commented 5 years ago

It works for me, but I think it would help others who seek 'plug and play' functionality if there were some return of indexes to map the continuous values to the indexes in the original dataset.

Excellent idea. I'll get on that asap!

I have a bit of 'spare' data at the end of each recording, so I don't mind a larger bin at the last index, but it would be nice to have some control over this through a parameter.

Also a good idea. If I understand correctly you would like control to either:

- keep the tail end of the data as a separate, smaller final segment, or
- lump the tail end of the data to the last 'full' bin, thereby making it longer than the others.

Let me know if I misunderstand somehow. It's the end of a long week for me :).

when used in full mode (as mentioned, the fast one wasn't returning continuous measures)

Yeah, I picked up that issue with 'fast' this afternoon and pushed out a hotfix. You can pull the latest from the repo.

nn20 and nn50 return lists of values, e.g.:

I see these are appended to 'measures'. That is incorrect as they are temp containers. I'll put them into working_data{} on the next update. You can skip them.

KrisITB commented 5 years ago

Hi Paul,

Many thanks!

lump the tail end of the data to the last 'full' bin, thereby making it longer than the others

Yes, that's roughly what I would need here. I'll elaborate a bit more on it in #16.

FYI: since 1.6 I'm getting an error: AttributeError: module 'HeartPy' has no attribute 'make_windows' when I try to access it to calculate periods. I tried it on two separate machines with the same results, so I doubt it's on my end. Have you changed the scope of this function or something? I'm looking at it in heartpy.py and it looks fine, but I can't access it through the module (it works fine within the module). I'm working with a local copy of this function as a workaround and that's fine, but I thought you might want to know about this small access issue ;)

paulvangentcom commented 5 years ago

FYI: since 1.6 I'm getting an error: AttributeError: module 'HeartPy' has no attribute 'make_windows' when I try to access it to calculate periods. I tried it on two separate machines with the same results, so I doubt it's on my end. Have you changed the scope of this function or something?

Yes, I'm in the process of reformatting the API and separating the internal and external methods to comply with PEP-8 standards. I expected you were already using the process_segmentwise() method to work on your segments. That will be the intended usage from now on.

Let me know if that doesn't work for you and I'll put make_windows() back into the package namespace with the next push.

paulvangentcom commented 5 years ago

Yes, that's roughly what I would need here, I'll elaborate a bit more on it in the #16

Ok, I was waiting on confirmation. Great! I'll work on closing the two open issues related to this one tonight.

paulvangentcom commented 5 years ago

Alright I've pushed the update and closed #14 and #16.

To keep the last segment as a smaller one, usage is like before: set segment_min_size to whatever you want. To lump the tail end of the data into the last bin, set segment_min_size to -1.


wd, m = hp.process_segmentwise(data, sample_rate = 100.0, segment_width = 120,  segment_overlap = 0, segment_min_size = -1)
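And to keep the tail end as its own smaller segment instead, a positive minimum size (in seconds) should do it, e.g.:

wd, m = hp.process_segmentwise(data, sample_rate = 100.0, segment_width = 120,  segment_overlap = 0, segment_min_size = 20) #final segment kept if it spans at least 20 seconds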

I've made make_windows() available for use again.

paulvangentcom commented 5 years ago

I'm closing this due to inactivity, assuming the issues have been resolved.