Closed KrisITB closed 5 years ago
Hi!
First: congratulations! Your PPG-to-HR conversion seems to handle my data better than the application that came with the sensor %) and many thanks for the tutorial on your website!
Quick question: is there a way to access BPM values as they change over time (ideally averaged over a specified window)? From what I understand, measures['bpm'] only returns the average, or am I missing something?
Am I correct in understanding that I need to use the indexes from working_data['peaklist_cor'] to calculate this myself? If so, please consider this a feature request ;) I'm sure I don't have to stress how valuable such a feature is in research practice, and you already have an awesome tool here, so that little feature would certainly make this the hottest stuff in town IMHO... at least for those of us cursed with working with PPG ;)
P.S.: any tips on how large the window should be?
Best wishes!
Thanks, and glad heartpy is of use to you! :)
I have some time to implement this later today, very good idea.
Cheers
Hi Kris,
I've implemented a few functions. Pull the latest version to get them. I need to reflect a bit on how to tie it all together properly in an automated function call, so that it also makes sense from a data analysis perspective.
I've at least published the methods now so you can utilize them. For now you can use the functionality as such:
import heartpy as hp
from heartpy import exceptions

data = hp.get_data('datafile.csv') #get your data

#windowsize in seconds, overlap as a fraction (0.5 = 50%, needs to be 0 or greater and less than 1.0)
#slices contains a list of index pairs corresponding to window segments
slices = hp.make_windows(data=data, sample_rate=100.0, windowsize=120, overlap=0.5)

continuous = {} #set dictionary for continuous measures
for slice in slices: #iterate over windows
    try:
        working_data, measures = hp.process(data[slice[0]:slice[1]], sample_rate=100.0)
        for k in measures.keys(): #go over all keys
            continuous = hp.append_dict(continuous, k, measures[k]) #append to dict
    except exceptions.BadSignalWarning: #if analysis goes wrong on a section, skip it
        pass

print(continuous['bpm']) #access continuous bpm; all other measures are in there as well
Now when sections of the signal, for whatever reason, create incorrect peak fitting results, outliers occur in the continuous measures. I've implemented two functions to identify and get rid of them, one based on the modified Z-score approach, and one based on the interquartile-range approach (H-spread). Use as such:
bpm = continuous['bpm'] #assuming the output dict from the last snippet
bpm_no_outliers = hp.outliers_modified_z(bpm) #for the Z-score approach
bpm_no_outliers = hp.outliers_iqr_method(bpm) #for the interquartile range approach
Both functions will replace outliers with the signal median, which is robust against outliers.
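For intuition, here is a minimal sketch of the modified Z-score replacement idea (this illustrates the standard Iglewicz-Hoaglin approach, not necessarily HeartPy's exact internals; the function name and threshold value are mine):
import numpy as np

def replace_outliers_modified_z(values, threshold=3.5):
    #modified Z-score: 0.6745 * (x - median) / MAD (Iglewicz & Hoaglin)
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    mod_z = 0.6745 * (values - med) / mad
    #replace anything beyond the threshold with the (robust) median
    return np.where(np.abs(mod_z) > threshold, med, values)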
I'll keep this issue open for now and will probably wrap it all up in a useful single function call later in the week.
Does this help? Stay tuned for the further implementation.
Warm regards, Paul
Wow. Simply amazing. I just found this repo, and I was also looking for support for continuous data. Thanks Paul. I'm looking forward to contributing to your repo as well.
Are you familiar with the NeuroKit repo? It has some ECG peak detection implementations (and more), and adding yours there could be something nice to do. What do you think?
Hi Paul,
Many thanks! And I'm sorry for the delayed response, but I was stuck in data collection so I didn't have a chance to test it straight away.
I'm getting an error on the line: working_data, measures = hp.process(data[slice[0]:slice[1]], sample_rate=100.0)
Here's the full error:
KeyError Traceback (most recent call last)
Hi Kris,
Is it possible for you to share your data so I can have a look?
Two thoughts:
1. Don't feed it pandas objects; the module expects a numpy array. You can get a numpy array from pandas by adding .values to the end of your slice (using your example): all_data["PPG_A13"][slice[0]:slice[1]].values
2. Delta time would be more accurate but nonsensical from a biosignal analysis perspective. Heart rate is far too variable in the short term for a single peak-peak interval to be useful. Usually at least 10 peaks are taken, and I think even that is short. You can approach this right now by using a short windowsize (see the quick arithmetic sketch below).
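A quick arithmetic sketch of that variability (the RR values here are made up, but within a normal physiological range):
#BPM from a single peak-peak (RR) interval: 60000 / RR_ms
for rr_ms in (750, 800, 850):
    print(rr_ms, 'ms ->', round(60000 / rr_ms, 1), 'BPM')
#750 ms -> 80.0, 800 ms -> 75.0, 850 ms -> 70.6: ordinary beat-to-beat
#jitter already swings the estimate by ~10 BPM, hence averaging over windows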
I'm updating the docs after I get home in an hour or two. I'll update you here as well.
One additional remark: if you're on V1.1.2 or V1.1.3, beware that there's an inaccurate method in there that decreases peak detection accuracy in some situations. The fix is in 1.1.4, which I'll push once I get home, in at most two hours.
Any critical analysis should wait until then.
Is it possible for you to share your data so I can have a look?
Sure, this is one of the samples from the data set (the PPG signal is in column V). Once I manage to get that continuous HR working, I want to append it as an additional column to that dataframe: merged_data.zip // had to zip it as GitHub doesn't accept .csv files in posts
==================================================
Don't feed it pandas objects; the module expects a numpy array.
I had a feeling a pandas Series and a numpy array are not the same thing ;) Adding .values fixed the issue as you suggested. Sorry, I'm very new to Python; many thanks!
On to the new issues: if I run make_windows() on the attached data with the default values you have above (windowsize=120 and sample_rate=100), it returns 11 pairs of int values, but the last one is 72000 while my sample has 77953 records.
I'm sampling at 128 Hz, which returns 9 pairs, with the last value being 76800.
If I understand correctly, a windowsize of 120 averages HR over a 2-minute window? Below a windowsize of 4 (i.e. 3, 2 and 1) the pairs returned by make_windows() cover all the records in the sample, but I get a 'BadSignalWarning' (for windowsize 2 and 3)
and this: During handling of the above exception, another exception occurred:
NameError Traceback (most recent call last)
I ran your csv file and it worked fine. Here is the code:
import heartpy as hp
from heartpy import exceptions
import pandas as pd

df = pd.read_csv("merged_data.csv")
data = df['PPG[mV]'].values

#windowsize in seconds, overlap as a fraction (0.5 = 50%, needs to be 0 or greater and less than 1.0)
#slices contains a list of index pairs corresponding to window segments
slices = hp.make_windows(data=data, sample_rate=128.0, windowsize=10, overlap=0.5)

continuous = {} #set dictionary for continuous measures
for slice in slices: #iterate over windows
    try:
        #use the same sample rate as in make_windows (this data is sampled at 128 Hz)
        working_data, measures = hp.process(data[slice[0]:slice[1]], sample_rate=128.0)
        for k in measures.keys(): #go over all keys
            continuous = hp.append_dict(continuous, k, measures[k]) #append to dict
        print(continuous['bpm']) #access continuous bpm; all other measures are in there as well
    except exceptions.BadSignalWarning: #if analysis goes wrong on a section, skip it
        pass

print("last BPM=", continuous['bpm'][-1]) #print the last BPM
@KrisITB I've implemented and pushed the wrapper to the codebase. Please pull and install (through setup.py) the latest version: 1.1.4 as of this writing. The fix (or rather: revert) for the detection accuracy is in there as well.
Usage should be as simple as possible for your use case now:
import heartpy as hp
segmented_working_data, segmented_measures = hp.process_segmentwise(data, sample_rate=100.0, segment_width=120, segment_overlap=0)
Extra arguments are possible, and you can pass the same arguments as to the process() function (they need to be the last ones you pass).
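For example, a sketch of passing one of those through (calc_freq is assumed here as an illustrative process() keyword argument; check the docs for the full list):
import heartpy as hp

#segmenting arguments first; process() arguments (here calc_freq) go last
wd, m = hp.process_segmentwise(data, sample_rate=100.0, segment_width=120,
                               segment_overlap=0.5, calc_freq=True)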
See the updated docs on the method here on what the function does and how you can use it: https://python-heart-rate-analysis-toolkit.readthedocs.io/en/latest/quickstart.html#getting-heart-rate-over-time
Sorry, I'm very new to Python; many thanks!
No worries! You're asking questions and that is commendable. I wish I'd done that more when I started out with Python (or programming in general, actually).
If I run make_windows() on the attached data with the default values you have above (windowsize=120 and sample_rate=100), it returns 11 pairs of int values, but the last one is 72000 while my sample has 77953 records.
This is correct. For now the method computes only on full windows and discards the final (partial) window. I'll flag this as a new issue and will add something and push it tomorrow.
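For reference, the window arithmetic that produces those numbers can be sketched like this (a simplified illustration, not HeartPy's actual code; make_windows_sketch is a made-up name):
import numpy as np

def make_windows_sketch(data_len, sample_rate, windowsize, overlap):
    window = int(windowsize * sample_rate) #samples per window
    stride = int(window * (1 - overlap))   #step between window starts
    starts = np.arange(0, data_len - window + 1, stride)
    return [(int(s), int(s) + window) for s in starts] #a tail shorter than one window is dropped

print(len(make_windows_sketch(77953, 100.0, 120, 0.5))) #11 pairs, the last ending at 72000
print(make_windows_sketch(77953, 128.0, 120, 0.5)[-1])  #(61440, 76800) at 128 Hz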
If I understand correctly, a windowsize of 120 averages HR over a 2-minute window? Below a windowsize of 4 (i.e. 3, 2 and 1) the pairs returned by make_windows() cover all the records in the sample, but I get a 'BadSignalWarning' (for windowsize 2 and 3)
Correct, the argument is set in seconds. If you go too low there will be too few peaks (or no peaks!) and the analysis goes off the rails; see the quick arithmetic below. BadSignalWarning is an exception indicating that no peaks, or too few peaks, were detected in the signal. The promised fix to take the last partial window into account will resolve this.
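A back-of-the-envelope check of why very short windows fail (assuming a resting rate of around 70 BPM):
#approximate number of beats contained in a window
bpm = 70
for windowsize in (120, 10, 3, 2):
    print(windowsize, 's ->', round(bpm / 60 * windowsize, 1), 'beats')
#120 s -> 140.0, 10 s -> 11.7, 3 s -> 3.5, 2 s -> 2.3: a 2-3 second window
#holds only a couple of beats, too few for reliable peak fitting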
In my case I'm averaging over a fairly large number of samples, so I believe the within-subject variability should average out between samples. I'm trying to measure tiny variations in response to an intervention, so I'm afraid that an average from a 2-minute window isn't going to be sensitive enough for my purposes, but I'll try addressing it with a windowsize of 10 or less.
There will probably be too much variance related to basic physiological processes when analysed this way (you're not interested in those; at this time interval it's mostly noise!). The first thing I would try is using 30-60 second windows and the heart rate variability measures (start with RMSSD; see the sketch below). The HRV measures express the variance over time in the signal much more robustly than the BPM (especially when computed on few beats), and are sensitive to small deviations in parts of the signal. This is what you're looking for.
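For reference, a minimal sketch of the standard RMSSD definition (root mean square of successive RR-interval differences; this illustrates the measure itself, not HeartPy's internal implementation):
import numpy as np

def rmssd(rr_ms):
    #root mean square of successive differences between RR intervals (ms)
    diffs = np.diff(np.asarray(rr_ms, dtype=float))
    return np.sqrt(np.mean(diffs ** 2))

print(rmssd([800, 810, 790, 805, 795])) #small beat-to-beat changes -> low RMSSD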
If you want to increase the resolution of the computed measures, set the segment_overlap argument, for example segment_overlap = 0.5 for a 50% overlap between adjacent windows.
@KrisITB I've implemented a way to include the tail end of the data as you requested. See Issue #11 for info.
Hey Paul!
Many thanks, I'll test it out properly tomorrow.
P.S.: there is a missing " on line 8 in setup.py in 1.1.5. This is probably the shortest commit ever, but it's my first one in a real OS project ;)
Had a quick check: the fast mode returns a single bpm value, but the slow mode works great. Although, it would also be great to get easy access from that dictionary to the indexes.
This is probably the shortest commit ever, but it's my first one in a real OS project ;)
Welcome to the wondrous world of OS :). Thanks for signalling and fixing that.
Cheers Paul
Thanks! :)
To quickly elaborate on that 'return indexes' request: what I do at the moment to assign indexes to these continuous values is 1) run make_windows() with the same parameters as used in process_segmentwise() and assign the returned array to a variable, from which I pull the period length at position [1,0], and 2) run a for loop through the length of my PPG data, calculating the indexes used to pull data from the dictionary returned by process_segmentwise(), and assigning the value at this position to a dataframe at the index corresponding to the data from the PPG dataset:
for i in range(0, len(data)):
    ind = int(i / period) #period length pulled from the make_windows() output at [1,0]
    if ind >= lastSegmentInd:
        ind = lastSegmentInd - 1 #clamp the tail end to the last segment
    hp_data["bpm"].set_value(i, segmented_measures["bpm"][ind])
etc.
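For what it's worth, the same mapping can be done in one vectorized step (a sketch assuming the same period, lastSegmentInd, hp_data and segmented_measures variables as in the snippet above):
import numpy as np

#map each sample index to its segment index, clamping the tail to the last segment
inds = np.minimum(np.arange(len(data)) // period, lastSegmentInd - 1)
hp_data["bpm"] = np.asarray(segmented_measures["bpm"])[inds]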
It works for me, but I think it would help others who seek 'plug and play' functionality if there were some return of indexes to map the continuous values to the indexes in the original dataset. I have a bit of 'spare' data at the end of each recording, so I don't mind a larger bin at the last index, hence the period set to the [0,1] position, but it would be nice to have some control over this through a parameter.
--------------------------------------------------------------
Also: when used in full mode (as mentioned, the fast one wasn't returning continuous measures), nn20 and nn50 return lists of values, e.g.:
nn20: [[46.875, 46.875, 507.8125, 507.8125, 281.25], [507.8125, 507.8125, 531.25], [], [210.9375], [218.75, 23.4375, 406.25, 23.4375, 210.9375, 93.75], [23.4375, 210.9375, 93.75, 70.3125, 31.25], [23.4375, 210.9375, 93.75, 70.3125, 31.25, 31.25, 46.875, 117.1875], [218.75, 23.4375, 406.25, 23.4375, 210.9375, 93.75, 70.3125, 31.25, 31.25, 46.875, 70.3125], [23.4375, 210.9375, 93.75, 70.3125, 31.25, 31.25, 46.875, 70.3125],
nn50:
[507.8125, 507.8125, 281.25], [507.8125, 507.8125, 531.25], [], [210.9375], [218.75, 406.25, 210.9375, 93.75], [210.9375, 93.75, 70.3125], [210.9375, 93.75, 70.3125, 117.1875], [218.75, 406.25, 210.9375, 93.75, 70.3125, 70.3125], [210.9375, 93.75, 70.3125, 70.3125], [210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5], [210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5], [54.6875, 210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5, 85.9375], [210.9375, 93.75, 70.3125, 70.3125, 62.5, 62.5, 85.9375], [70.3125, 70.3125, 62.5, 62.5, 85.9375], [70.3125, 62.5, 62.5, 85.9375], [70.3125, 62.5, 62.5, 85.9375], [54.6875, 62.5, 62.5, 85.9375], [62.5, 85.9375, 62.5], [62.5, 85.9375, 62.5, 117.1875], [85.9375, 62.5, 117.1875], [78.125, 62.5, 117.1875],
I'd love to fix that for you but you have quite a lot going on there and my brain said we are not doing this ;)
It works for me, but I think it would help others who seek 'plug and play' functionality if there were some return of indexes to map the continuous values to the indexes in the original dataset.
Excellent idea. I'll get on that ASAP!
I have a bit of 'spare' data at the end of each recording, so I don't mind a larger bin at the last index, hence the period set to the [0,1] position, but it would be nice to have some control over this through a parameter.
Also a good idea. If I understand correctly you would like control to either:
- keep the tail end of the data as a smaller, separate segment, or
- lump the tail end of the data to the last 'full' bin, thereby making it longer than the others.
Let me know if I misunderstand somehow. It's the end of a long week for me :).
when used in full mode (as mentioned, the fast one wasn't returning continuous measures)
Yeah, I picked up that issue with 'fast' this afternoon and pushed out a hotfix. You can pull the latest from the repo.
nn20 and nn50 return lists of values, e.g.:
I see these are appended to 'measures'. That is incorrect, as they are temp containers. I'll put them into working_data{} on the next update. You can skip them.
Hi Paul,
Many thanks!
lump the tail end of the data to the last 'full' bin, thereby making it longer than the others
Yes, that's roughly what I would need here; I'll elaborate a bit more on it in #16
FYI: since 1.6 I'm getting an error: AttributeError: module 'HeartPy' has no attribute 'make_windows' when I try to access it to calculate periods. I tried it on two separate machines with the same results, so I doubt it's on my end. Have you changed the scope of this function or something? I'm looking at it in heartpy.py and it looks fine, but I can't get access to it through the module (it works fine within the module). I'm working with a local copy of this function to work around it and that's fine, but I thought you might want to know about that small access issue ;)
FYI: since 1.6 I'm getting an error: AttributeError: module 'HeartPy' has no attribute 'make_windows' when I try to access it to calculate periods. I tried it on two separate machines with the same results, so I doubt it's on my end. Have you changed the scope of this function or something?
Yes, I'm in the process of reformatting the API and separating the internal and external methods to comply with PEP 8 standards. I expected you were already using the process_segmentwise() method to work on your segments; this will be the intended usage from now on.
Let me know if that doesn't work for you and I'll put make_windows() back into the package namespace with the next push.
Yes, that's roughly what I would need here; I'll elaborate a bit more on it in #16
Ok, I was waiting on confirmation. Great! I'll work on closing the two open issues related to this one tonight.
Alright I've pushed the update and closed #14 and #16.
To make the last segment a smaller one, usage is as before: set 'segment_min_size' to whatever you want. To lump the tail end of the data into the last bin, set 'segment_min_size' to -1.
wd, m = hp.process_segmentwise(data, sample_rate = 100.0, segment_width = 120, segment_overlap = 0, segment_min_size = -1)
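And for the other behaviour, keeping the tail as its own smaller segment, something like the following should work (the value of 30 seconds for 'segment_min_size' is just illustrative):
wd, m = hp.process_segmentwise(data, sample_rate = 100.0, segment_width = 120, segment_overlap = 0, segment_min_size = 30)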
I've made make_windows() available for use again.
I'm closing this due to inactivity, assuming the issues have been resolved.