neuropsychology / NeuroKit.py

A Python Toolbox for Statistics and Neurophysiological Signal Processing (EEG, EDA, ECG, EMG...).
http://neurokit.rtfd.io
MIT License

NeuroKit Warning: ecg_hrv(): Correlation Dimension. Error: NeuroKit warning: complexity_entropy_multiscale(): Signal might be to short to compute SampEn for scale factors > 0. Setting max_scale_factor to 0. #61

Open Sanjay1995 opened 6 years ago

Sanjay1995 commented 6 years ago

I am passing the data as 1D NumPy arrays taken from the twelve leads (i, ii, iii, v1, v2, v3, v4, v5, v6, avr, avl, avf) of the PTB database. For each column I call nk.bio_process(ecg=ecg_signal[:,i], ecg_quality_model=None). It works on several columns but raises the error after 2 iterations. I have checked that the data looks fine in all columns. Can anyone please help resolve the issue?

DominiqueMakowski commented 6 years ago

Hi @Sanjay1995, could you provide an example of your dataset? We'll try to fix the error. It might be related to a recent change in HRV computation (#58). Also linking @gattia just in case :)

Sanjay1995 commented 6 years ago

My dataset is the PTB ECG database. You can find it at https://www.physionet.org/physiobank/database/ptbdb/

Sanjay1995 commented 6 years ago

Please read the description first. I want to know how to convert these twelve columns (i, ii, iii, avr, avl, avf, v1, v2, v3, v4, v5, v6) into features for ECG classification using your library NeuroKit. I would be grateful for your help.

Sanjay1995 commented 6 years ago

@DominiqueMakowski

gattia commented 6 years ago

Based on the print out it seems that the sample entropy is having problems on the very first pass at the full resolution scale for the multi-scale analysis (this should be the same as just running sample entropy on the full data). This shouldn't be a problem from anything done to the multiscale entropy function recently.

If I were debugging it, I'd be interested in what the data being passed to complexity_entropy_multiscale() looks like: what its shape is, its min and max values, how it looks when graphed, etc.
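The checks suggested above can be sketched roughly as follows. This is only an illustration: `describe_signal` is a hypothetical helper (not part of NeuroKit), and the sine wave is a synthetic stand-in for one ECG column, not real PTB data.

```python
import numpy as np

def describe_signal(signal):
    """Return basic diagnostics for a signal before it is processed.

    Hypothetical helper for debugging; not a NeuroKit function.
    """
    arr = np.asarray(signal, dtype=float)
    return {
        "shape": arr.shape,                     # expect (n_samples,) for one lead
        "ndim": arr.ndim,                       # expect 1
        "min": float(np.nanmin(arr)),
        "max": float(np.nanmax(arr)),
        "n_nan": int(np.isnan(arr).sum()),      # missing samples
        "is_flat": bool(np.nanstd(arr) == 0),   # constant signal, no peaks to find
    }

# Synthetic stand-in for one ECG column (assumption: not real data)
example = np.sin(np.linspace(0, 20 * np.pi, 5000))
summary = describe_signal(example)
print(summary)
```

Running this on each column right before the failing call, and plotting the column (for example with matplotlib), should show whether a given lead is two-dimensional, contains NaN values, or is flat.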

DominiqueMakowski commented 6 years ago

@Sanjay1995 As I understand it, you're basically trying to run the ECG processing routine on all of the ECG leads. However, the routine first attempts to extract R peaks, then computes several indices based on these R peaks (heart rate, HRV, and so on). The default cardiac complex segmenter works preferentially with LEAD 1 (i in your data). So I believe it is quite normal that it doesn't work with the other signals. It seems that you're trying to compute the same features from different leads, which are not appropriate for the traditional segmenting.

I am not sure what your end goal is, but neurokit's ECG routine currently works preferentially with LEAD 1 data (for extracting features that you can then use for whatever else), not for comparing different leads with each other. That being said, you could try changing the default segmenter (ecg_segmenter = "hamilton", "gamboa", "engzee", "christov" or "ssf"). Critically, check whether the R peaks were detected correctly. Also, try using "ecg_preprocess()" to simplify debugging.

I hope this was useful. Let me know how you progress.
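One rough way to check whether R peaks were detected plausibly is to convert the intervals between them into heart rate and flag physiologically unlikely values. The sketch below assumes you already have the R-peak sample indices from whichever segmenter you used. `check_r_peaks` and the 30 to 220 bpm bounds are illustrative assumptions, not part of NeuroKit.

```python
import numpy as np

def check_r_peaks(r_peaks, sampling_rate=1000, min_bpm=30, max_bpm=220):
    """Flag implausible R-peak detections from their sample indices.

    Hypothetical helper; the bpm bounds are illustrative assumptions.
    """
    r_peaks = np.asarray(r_peaks)
    if r_peaks.size < 2:
        # With fewer than two peaks no interval (and no HRV) can be computed
        return {"n_peaks": int(r_peaks.size), "mean_bpm": None, "plausible": False}
    rr_seconds = np.diff(r_peaks) / sampling_rate   # R-R intervals in seconds
    bpm = 60.0 / rr_seconds                         # instantaneous heart rate
    plausible = bool(np.all((bpm >= min_bpm) & (bpm <= max_bpm)))
    return {
        "n_peaks": int(r_peaks.size),
        "mean_bpm": float(bpm.mean()),
        "plausible": plausible,
    }

# Made-up peaks, one every 800 samples at 1000 Hz
peaks = np.arange(500, 10000, 800)
report = check_r_peaks(peaks)
print(report)
```

A lead for which this reports very few peaks or an implausible rate is a likely candidate for the failures described in this thread. Overlaying the peaks on a plot of the signal remains the most direct check.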

@gattia thanks :)

Sanjay1995 commented 6 years ago

[Image: screenshot from 2018-04-14 18-23-07]

Sanjay1995 commented 6 years ago

This is what one of my data columns, lead I (i) as mentioned above, looks like.

Sanjay1995 commented 6 years ago

@gattia

Sanjay1995 commented 6 years ago

Thanks @DominiqueMakowski, that is really useful. But do ecg_preprocess() and bio_process() work alike in my case?

DominiqueMakowski commented 6 years ago

@Sanjay1995 Yes, bio_process is just a wrapper for processing multiple signals (ECG, EDA, EMG, etc.) at once. Using bio_process with only ECG is similar to using ecg_process. However, ecg_process itself uses the ecg_preprocess function, which only does low-level preprocessing (mainly extracting R peaks) and does not compute more complex indices such as HRV.

Sanjay1995 commented 6 years ago

As I have already mentioned, I am using the PTB ECG dataset, which your library also incorporates. But ecg_preprocess() also fails on some signals with the error (index 0 is out of bounds for axis 0), and I don't know why.

Sanjay1995 commented 6 years ago

If I only process LEAD 1 (i in my data), would it give features ('T_Waves', 'Cardiac_Cycles', 'P_Waves', 'Q_Waves', 'HRV', 'R_Peaks') that help me classify heart disease?

DominiqueMakowski commented 6 years ago

Yes, you should use only the column of the dataset corresponding to i. I used the full PTB dataset only to create a machine learning model that automatically classifies the provided lead signal and returns the probability of correct classification (a proxy of signal quality). But for investigating ECG features, using only LEAD I is sufficient.

Sanjay1995 commented 6 years ago

@DominiqueMakowski thanks.

waleedkaimkhani commented 6 years ago

@DominiqueMakowski I am getting the same error from ecg_process (index 0 is out of bounds), even though my signal has a length greater than 1.
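For context, NumPy raises this message whenever element 0 of an empty array is requested, so it can occur even when the input signal is long, if some intermediate array turns out empty. Whether that is what happens inside ecg_process here (for example, because no R peaks were detected) is an assumption, not something confirmed in this thread. A minimal reproduction, with `first_peak` as a hypothetical guard:

```python
import numpy as np

def first_peak(peaks):
    """Return the first detected peak index, or None if nothing was detected.

    Hypothetical guard; not a NeuroKit function.
    """
    peaks = np.asarray(peaks)
    if peaks.size == 0:
        return None
    return int(peaks[0])

# An empty array, e.g. what a peak detector might return when it finds nothing
empty = np.array([], dtype=int)

try:
    empty[0]
except IndexError as err:
    message = str(err)
    print(message)

print(first_peak(empty))        # guarded access returns None instead of raising
print(first_peak([120, 950]))   # normal case returns the first index
```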

DominiqueMakowski commented 6 years ago

@waleedkaimkhani could you provide a sample of your data? thanks

waleedkaimkhani commented 6 years ago

My dataset is the PTB ECG database. You can find it at https://www.physionet.org/physiobank/database/ptbdb/

DominiqueMakowski commented 6 years ago

haha alright;

1) Did you correctly select a one-dimensional array OR a single pandas DataFrame column (corresponding to LEAD 1)?
2) If yes, could you save this single column or array (as txt, csv or json) and attach it here, so I can check directly with the exact input you provide to neurokit's routines? Thanks 😅
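Both steps could look something like the sketch below. It assumes the data is already in a pandas DataFrame with one column per lead and that lead I is named "i". `extract_lead` is a hypothetical helper and the random values stand in for real recordings.

```python
import io

import numpy as np
import pandas as pd

def extract_lead(df, lead="i"):
    """Return one lead as a 1-D float array, raising if that is not possible.

    Hypothetical helper; the column name "i" is an assumption about the data.
    """
    if lead not in df.columns:
        raise KeyError(f"Column {lead!r} not found; available: {list(df.columns)}")
    values = df[lead].to_numpy(dtype=float)
    if values.ndim != 1:
        # Happens, for instance, when several columns share the same name
        raise ValueError(f"Expected a 1-D array, got shape {values.shape}")
    return values

# Synthetic stand-in for a two-lead recording (not real PTB data)
rng = np.random.default_rng(0)
df = pd.DataFrame({"i": rng.normal(size=100), "ii": rng.normal(size=100)})

lead_i = extract_lead(df, "i")

# Save only this column; pass a file path such as "lead_i.csv" instead of the
# in-memory buffer to write a file that can be attached to the issue
buffer = io.StringIO()
pd.Series(lead_i, name="i").to_csv(buffer, index=False)
print(lead_i.shape)
```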

waleedkaimkhani commented 6 years ago

[Image: screenshot from 2018-04-14 22 35 00]

DominiqueMakowski commented 6 years ago

Or send it to me at dom.makowski@gmail.com

waleedkaimkhani commented 6 years ago

@DominiqueMakowski I have sent you an email.

DominiqueMakowski commented 6 years ago

@waleedkaimkhani your code should look like this:

import neurokit as nk
import pandas as pd

df = pd.read_csv("file.csv")
ecg_processed = nk.ecg_process(ecg=df["i"], sampling_rate=1000)