pritamqu / SSL-ECG

Self-supervised ECG Representation Learning - ICASSP 2020 and IEEE T-AFFC
https://www.pritamsarkar.com

Data preprocessing codes #1

Closed katerynaCh closed 2 years ago

katerynaCh commented 2 years ago

Hi! I am having trouble reproducing your results on the AMIGOS dataset for both binary and multiclass classification (both when training from scratch and when first extracting features with your provided model). The results I am getting are about 10% lower than the reported ones. Could you please describe more explicitly what preprocessing was done? You report high-pass filtering at 0.8 Hz, and in one of the papers you also report person-specific z-normalization. Was anything else done? In which order were the steps performed? Did you apply the high-pass filter to the whole sequence or to the 10-second segments? Also, did you use the self-reported labels or the labels from external annotators?

In general, if you could share the preprocessing code, that would be extremely helpful, even if it is messy.

pritamqu commented 2 years ago

Hi, thanks for your interest. Are you using the uploaded pretrained models, or did you train the model yourself? Could you please tell me a bit more about your experimental setup? Minor differences in accuracy are to be expected, but certainly not as large as 10%!

katerynaCh commented 2 years ago

Thanks for the quick reply! I have tried both ways: training from scratch and using your pretrained model. I preprocess the ECGs by applying a high-pass filter at 0.8 Hz, splitting them into 10-second non-overlapping segments to get 2560-length vectors, and z-normalizing the data (per person). With the provided pretrained model, I extract the features, feed them to the supervised model for the AMIGOS dataset, and train as described in the paper for 100 epochs. In the end I am getting around 72% for binary classification of arousal (where the binary labels are defined as negative for scores < 5 and positive otherwise). So I suppose the issue is somewhere in the preprocessing before the feature extraction.
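For readers following along, the segmentation, normalization, and labeling steps described in this comment could be sketched in Python as below. This is only an illustration of the commenter's description, not the repository's actual code: the function names, the choice to pool normalization statistics over all of a person's recordings, and the order of the steps are assumptions, and the input is assumed to be already high-pass filtered.

```python
import numpy as np

FS = 256          # Hz; 10 s * 256 Hz = 2560 samples per segment
WINDOW_SEC = 10   # non-overlapping 10-second windows


def segment(ecg, fs=FS, window_sec=WINDOW_SEC):
    """Split a 1-D signal into non-overlapping windows, dropping the remainder."""
    ecg = np.asarray(ecg, dtype=float)
    n = int(fs * window_sec)
    n_windows = len(ecg) // n
    return ecg[: n_windows * n].reshape(n_windows, n)


def prepare_person(recordings, fs=FS, window_sec=WINDOW_SEC):
    """Z-normalize per person, then segment (hypothetical helper).

    `recordings` is a list of 1-D arrays for one person. The mean and
    standard deviation are pooled over all of that person's recordings,
    which is an assumption about what "per person" means.
    """
    pooled = np.concatenate([np.asarray(r, dtype=float) for r in recordings])
    mean, std = pooled.mean(), pooled.std()
    windows = [segment((np.asarray(r, dtype=float) - mean) / std, fs, window_sec)
               for r in recordings]
    return np.concatenate(windows, axis=0)


def binarize(score, threshold=5.0):
    """Binary label as described above: < 5 is negative (0), otherwise positive (1)."""
    return int(score >= threshold)
```

Calling `prepare_person` on a list of filtered recordings returns an array of shape `(n_windows, 2560)` at 256 Hz. Whether normalization should happen before or after segmentation is exactly the open question in this thread, so the order here should be treated as one possibility.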

pritamqu commented 2 years ago

Thanks for sharing your setup. However, I am not sure what exactly is going wrong at your end. I used MATLAB for the filtering part, and here is the filter code; the same filter can also be designed in Python. Hope this helps!

highpass_filter = designfilt('highpassiir', 'StopbandFrequency', 0.4, 'PassbandFrequency', 0.8, ...
'StopbandAttenuation', 60, 'PassbandRipple', 1, 'SampleRate', 256, 'DesignMethod', 'cheby2');
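A rough Python counterpart of the MATLAB specification above could look like the following sketch, using SciPy's `cheb2ord` and `cheby2`. It reuses the same band edges, ripple, attenuation, and sample rate, but it has not been checked to produce coefficients identical to `designfilt`, and the helper names are hypothetical. Applying the filter in a single causal pass with `sosfilt` is also an assumption; the thread does not say whether the author filtered forward only or used zero-phase filtering.

```python
import numpy as np
from scipy import signal


def design_highpass(fs=256.0, f_stop=0.4, f_pass=0.8, ripple_db=1.0, atten_db=60.0):
    """Design a Chebyshev type II high-pass filter from the spec in the comment above.

    Returns second-order sections, which stay numerically well behaved for
    cutoffs this close to 0 Hz.
    """
    # Minimum order that meets the passband/stopband specification.
    order, wn = signal.cheb2ord(wp=f_pass, ws=f_stop,
                                gpass=ripple_db, gstop=atten_db, fs=fs)
    return signal.cheby2(order, atten_db, wn, btype="highpass",
                         output="sos", fs=fs)


def apply_highpass(ecg, sos):
    """Single-pass (causal) filtering; an assumption, not confirmed by the author."""
    return signal.sosfilt(sos, np.asarray(ecg, dtype=float))
```

A typical call would be `filtered = apply_highpass(raw_ecg, design_highpass())`. If zero-phase filtering is preferred, `signal.sosfiltfilt` can be swapped in, at the cost of no longer matching a single-pass filter.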

ZaraNaSha commented 2 years ago

Hi, thanks for your paper and implementation. I also have a problem with the filtered data. Could you share the MATLAB code that you used for your data? Best regards.

ZaraNaSha commented 2 years ago

@katerynaCh Hi, I would like to know whether you were able to prepare the data for training the model. Would it be possible for you to send me your code? Best regards.

katerynaCh commented 2 years ago

Hi, after some more attempts I was still not able to reproduce the results reported in the paper with my own preprocessing implementation.

pritamqu commented 2 years ago

@katerynaCh @zara6697 could you please share the preprocessing code that you are using? I can quickly check it and let you know if I see any issue. Otherwise, I have already shared the filter I used here: https://github.com/pritamqu/SSL-ECG/issues/1#issuecomment-991761706 You may also want to look at the paper, which has a detailed description.

ZaraNaSha commented 2 years ago

OK, if I manage to reproduce the results, I will send you the code. Thanks.

ZaraNaSha commented 2 years ago

Thanks, I used it, but it is not clear to me how the signal files and the label files should be defined, and I do not understand how to save the files. This is the code I used for the AMIGOS dataset; the format of the text file and the label file is not clear to me.

path ='C:\Users\p\Downloads\Compressed\am_dataset\';
name = dir([path '*.zip']);
highpass_filter = designfilt('highpassiir', 'StopbandFrequency', 0.4, 'PassbandFrequency', 0.8, ...
'StopbandAttenuation', 60, 'PassbandRipple', 1, 'SampleRate', 256, 'DesignMethod', 'cheby2');
for i=1:length(name)
    a = unzip([path name(i).name],path);
    aa1 = load(a{1});
    for j=1:20
        aa = aa1.ECG_DATA{j};
        aa = aa(:,2);
        bb = filter(highpass_filter,aa);
        T = table(bb, 'VariableNames', { '1'} );
        writetable(T, [path 'filtered\' name(i).name(1:end-4) num2str(j) '.txt']);
    end
end

Best regards.

pritamqu commented 2 years ago

I am adding a piece of preprocessing code here for reference. Hope this helps.

import numpy as np
from biosppy.signals import tools as tools

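def filter_ecg(signal, sampling_rate):
    """Band-pass filter an ECG signal between 3 and 45 Hz with an FIR filter."""
    signal = np.array(signal)
    # Filter order scales with the sampling rate (0.3 s worth of samples).
    order = int(0.3 * sampling_rate)
    # filter_signal returns (signal, sampling_rate, params); only the
    # filtered signal is kept.
    filtered, _, _ = tools.filter_signal(signal=signal,
                                         ftype='FIR',
                                         band='bandpass',
                                         order=order,
                                         frequency=[3, 45],
                                         sampling_rate=sampling_rate)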

    return filtered

ZaraNaSha commented 2 years ago

Thanks for your help. Could you also explain how to use the function "def extract_amigos_dataset(overlap_pct, window_size_sec, data_save_path, save):"? How should I save the files (for example, one file per subject, or all subjects in one file)? How should I save the label file? Another question about this function: why do you sort the data (line 300, data = np.sort(data))?

Best regards.

dousocool commented 2 years ago

Hello, I have also run into difficulties with data processing. I cannot convert the original DREAMER and WESAD datasets into the format required by the model. How is your progress so far? Could you share the format of your dataset after conversion?