Closed ChengHanChiu closed 1 year ago
The procedure of processing ECG and PPG in PulseDB is:
There is no data leakage, since given a new 10-s segment for testing, you can always remap the signal within the segment between 0 and 1 before using it as the input of the model.
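To make the point above concrete, here is a minimal sketch of per-segment remapping. The function name `remap_01` and the handling of flat segments are assumptions made for illustration, not PulseDB's actual code. The scaling uses only the minimum and maximum of the segment being processed, so no statistics from any other segment (training or test) are involved.

```python
def remap_01(segment):
    """Linearly remap one segment to [0, 1] using only its own min and max."""
    lo, hi = min(segment), max(segment)
    if hi == lo:
        # Flat segment: there is no amplitude range to scale.
        # Returning zeros is an arbitrary choice made for this sketch.
        return [0.0 for _ in segment]
    return [(x - lo) / (hi - lo) for x in segment]


# A short made-up "segment" standing in for a 10-s test window.
test_segment = [0.8, 1.0, 1.4, 1.2, 0.6]
normalized = remap_01(test_segment)
```

Because `remap_01` needs nothing beyond the segment itself, the same function can be applied to a new test segment at inference time.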
Very clear explanation, thank you very much!
I would like to know the exact pre-processing order.
In Section 2.4.2 of the paper, it is mentioned that "The PPG signal was filtered with a 4th order Chebyshev-II filter at [0.5,8] Hz before presenting to the Elgendi’s algorithm." Section 2.5 states that "After extracting the characteristic points from the records, we selected high-quality segments from the records to form the cleaned PulseDB dataset. Data selection is conducted by dividing each record into 10-s non-overlapping segments, and determining whether to include or discard each segment."
Based on this information, I understand that the correct order is PPG_Raw -> PPG_F -> 10s segment. However, it seems that the precise timing of normalization is not mentioned. In the Supplementary Material, I found the following description: "The amplitude of ECG Raw and PPG Raw signals were linearly remapped between 0 and 1," and "These raw signals can be filtered with user-defined settings to be used as inputs or outputs that fit best to the desired BP estimation method." Therefore, I assume the correct order should be PPG_Raw -> PPG_Norm -> PPG_F -> 10s segment, which aligns with my observation that each 10s segment of PPG_F falls within the range of 0 to 1.
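The range observation above can be checked with a small sketch like the following. The helper names, the 125 Hz sampling rate, and the choice to drop a trailing partial window are all assumptions made for illustration. It splits a record into 10-s non-overlapping segments and reports each segment's minimum and maximum, which helps distinguish "every segment spans exactly 0 to 1" from "every segment merely lies within 0 to 1".

```python
def split_segments(record, fs=125, seconds=10):
    """Split a record into non-overlapping windows of `seconds` length.

    fs=125 is an assumed sampling rate; a trailing partial window is dropped.
    """
    n = fs * seconds
    return [record[i:i + n] for i in range(0, len(record) - n + 1, n)]


def segment_ranges(segments):
    """Return the (min, max) pair of each segment."""
    return [(min(seg), max(seg)) for seg in segments]


# Synthetic stand-in for a record: 3000 samples cycling through 0.0 .. 0.9.
record = [(i % 10) / 10 for i in range(3000)]
segments = split_segments(record)
ranges = segment_ranges(segments)
```

If every pair in `ranges` is exactly `(0.0, 1.0)`, that would point toward remapping within each segment; pairs that only lie inside that interval would be more consistent with remapping at the record level.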
I am concerned about potential data leakage if Min-Max normalization is applied directly to the entire original MIMIC-III waveform (all the signals have valid numerical sample values). In particular, for Group A's CalBased_Test_Subset, would it be better to fit the Min-Max normalizer on the training set during the training phase and use the fitted normalizer to transform the test data during the inference phase?
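For reference, the alternative described in this question (fitting the normalizer on training data only) could be sketched as below. The function names and the toy values are hypothetical. The sketch also shows a known side effect of this approach: test values outside the training range are mapped outside [0, 1].

```python
def fit_minmax(train_segments):
    """Compute the global min and max over the training segments only."""
    values = [x for seg in train_segments for x in seg]
    return min(values), max(values)


def transform(segment, lo, hi):
    """Apply a previously fitted min-max mapping to one segment."""
    return [(x - lo) / (hi - lo) for x in segment]


# Toy data standing in for training and test segments.
train = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.0]]
test = [1.0, 3.0]

lo, hi = fit_minmax(train)              # statistics come from train only
test_scaled = transform(test, lo, hi)   # 3.0 exceeds the training max
```

Here `test_scaled` contains a value above 1, because the test segment exceeds the range seen during fitting. This is the usual trade-off of a train-fitted normalizer compared with remapping each segment on its own.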
I hope I have clearly described the points that confuse me. Thanks!