ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

Training on UBFC-PHYS and testing on UBFC-rPPG yield poor results. #204

Closed. ycyoon closed this issue 10 months ago.

ycyoon commented 1 year ago

Training on UBFC-PHYS and testing on UBFC-rPPG yields the results below:

```
FFT MAE (FFT Label): 31.766183035714285 +/- 2.9729503718557555
FFT RMSE (FFT Label): 37.152450915467575 +/- 191.0622686231989
FFT MAPE (FFT Label): 29.509393744314337 +/- 2.466404062511122
FFT Pearson (FFT Label): -0.04356500388601246 +/- 0.15796376882345614
FFT SNR (FFT Label): -17.929882317693888 +/- 1.1229903249430426 (dB)
```

Is this normal? I used the following config file:

```yaml
BASE: ['']
TOOLBOX_MODE: "train_and_test"  # "train_and_test" or "only_test"
TRAIN:
  BATCH_SIZE: 4
  EPOCHS: 30
  LR: 9e-3
  MODEL_FILE_NAME: UBFC_UBFC_PURE_tscan
  DATA:
    FILTERING:
      USE_EXCLUSION_LIST: True
      EXCLUSION_LIST: ['s3_T1', 's8_T1', 's9_T1', 's26_T1', 's28_T1', 's30_T1', 's31_T1',
                       's32_T1', 's33_T1', 's40_T1', 's52_T1', 's53_T1', 's54_T1', 's56_T1',
                       's1_T2', 's4_T2', 's6_T2', 's8_T2', 's9_T2', 's11_T2', 's12_T2',
                       's13_T2', 's14_T2', 's19_T2', 's21_T2', 's22_T2', 's25_T2', 's26_T2',
                       's27_T2', 's28_T2', 's31_T2', 's32_T2', 's33_T2', 's35_T2', 's38_T2',
                       's39_T2', 's41_T2', 's42_T2', 's45_T2', 's47_T2', 's48_T2', 's52_T2',
                       's53_T2', 's55_T2', 's5_T3', 's8_T3', 's9_T3', 's10_T3', 's13_T3',
                       's14_T3', 's17_T3', 's22_T3', 's25_T3', 's26_T3', 's28_T3', 's30_T3',
                       's32_T3', 's33_T3', 's35_T3', 's37_T3', 's40_T3', 's47_T3', 's48_T3',
                       's49_T3', 's50_T3', 's52_T3', 's53_T3']
      SELECT_TASKS: True
      TASK_LIST: ['T1', 'T2', 'T3']
    FS: 35
    DATASET: UBFC-PHYS
    DO_PREPROCESS: False  # if first time, should be true
    DATA_FORMAT: NDCHW
    DATA_PATH: "/home/yoon/data/PPG/ubfc-phys"  # Raw dataset path, needs to be updated
    CACHED_PATH: "/home/yoon/data/PPG/rppg_toolbox/PreprocessedData"  # Processed dataset save path, needs to be updated
    EXP_DATA_NAME: ""
    BEGIN: 0.0
    END: 0.8
    PREPROCESS:
      DATA_TYPE: ['DiffNormalized', 'Standardized']
      DATA_AUG: ['None']  # 'None' or 'Motion' is supported, used if the data path points to an augmented dataset or requires augmentation
      LABEL_TYPE: DiffNormalized
      DO_CHUNK: True
      CHUNK_LENGTH: 180
      CROP_FACE:
        DO_CROP_FACE: True
        USE_LARGE_FACE_BOX: True
        LARGE_BOX_COEF: 1.5
        DETECTION:
          DO_DYNAMIC_DETECTION: False
          DYNAMIC_DETECTION_FREQUENCY: 30
          USE_MEDIAN_FACE_BOX: False  # This should be used ONLY if dynamic detection is used
      RESIZE:
        H: 72
        W: 72
VALID:
  DATA:
    FILTERING:
      USE_EXCLUSION_LIST: True
      EXCLUSION_LIST: ['s3_T1', 's8_T1', 's9_T1', 's26_T1', 's28_T1', 's30_T1', 's31_T1',
                       's32_T1', 's33_T1', 's40_T1', 's52_T1', 's53_T1', 's54_T1', 's56_T1',
                       's1_T2', 's4_T2', 's6_T2', 's8_T2', 's9_T2', 's11_T2', 's12_T2',
                       's13_T2', 's14_T2', 's19_T2', 's21_T2', 's22_T2', 's25_T2', 's26_T2',
                       's27_T2', 's28_T2', 's31_T2', 's32_T2', 's33_T2', 's35_T2', 's38_T2',
                       's39_T2', 's41_T2', 's42_T2', 's45_T2', 's47_T2', 's48_T2', 's52_T2',
                       's53_T2', 's55_T2', 's5_T3', 's8_T3', 's9_T3', 's10_T3', 's13_T3',
                       's14_T3', 's17_T3', 's22_T3', 's25_T3', 's26_T3', 's28_T3', 's30_T3',
                       's32_T3', 's33_T3', 's35_T3', 's37_T3', 's40_T3', 's47_T3', 's48_T3',
                       's49_T3', 's50_T3', 's52_T3', 's53_T3']
      SELECT_TASKS: True
      TASK_LIST: ['T1', 'T2', 'T3']
    FS: 35
    DATASET: UBFC-PHYS
    DO_PREPROCESS: False  # if first time, should be true
    DATA_FORMAT: NDCHW
    DATA_PATH: "/home/yoon/data/PPG/ubfc-phys"  # Raw dataset path, needs to be updated
    CACHED_PATH: "/home/yoon/data/PPG/rppg_toolbox/PreprocessedData"  # Processed dataset save path, needs to be updated
    EXP_DATA_NAME: ""
    BEGIN: 0.8
    END: 1.0
    PREPROCESS:
      DATA_TYPE: ['DiffNormalized', 'Standardized']
      DATA_AUG: ['None']  # 'None' or 'Motion' is supported, used if the data path points to an augmented dataset or requires augmentation
      LABEL_TYPE: DiffNormalized
      DO_CHUNK: True
      CHUNK_LENGTH: 180
      CROP_FACE:
        DO_CROP_FACE: True
        USE_LARGE_FACE_BOX: True
        LARGE_BOX_COEF: 1.5
        DETECTION:
          DO_DYNAMIC_DETECTION: False
          DYNAMIC_DETECTION_FREQUENCY: 30
          USE_MEDIAN_FACE_BOX: False  # This should be used ONLY if dynamic detection is used
      RESIZE:
        H: 72
        W: 72
TEST:
  METRICS: ['MAE', 'RMSE', 'MAPE', 'Pearson', 'SNR']
  USE_LAST_EPOCH: True  # to use provided validation dataset to find the best epoch, should be false
  DATA:
    FS: 30
    DATASET: UBFC-rPPG
    DO_PREPROCESS: True  # if first time, should be true
    DATA_FORMAT: NDCHW
    DATA_PATH: "/home/yoon/data/PPG/UBFC/UBFC_DATASET/DATASET_2"  # Raw dataset path, needs to be updated
    CACHED_PATH: "/home/yoon/data/PPG/rppg_toolbox/PreprocessedData"  # Processed dataset save path, needs to be updated
    EXP_DATA_NAME: ""
    BEGIN: 0.0
    END: 1.0
    PREPROCESS:
      DATA_TYPE: ['DiffNormalized', 'Standardized']
      LABEL_TYPE: DiffNormalized
      DO_CHUNK: True
      CHUNK_LENGTH: 180
      CROP_FACE:
        DO_CROP_FACE: True
        USE_LARGE_FACE_BOX: True
        LARGE_BOX_COEF: 1.5
        DETECTION:
          DO_DYNAMIC_DETECTION: False
          DYNAMIC_DETECTION_FREQUENCY: 30
          USE_MEDIAN_FACE_BOX: False  # This should be used ONLY if dynamic detection is used
      RESIZE:
        H: 72
        W: 72
DEVICE: cuda:0
NUM_OF_GPU_TRAIN: 1
LOG:
  PATH: runs/exp
MODEL:
  DROP_RATE: 0.2
  NAME: Tscan
  TSCAN:
    FRAME_DEPTH: 10
INFERENCE:
  BATCH_SIZE: 4
  EVALUATION_METHOD: "FFT"  # "FFT" or "peak detection"
  EVALUATION_WINDOW:
    USE_SMALLER_WINDOW: False  # Change this if you'd like an evaluation window smaller than the test video length
    WINDOW_SIZE: 10  # In seconds
  MODEL_PATH: ""
```

yahskapar commented 1 year ago

Hi @ycyoon,

I will say that, from my experience with the UBFC-PHYS dataset, it's quite difficult to train on due to larger amounts of unconstrained motion (especially in tasks 2 and 3) and some ground truth remaining unreliable even after subject exclusion. In general, UBFC-PHYS may not be a straightforward dataset to train on and then test with in a cross-dataset fashion. In the toolbox and its associated pre-print, we only ever show results when testing on UBFC-PHYS, never when training on it.

Here are a few more things you can try:

1) Using the TASK_LIST config parameter, start with only the videos from task 1. These videos have less unconstrained motion, since task 1 is the rest task.

2) Set DO_DYNAMIC_DETECTION and USE_MEDIAN_FACE_BOX to True to get noticeably more robust face detection and cropping results on datasets such as UBFC-PHYS. These are the defaults when testing on UBFC-PHYS using the toolbox, for example in this config.

3) Try intra-dataset testing first, perhaps after following 1) and 2) above. You will have 42 task 1 videos to work with, which you can easily split into, say, 34 training videos and 8 test videos with unique subjects in each split (see the sketch after this list). I have personally tried this before and got reasonable results within the context of UBFC-PHYS (i.e., not an extremely high MAE like the one you got).
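In case it helps, here's a minimal sketch of a subject-unique split, assuming the usual UBFC-PHYS layout of one `s<ID>` folder per subject containing files like `vid_s<ID>_T1.avi`; the path and the 34/8 boundary are placeholders, not toolbox defaults:

```python
import glob
import os

# Hypothetical path and naming; adjust to your copy of UBFC-PHYS.
DATA_PATH = "/home/yoon/data/PPG/ubfc-phys"

# Collect all task 1 videos, then derive the subject IDs from their folders.
videos = sorted(glob.glob(os.path.join(DATA_PATH, "s*", "vid_s*_T1.avi")))
subjects = sorted({os.path.basename(os.path.dirname(v)) for v in videos})

# 34 training subjects, the remainder for testing; no subject overlap.
train_subjects, test_subjects = subjects[:34], subjects[34:]
print(f"{len(train_subjects)} train / {len(test_subjects)} test subjects")

# The toolbox's BEGIN/END fractions slice the sorted file list, so a
# 34-of-42 split corresponds roughly to END: 0.81 for training and
# BEGIN: 0.81 for testing.
print("train/test boundary as a fraction:", 34 / 42)
```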

All the best! We will also be making some updates to the face detection and cropping part of the toolbox very soon, but I doubt that will noticeably help your results in this case, where you're training on UBFC-PHYS and then testing on another dataset.

ycyoon commented 12 months ago

Dear @yahskapar,

Thank you for your thoughtful response. I tried your suggestions 1) and 2), but the results were not significantly different. In addition to UBFC-PHYS, I also experimented with VICAR (https://www.vicarvision.nl/pub/RPPG_Facial_Expression_Analysis.pdf), MAHNOB, PURE, and COHFACE. While MAHNOB, PURE, UBFC-rPPG, and COHFACE yielded reasonable results, UBFC-PHYS and VICAR produced very poor ones. I don't believe this is due to issues with the algorithm or source files: I've run various open-source rPPG training algorithms and seen the same dataset-dependent differences. However, I couldn't identify any issues within the datasets themselves. Do you have any insight into why training on some datasets results in poor outcomes?

yahskapar commented 11 months ago

@ycyoon,

Be sure to give 3) from my previous reply a shot too, just to make sure you can get some kind of reasonable result. As for why you might see poor outcomes when using UBFC-PHYS as training data for cross-dataset experiments, a few possibilities:

1) UBFC-PHYS is just much more difficult to train on. This might be especially true for the data in tasks 2 and 3, where the rigid and non-rigid head and face motion can make it significantly harder to learn from. It's very possible that existing methods (e.g., the neural methods currently supported by this toolbox) are not suited to such training data, and it's worth verifying this by watching the training loss (and the validation loss, if you use a validation set) to see whether it behaves as one would expect if the underlying PPG signal were actually being learned.

2) Imperfect pre-processing. I'd investigate this based on the instructions here. UBFC-PHYS in particular, due to the amount of rigid head motion I mentioned before, may end up with poor frames after pre-processing by this toolbox. One remedy is dynamic detection, as I mentioned before, but overall our face detection and cropping pipeline is not as strong as it could be. I plan to make a significant update to this part of the toolbox in the coming month or so, when I have time between my current research projects. A quick way to spot-check the cached frames is sketched after this list.
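For the spot check, something like the following should work, assuming the toolbox caches each pre-processed chunk as an `.npy` array shaped roughly (T, H, W, C) with the DiffNormalized and Standardized channels concatenated; the cache path and `*input*.npy` naming below are assumptions to adapt to what your CACHED_PATH actually contains:

```python
import glob

import matplotlib.pyplot as plt
import numpy as np

# Hypothetical cache location; point this at your CACHED_PATH.
CACHED_PATH = "/home/yoon/data/PPG/rppg_toolbox/PreprocessedData"

# Load the first cached input chunk we can find.
chunk_files = sorted(glob.glob(f"{CACHED_PATH}/**/*input*.npy", recursive=True))
chunk = np.load(chunk_files[0])  # assumed shape: (T, H, W, C)
print("chunk shape:", chunk.shape)

# With DATA_TYPE ['DiffNormalized', 'Standardized'], the last three
# channels should be the standardized RGB frames; rescale to [0, 1]
# purely for display.
frame = chunk[0, :, :, -3:]
frame = (frame - frame.min()) / (frame.max() - frame.min() + 1e-8)

plt.imshow(frame)
plt.title("First standardized frame of the first cached chunk")
plt.savefig("cached_frame_check.png")
```

Flipping through a handful of chunks this way makes badly cropped or face-less frames obvious at a glance.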

Have you also played around with any of the hyperparameters (e.g., learning rate, number of epochs, batch size)? That is, of course, much more time-consuming, but it may be worth a look if you haven't touched those settings yet; a throwaway way to generate config variants is sketched below.
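A minimal sketch for generating such variants, assuming PyYAML is installed and `base_config.yaml` is a placeholder for the config you posted above (the grid values are arbitrary):

```python
import copy

import yaml  # PyYAML

# "base_config.yaml" stands in for the config shown earlier in this thread.
with open("base_config.yaml") as f:
    base = yaml.safe_load(f)

# A small, purely illustrative grid over learning rate and batch size.
for lr in (9e-3, 1e-3, 1e-4):
    for batch_size in (4, 8):
        cfg = copy.deepcopy(base)
        cfg["TRAIN"]["LR"] = lr
        cfg["TRAIN"]["BATCH_SIZE"] = batch_size
        name = f"tscan_lr{lr:g}_bs{batch_size}.yaml"
        with open(name, "w") as out:
            yaml.safe_dump(cfg, out, sort_keys=False)
        print("wrote", name)
```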

Best regards,

Akshay

ycyoon commented 11 months ago

@yahskapar, thank you for your reply. I also reviewed UBFC-PHYS and tried only task 1 with dynamic detection. I examined the extracted faces and found few errors for task 1. However, the results were still poor despite trying various hyperparameters (learning rate, frame length, and so on); none of the trials showed satisfactory performance. Lastly, I tried training on merged data from UBFC-2, PURE, and COHFACE, and then applied the trained model to UBFC-PHYS. When I compared the predicted results with the ground-truth signals of UBFC-PHYS, they were completely different! Now I'm beginning to suspect that the measuring devices used by the UBFC-PHYS team may differ from those used for the other datasets.

yahskapar commented 11 months ago

That's a fair point; even between UBFC-rPPG and UBFC-PHYS there appears to be a difference. UBFC-rPPG uses a Logitech C920 HD Pro webcam recording at 30 FPS (ultimately a 640x480 video) alongside a CMS50E transmissive pulse oximeter for ground-truth PPG. UBFC-PHYS, on the other hand, uses an EO-23121C RGB digital camera with motion JPEG compression at 35 FPS, and an Empatica E4 wristband for PPG data (as well as skin temperature and EDA signals). One quick sanity check is to estimate heart rate directly from each dataset's ground-truth PPG and see whether the two devices produce comparable numbers; a sketch follows.
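A rough sketch of that check, with scipy: the file names below are placeholders for one UBFC-PHYS BVP trace and one UBFC-rPPG `ground_truth.txt`, and the sampling rates are assumptions (the Empatica E4 reports BVP at 64 Hz; the UBFC-rPPG trace is commonly treated as 30 Hz, aligned with the video):

```python
import numpy as np
from scipy.signal import butter, filtfilt, periodogram

def estimate_hr_bpm(bvp, fs):
    """FFT-based HR estimate from a raw BVP trace (rough sketch)."""
    # Band-pass to the plausible HR band, 0.7-3.5 Hz (42-210 bpm).
    b, a = butter(2, [0.7 / (fs / 2), 3.5 / (fs / 2)], btype="bandpass")
    filtered = filtfilt(b, a, bvp)
    freqs, power = periodogram(filtered, fs=fs)
    mask = (freqs >= 0.7) & (freqs <= 3.5)
    return 60.0 * freqs[mask][np.argmax(power[mask])]

# Placeholder file locations; adjust to your dataset copies.
phys_bvp = np.loadtxt("bvp_s1_T1.csv", delimiter=",")
print("UBFC-PHYS s1_T1 HR:", estimate_hr_bpm(phys_bvp, fs=64))

rppg_gt = np.loadtxt("ground_truth.txt")  # row 0 is usually the PPG wave
print("UBFC-rPPG subject1 HR:", estimate_hr_bpm(rppg_gt[0], fs=30))
```

If the HR estimates from the two devices' ground truth disagree wildly on comparable subjects at rest, that would support the device-discrepancy hypothesis.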

It's possible that the motion JPEG compression in particular has an impact on whether networks, such as the common ones supported by this toolbox, can reasonably extract PPG signals during training or testing without additional modifications.

yahskapar commented 10 months ago

I'll go ahead and close this issue for the time being due to the lack of discussion. @ycyoon, feel free to re-open it or make a new one if you come across anything interesting to add to our discussion here.