Arktis2022 closed this issue 12 months ago.
Hi @Arktis2022,
I just tried reproducing the result and got an MAE of 1.76 and a MAPE of 1.96 just now using an NVIDIA RTX A6000. Unfortunately, I don't currently have access to an NVIDIA RTX A4500, which was the GPU I used to train PhysFormer for this toolbox (i.e., the pre-trained model you mentioned). Regardless, what you're getting (an MAE around 7) sounds way off, so let's troubleshoot this:
1. Can you confirm your dataset lengths from running `main.py` match mine below:

```
Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_0.8.csv
train Preprocessed Dataset Length: 596
Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.8_1.0.csv
valid Preprocessed Dataset Length: 154
Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/DataFileLists/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_1.0.csv
test Preprocessed Dataset Length: 483
```
If they don't match, let's troubleshoot the datasets you downloaded further. I'd also be curious whether, even if they do match, you still get strange results when training with PURE and testing on UBFC-rPPG with another architecture (e.g., TS-CAN).
2. Can you share more details regarding your GPU, your version of PyTorch, and the version of CUDA your PyTorch build was compiled with? This will vary from toolbox user to toolbox user depending on what type of GPUs they have access to. In my case, since I mainly use RTX A4500s and RTX A6000s, I used torch 1.8.2 with cu111 (CUDA 11.1). A quick way to gather all of this is sketched right after this list.
3. Are you using the default config settings in PURE_PURE_UBFC-rPPG_PHYSFORMER_BASIC.yaml, without changing any of the hyperparameters and without changing other aspects of the toolbox (e.g., dataloading)? The train config as-is should produce a result reasonably close to the pre-trained model result that you were able to verify.
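For convenience, here's a minimal way to gather the details in point 2 - just a generic sketch using standard PyTorch calls, nothing toolbox-specific:

```python
import torch

# Report the PyTorch build, the CUDA version it was compiled against,
# and any visible GPUs.
print("torch:", torch.__version__)
print("CUDA (compiled with):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}:", torch.cuda.get_device_name(i))
```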
My guess is that there is some issue with the datasets you downloaded (e.g., an incomplete or corrupted download), as that's an issue I've seen quite a few times in the past. If it ends up not being a dataset issue, or some difference in your version of the code (e.g., a branch on a fork) that wasn't accounted for, I'd also appreciate it if you shared your complete terminal output when training.
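Since incomplete or corrupted downloads come up a lot, one quick way to compare two copies of a dataset is to hash every file and diff the results - a minimal sketch, assuming you have a fresh re-download to compare against (`hash_dir` and the paths are hypothetical, not part of the toolbox):

```python
import hashlib
from pathlib import Path

def file_md5(path, chunk_size=1 << 20):
    # Hash in chunks so large video files don't have to fit in memory.
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
    return md5.hexdigest()

def hash_dir(root):
    root = Path(root)
    return {str(p.relative_to(root)): file_md5(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

# Flag files that are missing from, or differ between, the two copies.
old, new = hash_dir("PURE_old"), hash_dir("PURE_new")  # hypothetical paths
print({k: v for k, v in new.items() if old.get(k) != v})
```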
For reference, here is my terminal output (following the config echo) when training:

```
Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_0.8.csv
train Preprocessed Dataset Length: 596
Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.8_1.0.csv
valid Preprocessed Dataset Length: 154
Preprocessing dataset...
  0%|          | 0/42 [00:00<?, ?it/s]Warning: More than one faces are detected(Only cropping the biggest one.)
  5%|▌         | 2/42 [00:42<13:45, 20.63s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 45%|████▌     | 19/42 [01:34<02:08, 5.60s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 50%|█████     | 21/42 [01:36<01:15, 3.60s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 64%|██████▍   | 27/42 [01:52<00:54, 3.61s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 71%|███████▏  | 30/42 [02:00<00:28, 2.37s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 83%|████████▎ | 35/42 [02:22<00:29, 4.18s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
Warning: More than one faces are detected(Only cropping the biggest one.)
100%|██████████| 42/42 [02:52<00:00, 4.12s/it]
Total Number of raw files preprocessed: 42
Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/DataFileLists/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_1.0.csv
test Preprocessed Dataset Length: 483
====Training Epoch: 0====
66%|███████████████████████████▉ | 99/149 [00:50<00:23, 2.11it/s]
epoch:0, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.8178,
b:1.000, kl:3.511, fre_CEloss:4.932, hr_mae:31.735
100%|█████████████████████████████████████████| 149/149 [01:14<00:00, 2.01it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch0.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:07<00:00, 5.02it/s]
Validation RMSE:37.175, batch:149
Update best model! Best epoch: 0
====Training Epoch: 1====
66%|███████████████████████████▉ | 99/149 [00:49<00:24, 2.06it/s]
epoch:1, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.4911,
b:1.000, kl:3.496, fre_CEloss:4.922, hr_mae:21.746
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.02it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch1.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.11it/s]
Validation RMSE:37.318, batch:149
====Training Epoch: 2====
66%|███████████████████████████▉ | 99/149 [00:49<00:24, 2.08it/s]
epoch:2, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.4048,
b:1.000, kl:3.490, fre_CEloss:4.919, hr_mae:19.231
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.03it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch2.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.11it/s]
Validation RMSE:34.148, batch:149
Update best model! Best epoch: 2
====Training Epoch: 3====
66%|███████████████████████████▉ | 99/149 [00:49<00:23, 2.09it/s]
epoch:3, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.3032,
b:1.000, kl:3.488, fre_CEloss:4.916, hr_mae:17.234
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.03it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch3.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 5.92it/s]
Validation RMSE:32.969, batch:149
Update best model! Best epoch: 3
====Training Epoch: 4====
66%|███████████████████████████▉ | 99/149 [00:49<00:23, 2.09it/s]
epoch:4, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.2381,
b:1.000, kl:3.494, fre_CEloss:4.915, hr_mae:16.749
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.03it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch4.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 5.99it/s]
Validation RMSE:33.763, batch:149
====Training Epoch: 5====
66%|███████████████████████████▉ | 99/149 [00:49<00:24, 2.08it/s]
epoch:5, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1616,
b:1.000, kl:3.489, fre_CEloss:4.914, hr_mae:11.267
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.03it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch5.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.30it/s]
Validation RMSE:30.616, batch:149
Update best model! Best epoch: 5
====Training Epoch: 6====
66%|███████████████████████████▉ | 99/149 [00:48<00:23, 2.10it/s]
epoch:6, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1287,
b:1.000, kl:3.485, fre_CEloss:4.913, hr_mae:9.849
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.04it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch6.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.23it/s]
Validation RMSE:37.454, batch:149
====Training Epoch: 7====
66%|███████████████████████████▉ | 99/149 [00:49<00:24, 2.05it/s]
epoch:7, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1064,
b:1.000, kl:3.487, fre_CEloss:4.913, hr_mae:11.021
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.02it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch7.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.30it/s]
Validation RMSE:43.918, batch:149
====Training Epoch: 8====
66%|███████████████████████████▉ | 99/149 [00:49<00:23, 2.08it/s]
epoch:8, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0843,
b:1.000, kl:3.483, fre_CEloss:4.911, hr_mae:8.060
100%|█████████████████████████████████████████| 149/149 [01:13<00:00, 2.02it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch8.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 5.77it/s]
Validation RMSE:39.366, batch:149
====Training Epoch: 9====
66%|███████████████████████████▉ | 99/149 [00:49<00:24, 2.05it/s]
epoch:9, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0731,
b:1.000, kl:3.485, fre_CEloss:4.911, hr_mae:7.898
100%|█████████████████████████████████████████| 149/149 [01:14<00:00, 2.00it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch9.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 5.76it/s]
Validation RMSE:39.605, batch:149
best trained epoch: 5, min_val_loss: 30.615833555070317
Saving plots of losses and learning rates to: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/plots
===Testing===
Testing uses best epoch selected using model selection as non-pretrained model!
runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch5.pth
100%|███████████████████████████████████████████| 42/42 [00:32<00:00, 1.27it/s]
FFT MAE (FFT Label): 1.7578125 +/- 0.5560266569023335
FFT RMSE (FFT Label): 4.009346804526597 +/- 7.956243690132235
FFT MAPE (FFT Label): 1.9621455191799881 +/- 0.6267998720798567
FFT Pearson (FFT Label): 0.9738538837329217 +/- 0.03591956748706233
FFT SNR (FFT Label): -0.36154201443942985 +/- 1.118238865456478 (dB)
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/bland_altman_plots.
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/bland_altman_plots.
```
Thank you very much for your answer. I found that this issue is caused by differences in the GPU and the random seed. I changed the random seed from 100 to 42, and my complete terminal output when training is as follows:

```
Cached Data Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_0.8.csv
train Preprocessed Dataset Length: 596
Cached Data Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.8_1.0.csv
valid Preprocessed Dataset Length: 154
Cached Data Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse
File List Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/DataFileLists/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_1.0.csv
test Preprocessed Dataset Length: 483
====Training Epoch: 0====
66%|███████████████████████████▉ | 99/149 [01:30<00:25, 1.98it/s]
epoch:0, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:1.0003,
b:1.000, kl:3.513, fre_CEloss:4.939, hr_mae:47.902
100%|█████████████████████████████████████████| 149/149 [02:05<00:00, 1.19it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch0.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:30<00:00, 1.29it/s]
Validation RMSE:35.772, batch:149
Update best model! Best epoch: 0
====Training Epoch: 1====
66%|███████████████████████████▉ | 99/149 [00:51<00:24, 2.01it/s]
epoch:1, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.5540,
b:1.000, kl:3.501, fre_CEloss:4.925, hr_mae:25.440
100%|█████████████████████████████████████████| 149/149 [01:16<00:00, 1.96it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch1.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.16it/s]
Validation RMSE:39.978, batch:149
====Training Epoch: 2====
66%|███████████████████████████▉ | 99/149 [00:50<00:24, 2.01it/s]
epoch:2, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.4140,
b:1.000, kl:3.489, fre_CEloss:4.919, hr_mae:20.245
100%|█████████████████████████████████████████| 149/149 [01:15<00:00, 1.97it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch2.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.22it/s]
Validation RMSE:35.527, batch:149
Update best model! Best epoch: 2
====Training Epoch: 3====
66%|███████████████████████████▉ | 99/149 [00:50<00:24, 2.02it/s]
epoch:3, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.2948,
b:1.000, kl:3.491, fre_CEloss:4.916, hr_mae:17.286
100%|█████████████████████████████████████████| 149/149 [01:15<00:00, 1.96it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch3.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.21it/s]
Validation RMSE:35.600, batch:149
====Training Epoch: 4====
66%|███████████████████████████▉ | 99/149 [00:50<00:25, 2.00it/s]
epoch:4, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.2132,
b:1.000, kl:3.490, fre_CEloss:4.916, hr_mae:15.157
100%|█████████████████████████████████████████| 149/149 [01:15<00:00, 1.96it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch4.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.21it/s]
Validation RMSE:34.885, batch:149
Update best model! Best epoch: 4
====Training Epoch: 5====
66%|███████████████████████████▉ | 99/149 [00:50<00:24, 2.01it/s]
epoch:5, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1403,
b:1.000, kl:3.484, fre_CEloss:4.914, hr_mae:12.155
100%|█████████████████████████████████████████| 149/149 [01:15<00:00, 1.97it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch5.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.22it/s]
Validation RMSE:38.385, batch:149
====Training Epoch: 6====
66%|███████████████████████████▉ | 99/149 [00:50<00:24, 2.03it/s]
epoch:6, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1116,
b:1.000, kl:3.489, fre_CEloss:4.913, hr_mae:7.888
100%|█████████████████████████████████████████| 149/149 [01:15<00:00, 1.97it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch6.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.30it/s]
Validation RMSE:31.716, batch:149
Update best model! Best epoch: 6
====Training Epoch: 7====
66%|███████████████████████████▉ | 99/149 [00:51<00:25, 1.98it/s]
epoch:7, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0831,
b:1.000, kl:3.486, fre_CEloss:4.911, hr_mae:6.334
100%|█████████████████████████████████████████| 149/149 [01:16<00:00, 1.96it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch7.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.11it/s]
Validation RMSE:40.648, batch:149
====Training Epoch: 8====
66%|███████████████████████████▉ | 99/149 [00:50<00:24, 2.02it/s]
epoch:8, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0785,
b:1.000, kl:3.483, fre_CEloss:4.911, hr_mae:7.084
100%|█████████████████████████████████████████| 149/149 [01:15<00:00, 1.98it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch8.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.20it/s]
Validation RMSE:38.094, batch:149
====Training Epoch: 9====
66%|███████████████████████████▉ | 99/149 [00:51<00:24, 2.02it/s]
epoch:9, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0653,
b:1.000, kl:3.485, fre_CEloss:4.910, hr_mae:5.987
100%|█████████████████████████████████████████| 149/149 [01:15<00:00, 1.96it/s]
Saved Model Path: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch9.pth
====Validating===
100%|███████████████████████████████████████████| 39/39 [00:06<00:00, 6.23it/s]
Validation RMSE:22.860, batch:149
Update best model! Best epoch: 9
best trained epoch: 9, min_val_loss: 22.8599205971596
Saving plots of losses and learning rates to: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/plots
===Testing===
Testing uses best epoch selected using model selection as non-pretrained model!
runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch9.pth
100%|███████████████████████████████████████████| 42/42 [28:59<00:00, 41.43s/it]
FFT MAE (FFT Label): 1.9461495535714286 +/- 0.6956462328481609
FFT RMSE (FFT Label): 4.910426936474206 +/- 14.743013450842696
FFT MAPE (FFT Label): 2.0009059329411953 +/- 0.6795119697286518
FFT Pearson (FFT Label): 0.9627717354762423 +/- 0.04274066721755305
FFT SNR (FFT Label): -0.8674582305770593 +/- 1.1682313063007959 (dB)
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/bland_altman_plots.
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/bland_altman_plots.
Saving outputs to: runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/saved_test_outputs/PURE_PURE_UBFC-rPPG_physformer_outputs.pickle
```

Perhaps the model is not robust enough, and that is why the impact of the random seed is so significant?
Those results seem closer, but I still feel something is off here - can you share more of the details that I requested in my previous reply (2 and 3, regarding your GPU, torch version, and config details)? Depending on what specific GPU you have, we can maybe try to reproduce the previous result you got.
The seed setting being changed definitely will make a difference, but it's been set to 100 for a while now (maybe a year or so at this point), and I don't believe other folks (with different GPUs than the ones I mentioned) have noticed that big of a difference with the setting of 100 (e.g., an MAE of 7 versus 1.7 or so).
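For reference, the usual way to pin down seeding in a PyTorch project looks roughly like the sketch below - this is the general pattern, not necessarily line-for-line what the toolbox does:

```python
import random

import numpy as np
import torch

def set_seed(seed=100):
    # Seed every RNG that can influence initialization and data ordering.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```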
A few more things in addition to the extra information I requested:
I've found the impact of the random seed to be substantial for PhysFormer, occasionally resulting in notably poor outcomes. Also, may I ask a question: where can I find the reported results for PhysFormer, such as the MAE of 1.44 mentioned in the table?
@zizheng-guo,
The paper version being referred to (which is to be published at NeurIPS 2023 and will also eventually be updated on arXiv) with those results can be found here on OpenReview: https://openreview.net/pdf?id=q4XNX15kSe. Check out Table 2 in that PDF.
I'm also curious about your GPU, torch version, and config details if you can share that information. It's possible there is some instability introduced by some layer, maybe the BatchNorm layers, that is present in the PhysFormer model but not in other models (e.g., TS-CAN). I'd like to investigate this further and narrow it down, with different GPUs and PyTorch versions accounted for at the very least. So far, I've been unable to produce a poorer training result with the default config while training on a single RTX A4500, a single RTX A6000, or a single RTX 2070 Super (my personal computer).
@yahskapar Thanks for your response. I used the default torch version in the toolbox, and the GPU is an RTX A6000. In my experiments, this significant difference only occurred in cross-dataset testing between UBFC and PURE. I believe this is due to the larger distribution gap between them and the smaller dataset size, which results in greater random effects.
@zizheng-guo,
Are you sure you're using the default torch version in the toolbox? I didn't think this was possible with RTX A6000s, since my understanding was that they require sm_86 compatibility, which isn't supported by the toolbox defaults. Can you check using `pip list` or `conda list` and share that information here?
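As a quick cross-check from Python itself (standard torch calls, nothing toolbox-specific):

```python
import torch

# (8, 6) corresponds to sm_86, the compute capability of the RTX A6000.
print(torch.cuda.get_device_capability(0))
# The architectures this particular torch build was compiled for.
print(torch.cuda.get_arch_list())
```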
You can try your experiments again using the exact same (Linux) install command that I used with my RTX A6000:

```
pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
```
Your hypothesis involving greater random effects could be true, but I've also noticed some significant differences when batch norm layers are involved with certain versions of PyTorch and CUDA compilations that are compatible with GPUs like the RTX A4500 or the A6000. It would be good to better understand the impact of a different PyTorch version.
@yahskapar,
I'm a bit confused. I'm certain I'm using the default Torch version 1.12.1, which should support sm_86.
```
>>> print(torch.__version__)
1.12.1
>>> torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
```
@zizheng-guo,
What version of CUDA are you using? Can you check using `print(torch.version.cuda)`? Maybe I misremembered the exact details of why I had to use the install command I mentioned before.
Also, can you try the install command in my previous reply? Does it make a difference with the inconsistency in results you were seeing?
@yahskapar,
```
>>> print(torch.version.cuda)
11.3
```
I think the default settings can run directly on the RTX 3090 or RTX A6000. I have run experiments on both the 3090 and the A6000, and randomness exists on both; sometimes the results differ between them. I'll try further validation after completing my current tasks.
Thanks - that's interesting. Later, when you get a chance to retry with the same PyTorch installation command (switching to torch 1.8.2 compiled against CUDA 11.1), I'm curious whether you'd still notice a discrepancy. I'm not denying that randomness exists with model implementations that use layers with running statistics, such as batch norm layers, but I wouldn't expect a significant difference between our training results if we have the exact same GPU and the exact same PyTorch build compiled with the exact same version of CUDA.
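As an aside, if you want to rule out nondeterministic kernels entirely when comparing runs, PyTorch exposes flags along these lines (at some cost in speed) - a generic sketch; I don't recall offhand whether the toolbox already sets these:

```python
import torch

torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable autotuning, which can vary per GPU
```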
Can you also quickly confirm that your dataset lengths (for example with training and validating with PURE, testing on UBFC-rPPG) match this reply earlier in this thread?
@yahskapar
The CHUNK_LENGTH is 160.

```
train Preprocessed Dataset Length: 584
valid Preprocessed Dataset Length: 154
```

I train and validate on PURE, which (as I understand it) contains 60 videos, but my copy actually has 58: 06-02 is missing (I'm not sure whether my download is incomplete or it was never there in the first place), and 07-05 produces errors during preprocessing. The absence of two videos shouldn't make a big difference to the results, though.

```
test Preprocessed Dataset Length: 483
```

I test on UBFC, which contains 42 videos.
@zizheng-guo,
Your UBFC-rPPG test set length looks fine. PURE should have 59 videos total (unless that changed since I last re-downloaded it, which was only a few months ago); here's what I have when I run `ls` in my PURE dataset directory:

```
01-01 02-01 03-01 04-01 05-01 06-01 07-02 08-02 09-02 10-02
01-02 02-02 03-02 04-02 05-02 06-03 07-03 08-03 09-03 10-03
01-03 02-03 03-03 04-03 05-03 06-04 07-04 08-04 09-04 10-04
01-04 02-04 03-04 04-04 05-04 06-05 07-05 08-05 09-05 10-05
01-05 02-05 03-05 04-05 05-05 06-06 07-06 08-06 09-06 10-06
01-06 02-06 03-06 04-06 05-06 07-01 08-01 09-01 10-01
```
What processing errors do you get with 07-05? In the past, users trying to reproduce training or testing results re-downloaded a few files after noticing preprocessing errors (e.g., buffer size errors from OpenCV), and that fixed the problems they were seeing (an example a past user of the toolbox encountered with UBFC-rPPG can be found here). In my opinion, it's worth investigating on your end.
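If you want to scan your copy for unreadable files before preprocessing, something like this rough sketch can help (`find_unreadable_images` and the path are hypothetical - adjust the glob pattern to however your PURE copy is laid out):

```python
import cv2
from pathlib import Path

def find_unreadable_images(root):
    # Try to decode every image; collect anything OpenCV fails to read.
    return [str(p) for p in sorted(Path(root).rglob("*.png"))
            if cv2.imread(str(p)) is None]

print(find_unreadable_images("PURE/07-05"))  # hypothetical path
```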
@yahskapar,
Thanks for your time - this might indeed have been causing some underlying issues. I re-downloaded 07-05, which resolved the problem. I re-ran the experiment, and the results are similar to those I obtained previously.
Just to clarify: similar to the results you obtained previously with a seed of 42, or with the toolbox default of 100? Also, with the toolbox default of 100, have you tried the PyTorch version (compiled with the CUDA version) mentioned here?
Sorry to ask you to check all of these things; it's sometimes difficult to reproduce these kinds of discrepancies and address them. Depending on whether or not I can reproduce a poor training result with a different version of PyTorch and CUDA, I may later play around with some BatchNorm alternatives (e.g., GroupNorm, along the lines of the sketch below) to benchmark their effect, among a few other things.
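For example, a swap along these lines (a hypothetical sketch, not toolbox code - `num_groups` would need tuning, and I'd verify which norm layers PhysFormer actually uses first):

```python
import torch.nn as nn

def replace_bn_with_gn(module, num_groups=8):
    # Recursively replace BatchNorm3d layers with GroupNorm, which doesn't
    # rely on batch statistics (one suspected source of run-to-run variance).
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm3d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            replace_bn_with_gn(child, num_groups)
```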
When I was doing model training, I noticed that as long as my parameters remain the same, the output of the model is the same every time. Is there some parameter that controls this?
Hi @lishubing17,
There are certain parameters exposed via the config file, for example here. There are also additional parameters you can experiment with both here and here. In a future toolbox update, these additional parameters related to the loss calculations will be better documented and exposed via the config files.
If you have any further questions, please create a new issue.
The model training configuration is the same in both cases; only the model save path is modified. So why is the accuracy on the test set the same each time? The figure shows information about my configuration file.
Hi @lishubing17,
Can you please make a new issue regarding this rather than re-using this issue? Please also add a complete log of the training (e.g., what's written to the terminal). Happy to help you further in a new issue.
I'm going to go ahead and close this issue since it's gone stale at this point (lack of responses from the original poster) and may be reopened later if needed.
I used the PURE_PURE_UBFC-rPPG_PHYSFORMER_BASIC.yaml file in the train_config folder to train PhysFormer on PURE and test it on UBFC-rPPG, but I found differences between my results and those reported in the table. Is this normal, and do I need to change some hyperparameters? The results of the pre-trained model: PURE_PhysFormer_DiffNormalized_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf
The results of my own trained model: PURE_PURE_UBFC-rPPG_physformer_Epoch7_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf
My training loss (I trained twice; in the second attempt, I changed the epoch count to 50, but I used the model trained for 7 epochs for testing in both attempts, and the two test results were the same): PURE_PURE_UBFC-rPPG_physformer_losses.pdf
Using the given pre-trained model indeed yields the same metrics as those in the table (MAE: 1.44 / MAPE: 1.66). However, when I train the model myself using PURE_PURE_UBFC-rPPG_PHYSFORMER_BASIC.yaml as the configuration, both metrics are around 7.