ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)

physformer trained on PURE and tested on UBFC-rPPG #229

Closed Arktis2022 closed 12 months ago

Arktis2022 commented 1 year ago

I used the PURE_PURE_UBFC-rPPG_PHYSFORMER_BASIC.yaml file in the train_config folder to train PhysFormer on PURE and test it on UBFC-rPPG, but my results differ from those reported in the table. Is this normal, or do I need to change some hyperparameters?

The results of the pre-trained model: PURE_PhysFormer_DiffNormalized_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf

The results of my own training model: PURE_PURE_UBFC-rPPG_physformer_Epoch7_UBFC-rPPG_FFT_BlandAltman_ScatterPlot.pdf

My training loss (I trained twice; in the second attempt I changed the number of epochs to 50. However, I used the model trained with 7 epochs for testing in both attempts, and the two test results were the same): PURE_PURE_UBFC-rPPG_physformer_losses.pdf

Using the given pre-trained model indeed yields the same metrics as those in the table (MAE: 1.44/MAPE: 1.66). However, when training the model using PURE_PURE_UBFC-rPPG_PHYSFORMER_BASIC.yaml as the configuration, both metrics are around 7.
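
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how MAE and MAPE are commonly computed from per-video heart-rate estimates. The function names and the sample values are hypothetical and only for illustration; the toolbox's own metric code (including how heart rate is estimated via FFT) may differ.

```python
def mae(pred_bpm, true_bpm):
    """Mean absolute error, in beats per minute."""
    return sum(abs(p - t) for p, t in zip(pred_bpm, true_bpm)) / len(true_bpm)


def mape(pred_bpm, true_bpm):
    """Mean absolute percentage error, in percent of the reference value."""
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred_bpm, true_bpm)) / len(true_bpm)


# Hypothetical per-video heart rates (bpm), not taken from any dataset.
predicted = [72.0, 90.0, 65.0, 110.0]
reference = [70.0, 92.0, 65.0, 100.0]

print(f"MAE:  {mae(predicted, reference):.2f} bpm")
print(f"MAPE: {mape(predicted, reference):.2f} %")
```

Because MAPE divides each error by the reference heart rate, the two numbers tend to be close when reference heart rates are near 100 bpm.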

yahskapar commented 1 year ago

Hi @Arktis2022,

I just tried reproducing the result and got an MAE of 1.76 and a MAPE of 1.96 using an NVIDIA RTX 6000. Unfortunately, I don't currently have access to an NVIDIA RTX 4500, which was the GPU I used to train PhysFormer for this toolbox (i.e., the pre-trained model you mentioned). Regardless, what you're getting (MAE around 7) sounds way off, so let's troubleshoot this:

  1. What are your preprocessed dataset lengths, as reported by the toolbox when you run main.py? Can you confirm your dataset lengths match mine below:
Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNorm

File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_Data
 train Preprocessed Dataset Length: 596

Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormal

File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTy
 valid Preprocessed Dataset Length: 154

Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiff

File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/DataFileLists/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_
 test Preprocessed Dataset Length: 483

If they don't match, let's troubleshoot the datasets you downloaded further. I'd also be curious if they somehow matched, but you also got strange results when training with PURE and testing on UBFC-rPPG with another architecture (e.g., TS-CAN).

  2. Can you share more details regarding your GPU, your version of PyTorch, and what version of CUDA you compiled PyTorch with? This will vary from toolbox user to toolbox user depending on what type of GPUs they have access to. In my case, since I mainly use RTX A4500s and RTX A6000s, I used torch 1.8.2 with cu111 (CUDA 11.1).

  3. Are you using the default config settings in PURE_PURE_UBFC-rPPG_PHYSFORMER_BASIC.yaml without changing any of the hyperparameters, and without changing other aspects of the toolbox (e.g., dataloading)? The train config as is should produce a result reasonably close to the pre-trained model result that you were able to verify.
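
The environment details requested in the list above can be gathered in one go. The following is a rough sketch, not part of the toolbox; the function name `describe_environment` is hypothetical, and the script falls back gracefully if PyTorch is not installed.

```python
import platform
import sys


def describe_environment():
    """Collect the Python, PyTorch, CUDA, and GPU details useful for a bug report."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue alongside the config file name covers the GPU, PyTorch, and CUDA questions.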

My guess is that there is some issue with the datasets you downloaded (e.g., an incomplete or corrupted download), as that is an issue I've seen quite a few times in the past. If the cause turns out to be neither the datasets nor some difference in your version of the code (e.g., a branch on a fork), I'd also appreciate it if you shared your complete terminal output from training.
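
One quick way to compare dataset lengths is to scan the saved terminal output for the "Preprocessed Dataset Length" lines. This is only a sketch: the expected counts are the ones quoted in this thread, the helper name `check_dataset_lengths` is hypothetical, and matching counts show only that the number of preprocessed clips agrees, not that the underlying files are intact.

```python
import re

# Split sizes reported by the maintainer earlier in this thread.
EXPECTED = {"train": 596, "valid": 154, "test": 483}

LENGTH_PATTERN = re.compile(
    r"^\s*(train|valid|test) Preprocessed Dataset Length:\s*(\d+)", re.MULTILINE
)


def check_dataset_lengths(log_text, expected=EXPECTED):
    """Return {split: (found, expected)} for each split that is missing or differs."""
    found = {split: int(count) for split, count in LENGTH_PATTERN.findall(log_text)}
    return {
        split: (found.get(split), count)
        for split, count in expected.items()
        if found.get(split) != count
    }


sample_log = """
 train Preprocessed Dataset Length: 596
 valid Preprocessed Dataset Length: 154
 test Preprocessed Dataset Length: 483
"""

mismatches = check_dataset_lengths(sample_log)
print("All splits match" if not mismatches else f"Mismatched splits: {mismatches}")
```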

yahskapar commented 1 year ago

For reference, here is my terminal output (following the config echo) when training:

Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/PURE_SizeW128_SizeH128_ClipLength160_DataTy

File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/train/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_Data
 train Preprocessed Dataset Length: 596                                                                                                 

Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormal

File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/val/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTy
 valid Preprocessed Dataset Length: 154                                                                                                 

Preprocessing dataset...                                                                                                                
   | 0/42 [00:00<?, ?it/s]Warning: More than one faces are detected(Only cropping the biggest one.)
  5%|██████                                                                                                                        | 2/42 [00:42<13:45, 20.63s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 45%|████████████████████████████████████████████████████████▌                                                                    | 19/42 [01:34<02:08,  5.60s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 50%|██████████████████████████████████████████████████████████████▌                                                              | 21/42 [01:36<01:15,  3.60s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 64%|████████████████████████████████████████████████████████████████████████████████▎                                            | 27/42 [01:52<00:54,  3.61s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 71%|█████████████████████████████████████████████████████████████████████████████████████████▎                                   | 30/42 [02:00<00:28,  2.37s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
 83%|████████████████████████████████████████████████████████████████████████████████████████████████████████▏                    | 35/42 [02:22<00:29,  4.18s/it]Warning: More than one faces are detected(Only cropping the biggest one.)
Warning: More than one faces are detected(Only cropping the biggest one.)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [02:52<00:00,  4.12s/it]
Total Number of raw files preprocessed: 42                                                                                              

Cached Data Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiff

File List Path /playpen-nas-ssd/akshay/UNC_Google_Physio/preprocessed_gold/test/DataFileLists/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_
 test Preprocessed Dataset Length: 483 

====Training Epoch: 0====
 66%|███████████████████████████▉              | 99/149 [00:50<00:23,  2.11it/s]                                                        
epoch:0, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.8178,                                                       
b:1.000, kl:3.511, fre_CEloss:4.932, hr_mae:31.735                                                                                      
100%|█████████████████████████████████████████| 149/149 [01:14<00:00,  2.01it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:07<00:00,  5.02it/s]                                                        
Validation RMSE:37.175, batch:149                                                                                                       
Update best model! Best epoch: 0                                                                                                        

====Training Epoch: 1====                                                                                                               
 66%|███████████████████████████▉              | 99/149 [00:49<00:24,  2.06it/s]                                                        
epoch:1, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.4911,                                                       
b:1.000, kl:3.496, fre_CEloss:4.922, hr_mae:21.746                                                                                      
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.02it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.11it/s]                                                        
Validation RMSE:37.318, batch:149                                                                                                       

====Training Epoch: 2====                                                                                                               
 66%|███████████████████████████▉              | 99/149 [00:49<00:24,  2.08it/s]                                                        
epoch:2, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.4048,                                                       
b:1.000, kl:3.490, fre_CEloss:4.919, hr_mae:19.231                                                                                      
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.03it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.11it/s]                                                        
Validation RMSE:34.148, batch:149                                                                                                       
Update best model! Best epoch: 2                                                                                                        

====Training Epoch: 3====                                                                                                               
 66%|███████████████████████████▉              | 99/149 [00:49<00:23,  2.09it/s]                                                        
epoch:3, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.3032,                                                       
b:1.000, kl:3.488, fre_CEloss:4.916, hr_mae:17.234                                                                                      
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.03it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  5.92it/s]                                                        
Validation RMSE:32.969, batch:149                                                                                                       
Update best model! Best epoch: 3 

====Training Epoch: 4====
 66%|███████████████████████████▉              | 99/149 [00:49<00:23,  2.09it/s]                                                        
epoch:4, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.2381,                                                       
b:1.000, kl:3.494, fre_CEloss:4.915, hr_mae:16.749                                                                                      
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.03it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  5.99it/s]                                                        
Validation RMSE:33.763, batch:149                                                                                                       

====Training Epoch: 5====                                                                                                               
 66%|███████████████████████████▉              | 99/149 [00:49<00:24,  2.08it/s]                                                        
epoch:5, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1616,                                                       
b:1.000, kl:3.489, fre_CEloss:4.914, hr_mae:11.267                                                                                      
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.03it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.30it/s]                                                        
Validation RMSE:30.616, batch:149                                                                                                       
Update best model! Best epoch: 5                                                                                                        

====Training Epoch: 6====                                                                                                               
 66%|███████████████████████████▉              | 99/149 [00:48<00:23,  2.10it/s]                                                        
epoch:6, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1287,                                                       
b:1.000, kl:3.485, fre_CEloss:4.913, hr_mae:9.849                                                                                       
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.04it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.23it/s]                                                        
Validation RMSE:37.454, batch:149                                                                                                       

====Training Epoch: 7====                                                                                                               
 66%|███████████████████████████▉              | 99/149 [00:49<00:24,  2.05it/s]                                                        
epoch:7, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1064,                                                       
b:1.000, kl:3.487, fre_CEloss:4.913, hr_mae:11.021                                                                                      
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.02it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.30it/s]                                                        
Validation RMSE:43.918, batch:149 

====Training Epoch: 8====
 66%|███████████████████████████▉              | 99/149 [00:49<00:23,  2.08it/s]                                                        
epoch:8, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0843,                                                       
b:1.000, kl:3.483, fre_CEloss:4.911, hr_mae:8.060                                                                                       
100%|█████████████████████████████████████████| 149/149 [01:13<00:00,  2.02it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  5.77it/s]                                                        
Validation RMSE:39.366, batch:149                                                                                                       

====Training Epoch: 9====                                                                                                               
 66%|███████████████████████████▉              | 99/149 [00:49<00:24,  2.05it/s]                                                        
epoch:9, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0731,                                                       
b:1.000, kl:3.485, fre_CEloss:4.911, hr_mae:7.898                                                                                       
100%|█████████████████████████████████████████| 149/149 [01:14<00:00,  2.00it/s]                                                        
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTru

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  5.76it/s]                                                        
Validation RMSE:39.605, batch:149                                                                                                       
best trained epoch: 5, min_val_loss: 30.615833555070317                                                                                 
Saving plots of losses and learning rates to: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelType

Testing uses best epoch selected using model selection as non-pretrained model!                                                         

100%|███████████████████████████████████████████| 42/42 [00:32<00:00,  1.27it/s]                                                        
FFT MAE (FFT Label): 1.7578125 +/- 0.5560266569023335                                                                                   
FFT RMSE (FFT Label): 4.009346804526597 +/- 7.956243690132235                                                                           
FFT MAPE (FFT Label): 1.9621455191799881 +/- 0.6267998720798567                                                                         
FFT Pearson (FFT Label): 0.9738538837329217 +/- 0.03591956748706233                                                                     
FFT SNR (FFT Label): -0.36154201443942985 +/- 1.118238865456478 (dB)                                                                    
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffN
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDi
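
The log above reports "best trained epoch: 5", the epoch with the lowest validation value. As a rough sketch of that selection step (not the toolbox's own code; the helper name `best_epoch` is hypothetical, and it assumes one "Validation RMSE" line per epoch, in order, starting from epoch 0), the same choice can be recovered from the validation lines:

```python
import re


def best_epoch(log_text):
    """Return (epoch_index, value) for the lowest 'Validation RMSE' in the log."""
    values = [float(v) for v in re.findall(r"Validation RMSE:\s*([0-9.]+)", log_text)]
    if not values:
        raise ValueError("no validation lines found")
    epoch = min(range(len(values)), key=values.__getitem__)
    return epoch, values[epoch]


# Validation values copied from the log above (epochs 0 through 9).
log = "\n".join(
    f"Validation RMSE:{value}, batch:149"
    for value in [
        "37.175", "37.318", "34.148", "32.969", "33.763",
        "30.616", "37.454", "43.918", "39.366", "39.605",
    ]
)

print(best_epoch(log))
```

The validation values in this run swing between roughly 30 and 44 from one epoch to the next, so which checkpoint gets selected for testing can change easily from run to run.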

Arktis2022 commented 1 year ago

Thank you very much for your answer. I found that this issue is caused by differences in GPU and random seed. I changed the random seed from 100 to 42, and my complete terminal output when training is as follows:

Cached Data Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse

File List Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_0.8.csv
 train Preprocessed Dataset Length: 596

Cached Data Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse

File List Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/DataFileLists/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.8_1.0.csv
 valid Preprocessed Dataset Length: 154

Cached Data Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse

File List Path /data/gscratch/ubicomp/xliu0/data3/mnt/Datasets/rppg_toolbox/PreprocessedData/DataFileLists/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse_0.0_1.0.csv
 test Preprocessed Dataset Length: 483

====Training Epoch: 0====
 66%|███████████████████████████▉              | 99/149 [01:30<00:25,  1.98it/s]
epoch:0, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:1.0003, 
b:1.000, kl:3.513, fre_CEloss:4.939, hr_mae:47.902
100%|█████████████████████████████████████████| 149/149 [02:05<00:00,  1.19it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch0.pth

100%|███████████████████████████████████████████| 39/39 [00:30<00:00,  1.29it/s]
Validation RMSE:35.772, batch:149
Update best model! Best epoch: 0

====Training Epoch: 1====
 66%|███████████████████████████▉              | 99/149 [00:51<00:24,  2.01it/s]
epoch:1, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.5540, 
b:1.000, kl:3.501, fre_CEloss:4.925, hr_mae:25.440
100%|█████████████████████████████████████████| 149/149 [01:16<00:00,  1.96it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch1.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.16it/s]
Validation RMSE:39.978, batch:149

====Training Epoch: 2====
 66%|███████████████████████████▉              | 99/149 [00:50<00:24,  2.01it/s]
epoch:2, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.4140, 
b:1.000, kl:3.489, fre_CEloss:4.919, hr_mae:20.245
100%|█████████████████████████████████████████| 149/149 [01:15<00:00,  1.97it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch2.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.22it/s]
Validation RMSE:35.527, batch:149
Update best model! Best epoch: 2

====Training Epoch: 3====
 66%|███████████████████████████▉              | 99/149 [00:50<00:24,  2.02it/s]
epoch:3, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.2948, 
b:1.000, kl:3.491, fre_CEloss:4.916, hr_mae:17.286
100%|█████████████████████████████████████████| 149/149 [01:15<00:00,  1.96it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch3.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.21it/s]
Validation RMSE:35.600, batch:149

====Training Epoch: 4====
 66%|███████████████████████████▉              | 99/149 [00:50<00:25,  2.00it/s]
epoch:4, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.2132, 
b:1.000, kl:3.490, fre_CEloss:4.916, hr_mae:15.157
100%|█████████████████████████████████████████| 149/149 [01:15<00:00,  1.96it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch4.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.21it/s]
Validation RMSE:34.885, batch:149
Update best model! Best epoch: 4

====Training Epoch: 5====
 66%|███████████████████████████▉              | 99/149 [00:50<00:24,  2.01it/s]
epoch:5, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1403, 
b:1.000, kl:3.484, fre_CEloss:4.914, hr_mae:12.155
100%|█████████████████████████████████████████| 149/149 [01:15<00:00,  1.97it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch5.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.22it/s]
Validation RMSE:38.385, batch:149

====Training Epoch: 6====
 66%|███████████████████████████▉              | 99/149 [00:50<00:24,  2.03it/s]
epoch:6, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.1116, 
b:1.000, kl:3.489, fre_CEloss:4.913, hr_mae:7.888
100%|█████████████████████████████████████████| 149/149 [01:15<00:00,  1.97it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch6.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.30it/s]
Validation RMSE:31.716, batch:149
Update best model! Best epoch: 6

====Training Epoch: 7====
 66%|███████████████████████████▉              | 99/149 [00:51<00:25,  1.98it/s]
epoch:7, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0831, 
b:1.000, kl:3.486, fre_CEloss:4.911, hr_mae:6.334
100%|█████████████████████████████████████████| 149/149 [01:16<00:00,  1.96it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch7.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.11it/s]
Validation RMSE:40.648, batch:149

====Training Epoch: 8====
 66%|███████████████████████████▉              | 99/149 [00:50<00:24,  2.02it/s]
epoch:8, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0785, 
b:1.000, kl:3.483, fre_CEloss:4.911, hr_mae:7.084
100%|█████████████████████████████████████████| 149/149 [01:15<00:00,  1.98it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch8.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.20it/s]
Validation RMSE:38.094, batch:149

====Training Epoch: 9====
 66%|███████████████████████████▉              | 99/149 [00:51<00:24,  2.02it/s]
epoch:9, batch:100, total:37, lr:0.0001, sharp:2.000, a:1.000, NegPearson:0.0653, 
b:1.000, kl:3.485, fre_CEloss:4.910, hr_mae:5.987
100%|█████████████████████████████████████████| 149/149 [01:15<00:00,  1.96it/s]
Saved Model Path:  runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/PreTrainedModels/PURE_PURE_UBFC-rPPG_physformer_Epoch9.pth

100%|███████████████████████████████████████████| 39/39 [00:06<00:00,  6.23it/s]
Validation RMSE:22.860, batch:149
Update best model! Best epoch: 9
best trained epoch: 9, min_val_loss: 22.8599205971596
Saving plots of losses and learning rates to: runs/exp/PURE_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/plots

Testing uses best epoch selected using model selection as non-pretrained model!

100%|███████████████████████████████████████████| 42/42 [28:59<00:00, 41.43s/it]
FFT MAE (FFT Label): 1.9461495535714286 +/- 0.6956462328481609
FFT RMSE (FFT Label): 4.910426936474206 +/- 14.743013450842696
FFT MAPE (FFT Label): 2.0009059329411953 +/- 0.6795119697286518
FFT Pearson (FFT Label): 0.9627717354762423 +/- 0.04274066721755305
FFT SNR (FFT Label): -0.8674582305770593 +/- 1.1682313063007959 (dB)
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_ScatterPlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/bland_altman_plots.
Saved PURE_PURE_UBFC-rPPG_physformer_FFT_BlandAltman_DifferencePlot.pdf to runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/bland_altman_plots.
Saving outputs to: runs/exp/UBFC-rPPG_SizeW128_SizeH128_ClipLength160_DataTypeDiffNormalized_DataAugNone_LabelTypeDiffNormalized_Crop_faceTrue_Large_boxTrue_Large_size1.5_Dyamic_DetFalse_det_len30_Median_face_boxFalse/saved_test_outputs/PURE_PURE_UBFC-rPPG_physformer_outputs.pickle

Arktis2022 commented 1 year ago

Perhaps the model is not robust enough, so the random seed has an outsized effect on the results?
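
To make the role of the seed concrete, here is a minimal sketch of seeding the usual random number generators in a PyTorch project. It is not the toolbox's seeding code; the helper name `set_seed` is hypothetical, and NumPy and PyTorch are treated as optional so the snippet runs without them. Even with a fixed seed, results can still differ across GPUs and CUDA/PyTorch versions, because some GPU operations are not guaranteed to be deterministic.

```python
import random


def set_seed(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


# The same seed reproduces the same random stream; a different seed does not.
set_seed(100)
first = [random.random() for _ in range(3)]
set_seed(100)
second = [random.random() for _ in range(3)]
set_seed(42)
third = [random.random() for _ in range(3)]

print(first == second, first == third)
```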

yahskapar commented 1 year ago

Those results seem closer, but I still feel something is off here. Can you share the details I requested in my previous reply (items 2 and 3: your GPU, torch version, and config details)? Depending on what specific GPU you have, we can maybe try to reproduce the previous result you got.

Changing the seed setting will definitely make a difference, but it's been set to 100 for a while now (maybe a year or so at this point), and I don't believe other folks (with different GPUs than the ones I mentioned) have noticed that big of a difference with the setting of 100 (e.g., MAE of 7 versus 1.7 or so).
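For anyone comparing runs, the effect of a fixed seed can be sketched as follows. This is a minimal illustration and not the toolbox's own seeding code: `seed_everything` is a hypothetical helper name, 100 is the default seed mentioned above, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int = 100) -> None:
    """Seed every RNG available in this environment (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when no GPU is present
    # Trade some speed for repeatable cuDNN algorithm selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random draws.
seed_everything(100)
first = [random.random() for _ in range(3)]
seed_everything(100)
second = [random.random() for _ in range(3)]
print(first == second)
```

Identical seeds do not guarantee identical results across different GPUs or PyTorch/CUDA builds, so the hardware and version details still matter.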

A few more things in addition to the extra information I requested:

zizheng-guo commented 1 year ago

I've found the impact of the random seed to be substantial for PhysFormer, occasionally resulting in notably poor outcomes. Separately, may I ask where I can find the reported results for PhysFormer, such as the MAE of 1.44 mentioned in the table?

yahskapar commented 1 year ago


The paper version being referred to (which is to be published in NeurIPS 2023, and will also eventually be updated on arXiv) with those results can be found here on OpenReview: https://openreview.net/pdf?id=q4XNX15kSe. Check out Table 2 in that PDF.

I'm also curious about your GPU, torch version, and config details if you can share that information. It's possible there is some instability introduced by certain layers, maybe the BatchNorm layers, that are present in the PhysFormer model but not in other models (e.g., TS-CAN). I'd like to investigate this further and narrow it down with different GPUs and PyTorch versions accounted for at the very least. So far, I've been unable to produce a poorer training result with the default config while training on a single RTX A4500, a single RTX A6000, or a single RTX 2070 Super (my personal computer).

zizheng-guo commented 1 year ago

@yahskapar Thanks for your response. I used the default torch version in the toolbox, and the GPU is an RTX A6000. In my experiments, this significant difference only occurred in cross-dataset testing between UBFC and PURE. I believe this is due to the larger distribution gap between them and the smaller data size, which results in greater random effects.

yahskapar commented 1 year ago


Are you sure you're using the default torch version in the toolbox? I didn't think this was possible with RTX A6000s since my understanding was that it requires sm_86 compatibility that isn't supported by the toolbox defaults. Can you check using pip list or conda list and share that information here?

You can try your experiments again using the exact same (Linux) install command that I used with my RTX A6000:

pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111

Your hypothesis involving greater random effects could be true, but I've also noticed some significant differences when batch norm layers are involved with certain versions of PyTorch and CUDA compilations that are compatible with GPUs like the RTX A4500 or the A6000. It would be good to better understand the impact of a different PyTorch version.

zizheng-guo commented 1 year ago


I'm a bit confused. I'm certain I'm using the default Torch version 1.12.1, which should support sm_86.

print(torch.__version__)
1.12.1
torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']

yahskapar commented 1 year ago


What version of CUDA are you using? Can you check using print(torch.version.cuda)? Maybe I misremember the exact details of why I had to use the install command I mentioned before.

Also, can you try the install command in my previous reply? Does it make a difference with the inconsistency in results you were seeing?
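The version details requested in this thread can be gathered in one place with a short script. This is only a sketch: `environment_report` is a hypothetical helper, and it reports PyTorch details only when PyTorch is importable. For reference, `sm_86` corresponds to a compute capability of (8, 6).

```python
import platform


def environment_report() -> dict:
    """Collect version details worth pasting into an issue (hypothetical helper)."""
    report = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report
    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        report["arch_list"] = torch.cuda.get_arch_list()
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Comparing this output before and after reinstalling PyTorch makes it easy to confirm which build was actually used for a given training run.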

zizheng-guo commented 1 year ago


print(torch.version.cuda)
11.3

I think the default settings can run directly on the RTX 3090 or RTX A6000. I have conducted experiments on both the 3090 and A6000, and randomness exists in both. Sometimes, there are differences in the results between them. For further validation, I'll try after completing the current tasks.

yahskapar commented 1 year ago


Thanks - that's interesting. Later, when you get a chance to retry with the same PyTorch installation command (basically switching to PyTorch 1.8.2 and downgrading to CUDA 11.1), I'm curious whether you'd still notice a discrepancy. I'm not denying that randomness exists in model implementations that use layers that track statistics, such as batch norm layers, but I wouldn't expect a significant difference between our training results if we have the exact same GPU, the exact same PyTorch version, and the exact same compiled CUDA version.

Can you also quickly confirm that your dataset lengths (for example with training and validating with PURE, testing on UBFC-rPPG) match this reply earlier in this thread?

zizheng-guo commented 1 year ago



Train Preprocessed Dataset Length: 584
valid Preprocessed Dataset Length: 154

Training and validation use PURE, which should contain 60 videos, but my dataset actually has 58. Recording 06-02 is missing, and I'm not sure whether it is missing only from my copy or was never there in the first place. In addition, 07-05 hit a processing error during data preprocessing. Still, the absence of two videos should not make a big difference to the results.

test Preprocessed Dataset Length: 483

Testing on UBFC, which contains 42 videos.

yahskapar commented 1 year ago


Your UBFC-rPPG test set length looks fine. PURE should have 59 videos total (unless that changed since I last re-downloaded it, which was only a few months ago). Here's what I have when I run ls in my PURE dataset directory:

01-01  02-01  03-01  04-01  05-01  06-01  07-02  08-02  09-02  10-02
01-02  02-02  03-02  04-02  05-02  06-03  07-03  08-03  09-03  10-03
01-03  02-03  03-03  04-03  05-03  06-04  07-04  08-04  09-04  10-04
01-04  02-04  03-04  04-04  05-04  06-05  07-05  08-05  09-05  10-05
01-05  02-05  03-05  04-05  05-05  06-06  07-06  08-06  09-06  10-06
01-06  02-06  03-06  04-06  05-06  07-01  08-01  09-01  10-01

What processing errors do you get when processing 07-05? In the past, users who were trying to reproduce training or testing results re-downloaded a few files after noticing preprocessing errors (e.g., buffer size errors from OpenCV) and that helped fix problems that they were seeing (an example a past user of the toolbox encountered with UBFC-rPPG can be found here). In my opinion, it's worth investigating on your end.
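A quick way to compare a local copy of PURE against the listing above is sketched below. It assumes the dataset root contains one directory per recording, named as in the `ls` output; `missing_recordings` is a hypothetical helper, and the demo uses a temporary directory rather than a real dataset path.

```python
import tempfile
from pathlib import Path

# Recording IDs from the listing above: subjects 01-10, sessions 01-06,
# with 06-02 absent, for 59 recordings in total.
EXPECTED = {f"{s:02d}-{r:02d}" for s in range(1, 11) for r in range(1, 7)} - {"06-02"}


def missing_recordings(pure_root: str) -> list:
    """Return expected recording IDs that have no directory under pure_root."""
    present = {p.name for p in Path(pure_root).iterdir() if p.is_dir()}
    return sorted(EXPECTED - present)


# Demo: simulate a local copy in which 07-05 was never downloaded.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED - {"07-05"}:
        (Path(tmp) / name).mkdir()
    print(missing_recordings(tmp))  # ['07-05']
```

This only checks that each recording directory exists. A partially downloaded or corrupted recording would still pass, so preprocessing errors remain the signal to re-download a file.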

zizheng-guo commented 1 year ago


Thanks for your time, this indeed might lead to some underlying issues. I re-downloaded 07-05 and resolved the problem. I re-ran the experiment, and the results are similar to those obtained previously.

yahskapar commented 1 year ago

Just to clarify, similar to the results you obtained previously with a seed of 42 or the toolbox default of 100? Also, with the toolbox default of 100, have you tried the PyTorch version compiled with CUDA mentioned here?

Sorry to ask you to check all of these things; it's sometimes difficult to reproduce these kinds of discrepancies and address them. Depending on whether or not I can reproduce a poor training result with a different version of PyTorch and CUDA, I may play around with some BatchNorm alternatives (e.g., GroupNorm) later to benchmark their effect, along with a few other things.

lishubing17 commented 12 months ago

During model training, I noticed that as long as my parameters stay the same, the model's output is the same every time. Is there a parameter I should set?

yahskapar commented 12 months ago

Hi @lishubing17,

There are certain parameters exposed via the config file, for example here. There are also additional parameters you can experiment with both here and here. In a future toolbox update, these additional parameters related to the loss calculations will be better documented and exposed via the config files.

If you have any further questions, please create a new issue.

lishubing17 commented 12 months ago

The model training configuration is the same in both cases; only the model's save path is modified. So why is the accuracy on the test set the same for each run? The figure shows my configuration file.

lishubing17 commented 12 months ago

[Three attached images: 1, 2, 3]

yahskapar commented 12 months ago

Hi @lishubing17,

Can you please make a new issue regarding this rather than re-using this issue? Please also add a complete log of the training (e.g., what's written to the terminal). Happy to help you further in a new issue.

I'm going to go ahead and close this issue since it's gone stale at this point (lack of responses from the original poster) and may be reopened later if needed.