ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

I can't reproduce the results #201

Closed: hyunduk00 closed this issue 1 year ago

hyunduk00 commented 1 year ago

Dear authors,

Thank you for this toolbox; it really helps a lot.

I followed the instructions to set up this toolbox, trained EfficientPhys on the PURE dataset, and tested on the UBFC-rPPG dataset. I modified ./configs/train_configs/PURE_PURE_UBFC-rPPG_EFFICIENTPHYS.yaml as follows:

However, I am confused about the final testing results. I got:

FFT MAE (FFT Label): 2.720424107142857 +/- 1.447645832849246
FFT RMSE (FFT Label): 9.768275310006729 +/- 86.3767644596274
FFT MAPE (FFT Label): 2.691825865354666 +/- 1.3371532023835029
FFT Pearson (FFT Label): 0.8650852169012216 +/- 0.07931386503931175
FFT SNR (FFT Label): -0.6631826857626655 +/- 1.2506743482611395 (dB)

However, when I run inference with ./final_model_release/PURE_EfficientPhys.pth, I got:

FFT MAE (FFT Label): 2.0717075892857144 +/- 0.9220998197442088
FFT RMSE (FFT Label): 6.324810795078438 +/- 32.00626918352853
FFT MAPE (FFT Label): 2.096291610476043 +/- 0.873290220181899
FFT Pearson (FFT Label): 0.9387416753364012 +/- 0.054489463886747594
FFT SNR (FFT Label): -0.12161866676788044 +/- 1.1985097990982712 (dB)
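For context, the FFT metrics above compare heart rates estimated from the dominant frequency of the predicted and ground-truth signals, in BPM. A minimal sketch of that idea, assuming 30 fps signals and a plausible HR band (not the toolbox's exact implementation):

```python
import numpy as np
from scipy.signal import periodogram

def fft_heart_rate(signal, fs=30.0, lo=0.75, hi=2.5):
    """Estimate heart rate (BPM) as the dominant frequency in a plausible HR band."""
    freqs, power = periodogram(signal, fs=fs)
    band = (freqs >= lo) & (freqs <= hi)           # roughly 45-150 BPM
    peak_freq = freqs[band][np.argmax(power[band])]
    return peak_freq * 60.0                         # Hz -> beats per minute

def fft_mae(pred_signals, label_signals, fs=30.0):
    """Mean absolute error between FFT-estimated HRs of predictions and labels."""
    errors = [abs(fft_heart_rate(p, fs) - fft_heart_rate(l, fs))
              for p, l in zip(pred_signals, label_signals)]
    return float(np.mean(errors))
```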

Please let me know if there are additional hyperparameter settings or techniques I should use.

Moreover, in your code, you detect the face_zone using cv2.CascadeClassifier and crop only the biggest one. However, I think the code selects not the biggest area but the biggest x position.

Please understand that my English is not good enough.

Sincerely,

Hyunduk Kim

yahskapar commented 1 year ago

Hi @hyunduk00,

Assuming you are indeed using this config with only the minor modifications you listed, a few things:

1) Are you sure you have all 59 videos (in the case of PURE, folders with frames) from the PURE dataset? A quick way to count them is sketched just after this list.

2) When preprocessing the PURE dataset, carefully inspect the terminal output. Do any strange errors related to reading the video frames show up? This can sometimes happen when the dataset hasn't been downloaded properly for whatever reason.

3) Similarly, check 1) and 2) for the UBFC-rPPG test dataset you have downloaded. You should have 42 videos for UBFC-rPPG, and no errors or warnings when reading the videos for preprocessing.

4) How long ago did you download the datasets? It's worth re-downloading them if it's been two years or more, especially since, from what some collaborators and I have observed, UBFC-rPPG changed at least one of the subject data folders within the last two years or so.

5) What GPU(s) are you using? A different GPU shouldn't make a particularly significant difference like you observed. If you happen to have one that I have access to, I can try to replicate your numbers.
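One quick way to sanity-check 1) and 3) is to count the recording folders on disk. A minimal sketch, assuming hypothetical dataset paths, one folder per PURE recording, and one subject* folder per UBFC-rPPG video in DATASET_2 (adjust paths and patterns to your layout):

```python
import glob
import os

# Hypothetical paths -- replace with wherever the raw datasets live on your machine.
pure_root = "/data/PURE"
ubfc_root = "/data/UBFC-rPPG/DATASET_2"

pure_recordings = sorted(d for d in glob.glob(os.path.join(pure_root, "*"))
                         if os.path.isdir(d))
ubfc_subjects = sorted(glob.glob(os.path.join(ubfc_root, "subject*")))

print(f"PURE recordings found: {len(pure_recordings)} (expected 59)")
print(f"UBFC-rPPG subjects found: {len(ubfc_subjects)} (expected 42)")
```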

As for your comment on the face_zone detection in BaseLoader.py, that does look a little strange to me and is possibly a bug that hasn't had much of an impact given the nature (e.g., single person in view) of the datasets supported by this toolbox. I will try to address this sometime within the next few weeks, as I have a branch that explores different face detection methods (e.g., RetinaFace, utilizing SAM for face detection) and includes a refactor of the related code.
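For reference, picking the largest detection by area (rather than by x position) from cv2.CascadeClassifier output could look something like the generic sketch below; this is an illustration, not the toolbox's current code:

```python
import cv2
import numpy as np

# Standard OpenCV Haar cascade for frontal faces.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def largest_face(frame):
    """Return the (x, y, w, h) of the largest detected face by area, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    areas = [w * h for (x, y, w, h) in faces]   # compare by area, not x position
    return faces[int(np.argmax(areas))]
```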

hyunduk00 commented 1 year ago

Hi yahskapar,

I checked your comments.

  1. I used 80% of PURE for training and 20% of PURE for validation, as in the default config file.
  2. There were no problems during data preprocessing.


  3. I downloaded the PURE and UBFC-rPPG datasets in October 2020.
  4. I used an NVIDIA Titan RTX GPU.

I will try training with all 59 videos from the PURE dataset.

Thank you.

yahskapar commented 1 year ago

@hyunduk00,

I don't think training with all 59 videos from PURE will make sense since the pre-trained model you referenced was also trained with only 80% of PURE. Since it's been a while since you downloaded the datasets, can you please double-check if the latest versions of those datasets contain the exact same files that you have, especially for UBFC-rPPG?

Feel free to also copy/paste here the folders that you have in those downloaded datasets (maybe just run ls in your terminal and resize the window for a screenshot). I can take a look and compare on my end.

yahskapar commented 1 year ago

@hyunduk00, any chance you had time to double-check your datasets based on my last reply?

hyunduk00 commented 1 year ago

@yahskapar I re-downloaded the UBFC-rPPG and PURE datasets. However, I got the same results. I think the difference depends on the development environment.

  1. python=3.8, pytorch=1.12.1, torchvision=0.13.1, torchaudio=0.12.1, cudatoolkit=10.2
     FFT MAE (FFT Label): 2.720424107142857 +/- 1.447645832849246
     FFT RMSE (FFT Label): 9.768275310006729 +/- 86.3767644596274
     FFT MAPE (FFT Label): 2.691825865354666 +/- 1.3371532023835029
     FFT Pearson (FFT Label): 0.8650852169012216 +/- 0.07931386503931175
     FFT SNR (FFT Label): -0.6631826857626655 +/- 1.2506743482611395 (dB)

  2. python=3.8, pytorch=1.12.1, torchvision=0.13.1, torchaudio=0.12.1, cudatoolkit=11.3
     FFT MAE (FFT Label): 2.0298549107142856 +/- 0.7838237396381851
     FFT RMSE (FFT Label): 5.470306797704061 +/- 21.075768161863238
     FFT MAPE (FFT Label): 2.1343706235483357 +/- 0.772896343960744
     FFT Pearson (FFT Label): 0.9553858692175565 +/- 0.04670059980862505
     FFT SNR (FFT Label): -0.4414527736394594 +/- 1.214383042646375 (dB)

  3. python=3.8, pytorch=1.12.1, torchvision=0.13.1, torchaudio=0.12.1, cudatoolkit=11.6
     FFT MAE (FFT Label): 2.63671875 +/- 1.4489128939548348
     FFT RMSE (FFT Label): 9.753200802089625 +/- 86.3839910153912
     FFT MAPE (FFT Label): 2.6309874143440215 +/- 1.3385484677372605
     FFT Pearson (FFT Label): 0.8674646058425929 +/- 0.07866148320657877
     FFT SNR (FFT Label): -0.11324074329803575 +/- 1.2089500355752851 (dB)

  4. python=3.10, pytorch=1.13.1, torchvision=0.14.1, torchaudio=0.13.1, cudatoolkit=11.7
     FFT MAE (FFT Label): 2.992466517857143 +/- 1.4669526603537835
     FFT RMSE (FFT Label): 9.966782850283026 +/- 86.37027030951445
     FFT MAPE (FFT Label): 2.9364924992608707 +/- 1.3498208621088656
     FFT Pearson (FFT Label): 0.8591704697824646 +/- 0.08090520747358898
     FFT SNR (FFT Label): -0.49342868524013933 +/- 1.180205147625098 (dB)

What does your development environment look like?

yahskapar commented 1 year ago

Personally, my environment utilizes the following command to install PyTorch and CUDA Toolkit:

pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111

Note that this is more specific to the GPUs I have at my disposal than anything to do with the toolbox, however.
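A quick, generic way to confirm which PyTorch/CUDA versions an environment actually ended up with (nothing toolbox-specific):

```python
import torch

print("torch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```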

It's very strange that the results vary by almost 1.0 depending on the environment; I wonder if this is something specific to the EfficientPhys model implementation in the toolbox. Can you try another model (e.g., TS-CAN) and see if the variation is just as noticeable? Also, just to make sure this is completely tied to training from scratch, have you been able to successfully reproduce results using the provided pre-trained models here?
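For what it's worth, part of the variance could also come from nondeterministic CUDA/cuDNN kernels. A minimal sketch of forcing more deterministic PyTorch training, as a general technique rather than something the toolbox necessarily does by default:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    """Seed Python, NumPy, and PyTorch, and prefer deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True   # use deterministic algorithms where available
    torch.backends.cudnn.benchmark = False      # disable autotuning that can vary between runs

seed_everything(42)
```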

I should note that your second set of results, with CUDA 11.3, appears to be the closest to the results in the table in the README, so maybe that environment would be the best way forward for you.

hyunduk00 commented 1 year ago

I got the following results in your environment with python=3.8:
FFT MAE (FFT Label): 2.3018973214285716 +/- 0.9483822417287281
FFT RMSE (FFT Label): 6.563135233126707 +/- 32.15698088928027
FFT MAPE (FFT Label): 2.2859233153583536 +/- 0.8861542786352009
FFT Pearson (FFT Label): 0.932613597697498 +/- 0.05705959108461289
FFT SNR (FFT Label): -0.49330085391064643 +/- 1.1989449779670647 (dB)

As you mentioned above, the second environment (python=3.8, pytorch=1.12.1, torchvision=0.13.1, torchaudio=0.12.1, cudatoolkit=11.3) may be the best way forward for me. Thank you for your kind reply.

yahskapar commented 1 year ago

No problem. Also keep in mind that, in addition to these environment configurations, different environments support different GPUs, and the exact GPU will likely make a difference as well. I'm sure there are other hardware differences we might be able to dig into, but generally I recommend using pre-trained models where possible (especially for comparisons to results from papers).