ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

Model performance for training on MMPD and testing on UBFC-rPPG #211

Closed Dylan-H-Wang closed 1 year ago

Dylan-H-Wang commented 1 year ago

Hi,

I am trying to benchmark model performance by training on MMPD (an 8/2 train/val split) and testing on UBFC-rPPG.
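
For context, the split is controlled by the BEGIN/END fractions in each DATA section of the config. A rough, abbreviated sketch of the relevant fields follows (most keys omitted; check MMPD_MMPD_UBFC-rPPG_TSCAN_BASIC.yaml itself in case any names below differ):

```yaml
# Abbreviated sketch of the split-related config fields only.
TRAIN:
  DATA:
    DATASET: MMPD
    BEGIN: 0.0
    END: 0.8        # first 80% of subjects for training
VALID:
  DATA:
    DATASET: MMPD
    BEGIN: 0.8
    END: 1.0        # remaining 20% for validation
TEST:
  DATA:
    DATASET: UBFC-rPPG
    BEGIN: 0.0
    END: 1.0        # all of UBFC-rPPG for cross-dataset testing
```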

The data preprocessing configs were copied from MMPD_MMPD_UBFC-rPPG_TSCAN_BASIC.yaml. There is also an existing issue discussing this. Here is what I got:

DeepPhys

FFT MAE (FFT Label): 33.042689732142854 +/- 3.3951087265953634
FFT RMSE (FFT Label): 39.69815363315562 +/- 245.44680450227932
FFT MAPE (FFT Label): 31.14947390509345 +/- 2.8486228115261834
FFT Pearson (FFT Label): 0.19093561884305826 +/- 0.15520499262725235
FFT SNR (FFT Label): -16.646793095303867 +/- 1.3418011428066896 (dB)

TSCAN

FFT MAE (FFT Label): 7.972935267857143 +/- 2.1494911616969685
FFT RMSE (FFT Label): 16.050570440404986 +/- 101.9031492282661
FFT MAPE (FFT Label): 7.6169276333449565 +/- 1.9697403623439185
FFT Pearson (FFT Label): 0.7157262832331542 +/- 0.1104237165976194
FFT SNR (FFT Label): -7.265608650048974 +/- 1.0412795916982052 (dB)

EfficientPhys

FFT MAE (FFT Label): 15.108816964285714 +/- 3.2712410858391556
FFT RMSE (FFT Label): 26.03303893537859 +/- 196.1100413430799
FFT MAPE (FFT Label): 14.1879931074378 +/- 2.8732807713372015
FFT Pearson (FFT Label): 0.36499332109478055 +/- 0.14720562791179234
FFT SNR (FFT Label): -5.967593207008297 +/- 1.209901238944117 (dB)

PhysNet

FFT MAE (FFT Label): 13.204520089285714 +/- 2.6553352833890442
FFT RMSE (FFT Label): 21.69085476442094 +/- 143.0005211625187
FFT MAPE (FFT Label): 12.306415758472724 +/- 2.3032641508031153
FFT Pearson (FFT Label): 0.5477730857268076 +/- 0.13228233504075018
FFT SNR (FFT Label): -6.493590663754255 +/- 1.2960373232025066 (dB)

PhysFormer

FFT MAE (FFT Label): 19.105747767857142 +/- 3.4572262602235915
FFT RMSE (FFT Label): 29.445389472257304 +/- 211.5766061231184
FFT MAPE (FFT Label): 17.268273316617634 +/- 2.894276222252997
FFT Pearson (FFT Label): 0.14091911652017647 +/- 0.15653608231005184
FFT SNR (FFT Label): -9.613906642750075 +/- 1.1943880412838677 (dB)

Although my TSCAN performs better than in the issue mentioned above, the results vary a lot across methods, and even the best-performing method, TSCAN, cannot achieve reasonable results compared with training on PURE, which achieves an MAE of 1.30 ± 0.40. Any thoughts on this training setup?

yahskapar commented 1 year ago

In addition to the discussion that you already saw on #149, @McJackTang (the lead author of that paper and the corresponding dataset) may be able to offer some further observations from his own experience using MMPD data for training.

From my own experience, I will say a variety of factors could be at play, and you may need to take multiple steps to investigate, including, but not limited to:

1) Exploring the MMPD data after pre-processing. In addition to writing your own code to do this in different ways, you can leverage the notebooks described here for some basic visualization (see the first sketch after this list).

2) Trying different parameters through the config and/or the corresponding trainer file. This includes things such as the learning rate and the batch size (if, for whatever reason, it needs to be smaller). For more complicated approaches such as PhysFormer in particular, it is worth playing with the learning rate as well as the constants defined within the trainer file (which currently cannot be configured via the config file):

https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/neural_methods/trainer/PhysFormerTrainer.py#L82-L86

It's unreasonable to assume a single hyperparameter strategy will work for a given neural method across all datasets, so I strongly suggest digging into that a bit, and maybe even considering tools such as LR Finder (see the second sketch after this list), or newer tools if you can find any.

3) Ensuring you can get reasonable results with intra-dataset training. I was able to get fairly reasonable results, at least with TS-CAN, using a variety of intra-dataset configurations (e.g., training on certain skin types with augmentation and testing on other skin types without any augmentation). I recommend spending some time exploring this and making sure it works before moving on to cross-dataset experiments, especially if you have custom code of any kind in your fork of this toolbox or if you plan on eventually writing your own neural method for training and testing.
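
To expand on (1), here is a minimal sketch for eyeballing one preprocessed chunk. The cache path and file names below are hypothetical (they depend on your cache and preprocessing settings), and difference-normalized frames need rescaling before display:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical chunk paths -- substitute files from your own cache directory.
frames = np.load("cache/MMPD/subject1_input0.npy")  # e.g. (T, H, W, C)
label = np.load("cache/MMPD/subject1_label0.npy")   # e.g. (T,) PPG wave
print(frames.shape, frames.dtype, label.shape)

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 3))
frame = frames[0]
# Difference-normalized frames are not natural images; rescale to [0, 1].
ax0.imshow((frame - frame.min()) / (np.ptp(frame) + 1e-8))
ax0.set_title("first frame (rescaled)")
ax1.plot(label)
ax1.set_title("label wave")
plt.tight_layout()
plt.show()
```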
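
And for (2), a hedged sketch of an LR range test using the third-party torch-lr-finder package (pip install torch-lr-finder), shown on a toy model and loader; swap in the toolbox model and the train loader from the relevant trainer:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torch_lr_finder import LRFinder  # third-party package

# Toy stand-ins; replace with the toolbox model and your training DataLoader.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
data = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
loader = DataLoader(data, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-7)  # start very low
criterion = nn.MSELoss()

lr_finder = LRFinder(model, optimizer, criterion, device="cpu")
lr_finder.range_test(loader, end_lr=1.0, num_iter=100)  # sweep LR upward
lr_finder.plot()   # pick an LR on the steep descent, before divergence
lr_finder.reset()  # restore the model/optimizer to their initial state
```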

It's very possible that, in the case of MMPD in particular, differences in the camera domain (which should differ significantly given that it was captured with mobile phones) and more unconstrained conditions (e.g., dynamic lighting) make training on MMPD and testing on other datasets very difficult, especially with existing methods such as the ones you've tried so far.

McJackTang commented 1 year ago

I agree with @yahskapar. MMPD was captured with mobile phones and stored as MP4 files, so there is compression, and its scenarios are more complex than those of other datasets, which makes it harder to train on. The existing methods with default parameters may not work when training on MMPD, but here are some suggestions that may help you deal with this kind of dataset (see also the sketch after this list):

  1. Do not use all of the dataset at first; try some simple subsets.
  2. Try some data augmentation or transformations.
  3. Adjust the hyperparameters and models.
  4. Fine-tune from pre-trained models.
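
On point 4, a minimal sketch of the warm-start pattern, shown with a toy module and an illustrative checkpoint path (the toolbox ships pre-trained weights; match the checkpoint to the model class you are training):

```python
import torch
from torch import nn

# Toy stand-in; in practice instantiate the toolbox model you are training
# and point the path at a real pre-trained checkpoint.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
state = torch.load("pretrained_weights.pth", map_location="cpu")  # illustrative path

# Checkpoints saved through nn.DataParallel carry a "module." prefix; strip it.
state = {k.removeprefix("module."): v for k, v in state.items()}
model.load_state_dict(state)

# Fine-tune with a smaller learning rate than training from scratch.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```
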
Dylan-H-Wang commented 1 year ago

Thank you for the suggestions. I will try them later.

Dylan-H-Wang commented 1 year ago

@yahskapar @McJackTang Follow-up update: same config as MMPD_MMPD_UBFC-rPPG_TSCAN_BASIC.yaml, but using 0.7 of the data for training, 0.1 for validation, and 0.2 for testing. All models use the default parameters.

DeepPhys

FFT MAE (FFT Label): 10.1806640625 +/- 4.20037187224269
FFT RMSE (FFT Label): 17.75847423443269 +/- 186.58567651109223
FFT MAPE (FFT Label): 13.233947033501261 +/- 4.988052703224493
FFT Pearson (FFT Label): 0.45085506380831686 +/- 0.28226400965025605
FFT SNR (FFT Label): -4.300115732908986 +/- 1.9959056232615164 (dB)

TSCAN

FFT MAE (FFT Label): 1.2451171875 +/- 0.5621864716635476
FFT RMSE (FFT Label): 2.3114844489344972 +/- 3.208926531581611
FFT MAPE (FFT Label): 1.568817067665263 +/- 0.6628753911141089
FFT Pearson (FFT Label): 0.9798363084967618 +/- 0.06318291584869959
FFT SNR (FFT Label): 3.4315908352615825 +/- 1.7121889675228932 (dB)

EfficientPhys

FFT MAE (FFT Label): 1.3916015625 +/- 0.5621864716635476
FFT RMSE (FFT Label): 2.393574409917216 +/- 3.0770818055582807
FFT MAPE (FFT Label): 1.7732935100200482 +/- 0.6683177203215604
FFT Pearson (FFT Label): 0.9791089149357152 +/- 0.06430064750327673
FFT SNR (FFT Label): 2.003380704462375 +/- 1.7828006798505074 (dB)

PhysNet

FFT MAE (FFT Label): 1.7578125 +/- 0.7540744732168213
FFT RMSE (FFT Label): 3.1485622939752944 +/- 6.107525979475464
FFT MAPE (FFT Label): 2.2051470525189965 +/- 0.8661281301064799
FFT Pearson (FFT Label): 0.9620966777025217 +/- 0.08623803265020025
FFT SNR (FFT Label): 8.538906695569723 +/- 2.6536867023854978 (dB)

PhysFormer

FFT MAE (FFT Label): 1.611328125 +/- 0.6286324799601432
FFT RMSE (FFT Label): 2.708970997398476 +/- 3.506409487225539
FFT MAPE (FFT Label): 2.124674220855931 +/- 0.7991948429884765
FFT Pearson (FFT Label): 0.9661003690958244 +/- 0.08163949830376947
FFT SNR (FFT Label): -1.4937987775932398 +/- 1.5266972306156454 (dB)

It seems that models pre-trained on MMPD are hard to generalise to UBFC-rPPG when only the default hyperparameters are used.