Closed Dylan-H-Wang closed 1 year ago
In addition to the discussion that you already saw on #149, @McJackTang (who is the lead author for that paper and the corresponding dataset) may be able to offer some further observations from his own experience with using MMPD data for training.
From my own experience, I'd say a variety of factors could be at play, and you may need to take multiple investigative steps, including but not limited to:
1) Exploring the MMPD data after pre-processing. In addition to writing your own code to potentially do this in different ways, you can leverage notebooks described here for some basic visualization.
2) Trying different parameters through the config and/or the corresponding trainer file. This includes the learning rate, the batch size (if for whatever reason it needs to be smaller), and, especially for more complicated approaches such as PhysFormer, constants defined within the trainer file (which are currently hard-coded and cannot be set via the config file).
It's unreasonable to assume a single hyperparameter strategy will work for a given neural method across all datasets, so I strongly suggest digging into that a bit, and maybe even considering tools such as an LR finder (or newer tools if you can find any).
3) Ensuring you can get reasonable results with intra-dataset training. I was able to get fairly reasonable results, at least with TS-CAN, using a variety of intra-dataset configurations (e.g., training on certain skin types with augmentation and testing on other skin types without any augmentation). I recommend spending some time exploring this and making sure it works before moving on to cross-dataset experiments, especially if you have custom code of any kind on your fork of this toolbox, or if you plan on eventually writing your own neural method for training and testing.
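The LR-sweep idea from item 2 can be sketched as a quick range test. This is a minimal illustration on a toy NumPy linear-regression problem, not the toolbox's actual trainers; the function name, schedule, and toy data are all assumptions:

```python
import numpy as np

def lr_range_test(X, y, lr_min=1e-6, lr_max=1.0, steps=50):
    """Exponentially sweep the learning rate, recording the loss at each step.

    Plot losses vs. lrs afterwards: a good starting LR usually sits just
    before the loss curve starts to diverge.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    gamma = (lr_max / lr_min) ** (1.0 / (steps - 1))
    lr, lrs, losses = lr_min, [], []
    for _ in range(steps):
        err = X @ w - y
        losses.append(float(np.mean(err ** 2)))
        w = w - lr * (2.0 * X.T @ err / len(y))  # one gradient step at the current lr
        lrs.append(lr)
        lr *= gamma
    return lrs, losses

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(256, 8))
    y = X @ rng.normal(size=8)
    lrs, losses = lr_range_test(X, y)
    print(len(lrs), lrs[0] < lrs[-1])
```

In practice you would run the sweep over a few mini-batches of your real training data and pick the LR from the loss-vs-LR curve rather than hard-coding one.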
In the case of MMPD in particular, it's very possible that differences in the camera domain (which should be significant, given that it was captured using mobile phones) and its more unconstrained conditions (e.g., dynamic lighting) make training on MMPD and testing on other datasets very difficult, especially with existing methods such as the ones you've tried so far.
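As for exploring the pre-processed MMPD data (item 1), a quick sanity check of per-chunk statistics can catch scaling or normalization problems before any training. The array layout (T, H, W, C), dtype, and file naming below are assumptions, not the toolbox's exact format; swap in a real `np.load` of one of your pre-processed chunks:

```python
import numpy as np

def summarize_chunk(frames: np.ndarray, labels: np.ndarray) -> dict:
    """Return basic statistics to spot scaling/normalization issues in a chunk."""
    return {
        "frame_mean": float(frames.mean()),
        "frame_std": float(frames.std()),
        "label_min": float(labels.min()),
        "label_max": float(labels.max()),
        "n_frames": int(frames.shape[0]),
    }

if __name__ == "__main__":
    # Replace with a real chunk, e.g. np.load(".../subjectX_input0.npy")
    frames = np.random.randn(180, 72, 72, 3).astype(np.float32)
    labels = np.random.randn(180).astype(np.float32)
    print(summarize_chunk(frames, labels))
```

If DiffNormalized pre-processing is configured, for instance, you would expect frame means near zero; wildly different statistics across chunks are worth investigating before tuning anything else.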
I agree with @yahskapar. MMPD was captured with mobile phones and stored as mp4 files, so there will be compression artifacts. Its scenarios are also more complex than those of other datasets, which makes it harder to train on. Existing methods with default parameters may not work for MMPD, but here are some suggestions that may help you deal with this kind of dataset:
Thank you for the suggestions. I will try them later.
@yahskapar @McJackTang Follow up updates:
Same config as MMPD_MMPD_UBFC-rPPG_TSCAN_BASIC.yaml, but using 0.7 of the data for training, 0.1 for validation, and 0.2 for testing. All models use the default parameters.
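For reference, a split like this is typically expressed in the toolbox-style YAML via per-split BEGIN/END fractions; the exact field names below follow the conventions of the referenced config, but check them against your copy before relying on them:

```yaml
TRAIN:
  DATA:
    BEGIN: 0.0   # first 70% of subjects for training
    END: 0.7
VALID:
  DATA:
    BEGIN: 0.7   # next 10% for validation
    END: 0.8
TEST:
  DATA:
    BEGIN: 0.8   # final 20% for testing
    END: 1.0
```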
DeepPhys
FFT MAE (FFT Label): 10.1806640625 +/- 4.20037187224269
FFT RMSE (FFT Label): 17.75847423443269 +/- 186.58567651109223
FFT MAPE (FFT Label): 13.233947033501261 +/- 4.988052703224493
FFT Pearson (FFT Label): 0.45085506380831686 +/- 0.28226400965025605
FFT SNR (FFT Label): -4.300115732908986 +/- 1.9959056232615164 (dB)
TSCAN
FFT MAE (FFT Label): 1.2451171875 +/- 0.5621864716635476
FFT RMSE (FFT Label): 2.3114844489344972 +/- 3.208926531581611
FFT MAPE (FFT Label): 1.568817067665263 +/- 0.6628753911141089
FFT Pearson (FFT Label): 0.9798363084967618 +/- 0.06318291584869959
FFT SNR (FFT Label): 3.4315908352615825 +/- 1.7121889675228932 (dB)
EfficientPhys
FFT MAE (FFT Label): 1.3916015625 +/- 0.5621864716635476
FFT RMSE (FFT Label): 2.393574409917216 +/- 3.0770818055582807
FFT MAPE (FFT Label): 1.7732935100200482 +/- 0.6683177203215604
FFT Pearson (FFT Label): 0.9791089149357152 +/- 0.06430064750327673
FFT SNR (FFT Label): 2.003380704462375 +/- 1.7828006798505074 (dB)
PhysNet
FFT MAE (FFT Label): 1.7578125 +/- 0.7540744732168213
FFT RMSE (FFT Label): 3.1485622939752944 +/- 6.107525979475464
FFT MAPE (FFT Label): 2.2051470525189965 +/- 0.8661281301064799
FFT Pearson (FFT Label): 0.9620966777025217 +/- 0.08623803265020025
FFT SNR (FFT Label): 8.538906695569723 +/- 2.6536867023854978 (dB)
PhysFormer
FFT MAE (FFT Label): 1.611328125 +/- 0.6286324799601432
FFT RMSE (FFT Label): 2.708970997398476 +/- 3.506409487225539
FFT MAPE (FFT Label): 2.124674220855931 +/- 0.7991948429884765
FFT Pearson (FFT Label): 0.9661003690958244 +/- 0.08163949830376947
FFT SNR (FFT Label): -1.4937987775932398 +/- 1.5266972306156454 (dB)
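For context, FFT-based HR metrics like the MAEs above are typically computed by estimating heart rate as the dominant spectral peak within a plausible heart-rate band, for both prediction and label, and then comparing the two. The band limits and the plain rFFT below are assumptions for illustration, not the toolbox's exact implementation:

```python
import numpy as np

def fft_hr(signal: np.ndarray, fs: float, lo: float = 0.75, hi: float = 2.5) -> float:
    """Estimate heart rate (bpm) from the dominant frequency in [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]

if __name__ == "__main__":
    fs = 30.0                              # assume a 30 fps video
    t = np.arange(0, 30, 1.0 / fs)         # 30 s window
    pred = np.sin(2 * np.pi * 1.2 * t)     # predicted wave, ~72 bpm
    label = np.sin(2 * np.pi * 1.3 * t)    # ground-truth wave, ~78 bpm
    mae = abs(fft_hr(pred, fs) - fft_hr(label, fs))
    print(round(mae, 2))
```

A very negative FFT SNR (as with PhysFormer above) means most of the predicted signal's power lies outside the ground-truth HR frequency and its harmonics, even when the peak location, and hence the MAE, is still reasonable.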
It seems that pre-training on MMPD generalises poorly to UBFC-rPPG if we only use the default hyperparameters.
Hi,
I am trying to benchmark model performance by training on MMPD (8/2 train/val split) and testing on UBFC-rPPG.
The data pre-processing configs were copied from MMPD_MMPD_UBFC-rPPG_TSCAN_BASIC.yaml. There is also an issue discussing this. Here is what I got:
DeepPhys
TSCAN
EfficientPhys
PhysNet
PhysFormer
Although my TSCAN performs better than in the mentioned issue, the results seem to vary a lot among methods, and even the best-performing method (TSCAN) still cannot achieve reasonable results compared with training on PURE, which achieves 1.30 ± 0.40. Any thoughts on this training?