ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

Seeking reasons for discrepancies between experimental results and official records #294

Closed LL-Zs closed 2 months ago

LL-Zs commented 3 months ago

Hello, we trained with the officially released source code and .yaml files. The training results are identical across runs, and they stay the same even when we change the learning rate, yet the final numbers differ somewhat from the officially released results. What could be the reason? Below are the results we obtained with the official source code (ours vs. official):

- UBFC-PURE.yaml: 4.96 (official 3.69), 4.79 (official 3.38)
- UBFC-MMPD: 1.54 (official 1.46), 2.17 (official 2.04)
- PURE-MMPD: 2.07 (official 1.78), 2.92 (official 2.47)

yahskapar commented 3 months ago

Hi @LL-Zs,

Can you elaborate on what exactly you did? Please include additional details such as which GPUs you used and the exact .yaml config files, unless you are certain you didn't change any default settings aside from the learning rate you mentioned varying. A numbered or bulleted list would make it easier to understand the whole picture.

It is not always trivial to reproduce training results, especially across different versions of key libraries (e.g., torch), different GPU architectures, and given that some parts of the pipeline (e.g., preprocessing involving face detection) may have changed in the repo and may differ significantly from the version of the toolbox used to capture the results in the late-2023, NeurIPS version of the rPPG-Toolbox paper. Note that the pre-trained models used for the paper can be found here. Hopefully you are not noticing any significant discrepancy with those pre-trained models (if you do, please let us know).
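If run-to-run variance is part of what you are seeing, one thing worth trying is pinning all random seeds and requesting deterministic kernels before training. The snippet below is a generic PyTorch sketch, not the toolbox's own seeding code, and even with it results can still drift across GPU architectures and library versions:

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin all RNG seeds and request deterministic kernels.

    Full determinism is only expected on the same hardware, driver, and
    library versions; across different GPUs results may still differ.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Needed by some deterministic cuBLAS kernels on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Warn (rather than error) when an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(42)
```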

LL-Zs commented 3 months ago

Thank you for your reply. We downloaded the three datasets (PURE, UBFC, and MMPD) and reproduced the results directly with the official Toolbox code (both the datasets and code are the latest downloads; the code is the end-of-2023 version). The reproduction covered both "train and test" and "only test" using the final released .pth model files from the official code. Here is our configuration:

- GPU: NVIDIA GeForce RTX 3090
- CPU: Intel Xeon(R) Silver 4210R
- Operating system: Ubuntu 20.04.6 LTS
- Python version: 3.8.19
- Torch version: 2.3.1
- CUDA version: 11.8

YAML files and corresponding results:

- PURE-PURE-UBFC-rPPG TSCAN: "train and test" 1.44, 1.63; final .pth "only test" 1.29, 1.50
- UBFC-rPPG-UBFC-rPPG-PURE TSCAN: "train and test" 4.97, 4.80; final .pth "only test" 3.65, 3.32

In the MMPD dataset tests, the results we obtained were also slightly higher than the official ones. During the replication process the code was used exactly as downloaded, without changing any other configuration.
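For anyone comparing numbers across machines, a small snippet like the one below (a generic sketch, not part of the toolbox) can be used to capture the exact environment alongside reported results:

```python
import platform

import torch

# Collect the environment details that most often explain small result
# discrepancies: Python, OS, torch, CUDA/cuDNN versions, and the GPU model.
env = {
    "python": platform.python_version(),
    "os": platform.platform(),
    "torch": torch.__version__,
    "cuda": torch.version.cuda,
    "cudnn": torch.backends.cudnn.version(),
    "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu",
}

for key, value in env.items():
    print(f"{key}: {value}")
```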

yahskapar commented 3 months ago

Hi @LL-Zs,

For the only_test results that use the pre-trained models, your numbers are nearly identical to the official ones. Your own training results don't seem that far off either, and could be entirely reasonable given the hardware differences. What exactly are you worried about here?

For publication purposes, if you find that re-training on different hardware does not match existing benchmark results closely enough, just use your own results as the baseline for comparisons. As long as you note the hardware differences (especially the GPU used) and include more robust measurement details such as the standard error, I don't see how this could be a problem.
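In case it helps, computing the standard error of the per-subject absolute errors is straightforward. Here is a minimal sketch, assuming you already have per-subject heart-rate predictions and ground-truth values as arrays (the array values below are hypothetical, purely for illustration):

```python
import numpy as np


def mae_with_standard_error(pred_hr: np.ndarray, gt_hr: np.ndarray):
    """Return the mean absolute HR error and its standard error across subjects."""
    abs_err = np.abs(pred_hr - gt_hr)                        # per-subject absolute error (bpm)
    mae = abs_err.mean()
    std_err = abs_err.std(ddof=1) / np.sqrt(len(abs_err))    # standard error of the mean
    return mae, std_err


# Hypothetical per-subject heart-rate estimates vs. ground truth (bpm).
pred = np.array([72.1, 80.4, 65.3, 90.2, 77.8])
gt = np.array([70.0, 82.5, 66.0, 88.0, 79.1])
mae, se = mae_with_standard_error(pred, gt)
print(f"MAE = {mae:.2f} +/- {se:.2f} bpm")
```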

yahskapar commented 2 months ago

Closing this due to a lack of further discussion, but please feel free to re-open or make a new issue if needed @LL-Zs.