ubicomplab / rPPG-Toolbox

rPPG-Toolbox: Deep Remote PPG Toolbox (NeurIPS 2023)
https://arxiv.org/abs/2210.00716

Question about PhysNet implementation #207

Closed Dylan-H-Wang closed 9 months ago

Dylan-H-Wang commented 10 months ago

Hi,

I have a question about the training of PhysNet that came up while I was reviewing the code.

From this block: https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/neural_methods/model/PhysNet.py#L118-L122 it seems the output of PhysNet is an rPPG signal in which each frame has a corresponding signal value.

Also, in this block: https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/neural_methods/trainer/PhysnetTrainer.py#L62-L69 it seems the ground truth is also an rPPG signal and the loss function is Neg_Pearson.

This contradicts what this repo claims about the training of PhysNet, which should use DiffNormalized inputs and outputs. Compared with DeepPhysTrainer (I believe DeepPhys is the origin of using DiffNormalized), PhysNetTrainer does not use DiffNormalized data as inputs and outputs (although the config file for data preprocessing specifies it), and the loss function is Neg_Pearson, which should not be usable for comparing the 1st derivative of an rPPG signal?

yahskapar commented 10 months ago

Hi @Dylan-H-Wang,

I think you might have some confusion (possibly caused by my reply in #202, which was more specific to models such as EfficientPhys and TS-CAN) regarding the outputs of the model and, ultimately, what gets used for HR calculation and the subsequent calculation of metrics.

Two things:

1) In the context of PhysNet, even though the input happens to be DiffNormalized frames, the output is still a usable rPPG signal, which is subsequently used in the Neg_Pearson loss calculation after normalization, along with the label (which, it should be noted, is also DiffNormalized); there's a small sketch of this loss at the end of this comment. I'm not completely sure what you mean by "this is the root of using DiffNormalized" with respect to DeepPhys; the provided PhysNet config defaults (which you can modify if you desire) do make it so our PhysNet implementation ends up using DiffNormalized frames as input (regardless of what DeepPhys does or doesn't do).

2) One other important thing is understanding how DiffNormalized inputs are dealt with in post-processing in the toolbox.

https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/evaluation/post_process.py#L101-L103

In the above code, predictions and labels that are DiffNormalized are effectively integrated (using np.cumsum) prior to detrending (to remove non-cyclic trends that we don't want). So, ultimately, the calculated HR used in subsequent metrics is still based on what should be the original, non-derivative PPG signal.
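As a minimal sketch of that idea (not the toolbox's exact code; the toolbox uses its own detrending routine, and scipy.signal.detrend is just a stand-in here to keep the example self-contained):

```python
import numpy as np
from scipy.signal import detrend

def recover_ppg_from_diff(diff_signal: np.ndarray) -> np.ndarray:
    """Integrate a DiffNormalized (first-derivative) signal back into a
    PPG-like waveform, then detrend to remove the non-cyclic drift that
    the cumulative sum introduces."""
    integrated = np.cumsum(diff_signal)  # undo the frame-to-frame differencing
    return detrend(integrated)           # remove slow, non-cyclic trends
```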

I'm not completely sure that Neg_Pearson can't operate on DiffNormalized frames, as I and others have been able to get somewhat reasonable results using such inputs, but maybe you can explain your reasoning more, or @McJackTang or @xliucs can chime in based on their own experiences. I will say that there is an update coming to the PhysNet implementation in the toolbox based on #197 and other findings made while updating this toolbox's pre-print (a new version of which will likely be released in the coming few months). That update may include better results using raw inputs and some additional tuning of hyperparameters and post-processing parameters, which may interest you (@McJackTang can explain a bit more when he has time).
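For reference, since the loss came up a few times: the Neg_Pearson idea is just one minus the Pearson correlation between the predicted and label waveforms, computed per clip. A minimal PyTorch sketch (a paraphrase, not the toolbox's exact Neg_Pearson module):

```python
import torch

def neg_pearson_loss(preds: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Negative Pearson correlation, averaged over a batch of 1D signals.
    preds, labels: tensors of shape (batch, time)."""
    preds = preds - preds.mean(dim=1, keepdim=True)    # center each signal
    labels = labels - labels.mean(dim=1, keepdim=True)
    num = (preds * labels).sum(dim=1)                  # covariance term
    den = torch.sqrt((preds ** 2).sum(dim=1) * (labels ** 2).sum(dim=1)) + 1e-8
    return (1 - num / den).mean()                      # 1 - Pearson r
```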

Dylan-H-Wang commented 10 months ago

Thank you for the reply.

I think my main confusion comes from comparing DeepPhys and PhysNet:

  1. They have the same LABEL_TYPE: DiffNormalized, which means their ground truth labels should have the same data format and shape.
  2. The output of DeepPhys is a single value (the 1st derivative of the rPPG), but PhysNet has a sequential output, i.e., an rPPG signal. Following this, in the trainer of PhysNet https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/neural_methods/trainer/PhysnetTrainer.py#L115-L117 I can understand that we need to normalise the predictions, which form an rPPG signal, but why do we need to normalise the ground truth label, whereas in the DeepPhys trainer there is no such processing code, although both PhysNet and DeepPhys have the same ground truth?
  3. This also raises another question: the outputs of DeepPhys (1st derivative of the rPPG) and PhysNet (rPPG signals) are different, so why can they use the same ground truth labels to update the model? Shouldn't the predictions of PhysNet be DiffNormalized by https://github.com/ubicomplab/rPPG-Toolbox/blob/53b84584c2501f40ac925e141e7b908d1013d002/dataset/data_loader/BaseLoader.py#L558-L564 in order to be compared with the DiffNormalized labels? (See the sketch below for what I understand that function to do.)
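For reference, my understanding of what the linked diff_normalize_label() roughly does (a paraphrased sketch, not the exact toolbox code):

```python
import numpy as np

def diff_normalize_label_sketch(label: np.ndarray) -> np.ndarray:
    """Paraphrase of the linked diff_normalize_label(): take frame-to-frame
    differences of the PPG label, then scale by the standard deviation."""
    diff = np.diff(label, axis=0)        # first derivative of the signal
    diff = diff / np.std(diff)           # normalize by its std
    return np.append(diff, np.zeros(1))  # pad to keep the original length
```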
yahskapar commented 9 months ago

@Dylan-H-Wang,

I think @McJackTang may be able to explain a bit more in the future when he has time regarding your second point.

My understanding is that the additional normalization (on lines 115 to 117, as you pointed out) is necessary based on the original PhysNet implementation, and some aspects of that implementation (e.g., why the ground truth is normalized as well) are not explicitly documented and likely have to do with the loss calculation performed here. I believe the loss function expects normalized inputs. Those inputs are effectively still difference frames, since the inputs to PhysNet, and the outputs from PhysNet, in our implementation are still difference frames or estimates based on difference frames. The outputs from PhysNet, even if still difference frames, aren't automatically normalized, though, and need to be normalized as per the code you pointed out.
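In other words, the trainer standardizes both signals before the loss. A sketch of what lines 115 to 117 effectively do (paraphrased; see the linked trainer code for the original):

```python
import torch

def standardize(signal: torch.Tensor) -> torch.Tensor:
    """Zero-mean, unit-variance scaling, as applied to both the predicted
    rPPG and the BVP label in PhysnetTrainer before the Neg_Pearson loss.
    (The small eps is my addition for numerical safety; the trainer divides
    by the std directly.)"""
    return (signal - signal.mean()) / (signal.std() + 1e-8)

# e.g., rPPG = standardize(rPPG); BVP_label = standardize(BVP_label)
# before loss = Neg_Pearson(rPPG, BVP_label)
```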

Are you worried that somehow the normalization in your second point is different from the normalization (not the difference frame calculation) in the diff_normalize_label() function?

McJackTang commented 9 months ago

@Dylan-H-Wang I agree with @yahskapar. For PhysNet, the original paper suggests using raw frames, which can achieve the best results. However, raw frames may lead to further problems in the toolbox's system, since PhysNet, as a 3D CNN, is very sensitive. You need to adjust the parameters and training schedule to fine-tune PhysNet if using raw frames. But if you use DiffNormalized, it can be easier to train with the existing code. In my experience, the results are similar to those from raw input with fine-tuned models.

Dylan-H-Wang commented 9 months ago

Thank you for the explanations @yahskapar @McJackTang !

I think I misunderstood the DiffNormalized label before. After revisiting the DeepPhys code, I can understand now why it also works for PhysNet, although I am still confused about why the additional normalisation of the predicted rPPG and the BVP_label is needed here while DeepPhys needs no such step.

yahskapar commented 9 months ago

Some models (e.g., PhysNet) might need normalization simply to constrain the output signal to a reasonable range of values for subsequent usage (e.g., loss calculations). That's what the PhysNet authors themselves originally do in their implementation here, and again, I think it all has to do with the need for the output to fall within a certain range of values. DeepPhys, which in this toolbox utilizes an MSE loss, simply does not have that kind of requirement.
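For intuition, a toy sketch of the difference between the two losses (my own illustration, not toolbox code):

```python
import torch

# MSE penalizes magnitude directly, so a model trained with it learns the
# output scale implicitly; a Pearson-style loss only sees waveform shape,
# leaving the raw output range otherwise unconstrained.
pred = torch.randn(128)
label = torch.randn(128)

mse_loss = torch.nn.functional.mse_loss(pred, label)  # scale-sensitive
r = torch.corrcoef(torch.stack([pred, label]))[0, 1]  # Pearson r, scale-invariant
neg_pearson = 1 - r

# Scaling the prediction changes MSE but leaves the Pearson loss unchanged:
r_scaled = torch.corrcoef(torch.stack([10 * pred, label]))[0, 1]
assert torch.isclose(1 - r_scaled, neg_pearson, atol=1e-5)
```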

Dylan-H-Wang commented 9 months ago

Ok, that makes sense. Thank you for the help!