nickgkan / 3d_diffuser_actor

Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"
https://3d-diffuser-actor.github.io/
MIT License
159 stars 16 forks source link

Sigmoid usage #4

Closed rakhimovv closed 3 months ago

rakhimovv commented 4 months ago

Hi, thank you for your excellent work!

In the Act3D implementation, the gripper openness is predicted in the range [0, 1], as it applies a sigmoid function within its forward method. However, I believe there might be an issue with the loss function used in this repository. The prediction is not a logit, yet the loss calculation uses F.binary_cross_entropy_with_logits, which seems incorrect:

losses["gripper"] = F.binary_cross_entropy_with_logits(pred["gripper"], gt_action[:, 7:8])

In a previous repository version, the loss was calculated as follows, which appears to be more appropriate given the prediction's nature: losses["gripper"] = F.mse_loss(pred["gripper"], gt_action[:, 7:8])

Additionally, regarding the diffuser actor, there seems to be an error in the metric calculation: openess = ((pred[..., 7:].sigmoid() >= 0.5) == (gt[..., 7:] > 0.0)).bool() The application of sigmoid here seems redundant. For example, in the evaluation scripts, the approach is simply to round the values: trajectory[:, -1] = trajectory[:, -1].round()

Could you please clarify? Thank you!

nickgkan commented 4 months ago

Hi, Thanks for your interest in our work and looking into our code in such detail!

First, regarding Act3D, you’re correct, we should not use the binary_cross_entropy_with_logits loss since sigmoid has been applied earlier. We shifted to binary_cross_entropy and we’re retraining Act3D. We'll update the code and results if needed.

Second, regarding 3D Diffuser Actor metrics, sigmoid is not applied inside the model code and the openess head is trained with binary_cross_entropy_with_logits, so the use of sigmoid is correct. What is actually not correct is to simply round the openess score, since that would result in an integer of arbitrary magnitude.

Why the above works for RLBench however? On RLBench, the evaluation code assumes that the openess score is in the [0, 1] range and thresholds by 0.5. In our case, our pre-sigmoid logits should be either large positive (so that sigmoid gives close to 1) or negative (so that sigmoid gives close to 0). Ideally the rounded openess should be compared to 0. Now 0.5 is just giving some more margin to the negative side, which we don't think it makes an actual difference at test time. We'll investigate more and share our updates.

twke18 commented 4 months ago

Hi,

Here are our updates. We replace the binary_cross_entropy_with_logits loss function with binary_cross_entropy and re-trained Act3D on GNFactor. As expected, we didn't observe significant performance change among two loss functions.

For verification, replaying scenes from the demonstrations, we calculated l2 errors / l1 error / accuracy among predicted positions / quaternions / gripper openess and those from the demonstrations. As shown in the following figure, the black curves denote the old model and the yellow curves denote the new model. Both models achieve similar performance.

圖片

We have also tested both models in RLBench simulation. Likewise, both models have similar performance. We conclude that it makes no difference training act3d with binary_cross_entropy_with_logits or binary_cross_entropy on RLBench. Hope we have addressed your concern.

rakhimovv commented 3 months ago

@nickgkan @twke18 Thank you for your thorough response and the efforts to address my concerns! Your commitment to improving and validating the Act3D implementation and the clarification on the 3D Diffuser Actor metrics are highly appreciated.