jity16 / ACE-Off-Policy-Actor-Critic-with-Causality-Aware-Entropy-Regularization

Official PyTorch implementation of "ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization"

Question about the Adroit hand tasks. #4

Open HYeCao opened 1 week ago

HYeCao commented 1 week ago

Hello, Tianying and Yongyuan. When running ACE with the default parameters in the Adroit hand environments, I've noticed that even after several rounds of policy training, it fails to achieve results comparable to those reported in the paper. Should I adjust the parameters, or could there be another underlying issue?

[screenshot: training reward curves]
cheryyunl commented 1 week ago

These rewards look different from our results. I checked my wandb logs: when the reward exceeds 3000, the agent can easily learn the skill, but when it cannot learn the skill, the reward remains 0:

[screenshot: wandb reward curves]

Could you give me your gym and gymnasium-robotics versions, so I can check whether the Adroit wrapper is working?
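
For reference, a minimal sketch for reporting the relevant versions, assuming the packages import under their standard names:

```python
# Print the versions of the packages the Adroit wrapper depends on.
import gym
import gymnasium
import gymnasium_robotics

print("gym:", gym.__version__)
print("gymnasium:", gymnasium.__version__)
print("gymnasium-robotics:", gymnasium_robotics.__version__)
```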

cheryyunl commented 1 week ago

And in our experiments we also found that in Adroit door the success rate fluctuates much more strongly than in other environments. This is a normal phenomenon.

HYeCao commented 3 days ago

> And in our experiments we also found that in Adroit door the success rate fluctuates much more strongly than in other environments. This is a normal phenomenon.

Thanks for your reply, Yongyuan. The gym and gymnasium-robotics versions are listed below:

- gym 0.19.0
- gymnasium 0.29.1
- gymnasium-robotics 1.2.4
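
A minimal sanity check of the Adroit wrapper under these versions might look like the sketch below. `AdroitHandDoor-v1` is the standard gymnasium-robotics environment id for the door task, and the rollout length here is arbitrary:

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401 -- importing registers the Adroit environments

# Roll out a few random actions to confirm the wrapper returns
# sensible observations and rewards of a plausible scale.
env = gym.make("AdroitHandDoor-v1")
obs, info = env.reset(seed=0)
episode_reward = 0.0
for _ in range(200):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    episode_reward += reward
    if terminated or truncated:
        break
print("observation shape:", obs.shape, "episode reward:", episode_reward)
env.close()
```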

Can you show me your hyperparameter settings for the Adroit hand environment? I use the default settings:

```python
"policy": "Gaussian",
"gamma": 0.99,
"tau": 0.005,
"lr": 0.0003,
"alpha": 0.2,
"quantile": 0.9,
"automatic_entropy_tuning": True,
"batch_size": 512,
"updates_per_step": 1,
"target_update_interval": 2,
"hidden_size": 1024,
"msg": "default"
```
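
As a note on how `alpha` and `automatic_entropy_tuning` interact: below is a sketch of the standard SAC-style temperature update these settings usually control. ACE adds a causality-aware weighting on top of the entropy term, so this shows only the baseline mechanism, not the paper's exact update; the action dimension is an assumption for the door task.

```python
import torch

# Standard SAC automatic entropy tuning: learn a temperature alpha so the
# policy entropy tracks a target of -dim(action_space).
action_dim = 28  # e.g. the Adroit door task; adjust per environment
target_entropy = -float(action_dim)
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)  # matches "lr": 0.0003

def update_alpha(log_pi: torch.Tensor) -> torch.Tensor:
    """One temperature update given a batch of policy log-probs."""
    alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp()  # the updated entropy coefficient
```

When `automatic_entropy_tuning` is False, the fixed `"alpha": 0.2` would be used instead of the learned coefficient.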