HYeCao opened this issue 1 week ago
These rewards look different from our results. I checked my wandb logs: when the reward exceeds 3000, the agent learns the skill easily, but when it fails to learn the skill, the reward stays at 0.
Could you share your gym and gymnasium-robotics versions so I can check whether the Adroit wrapper is working correctly?
In our experiments we also found that on Adroit Door the success rate fluctuates more sharply than in the other environments. This is normal.
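To judge whether a jumpy success-rate curve like the one described for Adroit Door is still trending upward, it can help to smooth it and to put a number on how much it fluctuates. Below is a minimal, illustrative Python sketch. It assumes the success rates have been exported as a plain list of floats (one value per evaluation). `ema_smooth`, `mean_abs_change`, and the `weight` value are hypothetical helpers chosen for illustration. They are not part of ACE or wandb.

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average of a curve; higher weight gives a smoother result."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    last = None
    for v in values:
        # The first point is kept as-is; later points blend the previous average with the new value.
        last = v if last is None else weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def mean_abs_change(values):
    """Average absolute jump between consecutive evaluations (a crude volatility measure)."""
    if len(values) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


if __name__ == "__main__":
    # Hypothetical success rates, for illustration only.
    door = [0.0, 0.4, 0.1, 0.7, 0.3, 0.8, 0.5, 0.9]
    print(ema_smooth(door, weight=0.6))
    print(mean_abs_change(door))
```

Comparing `mean_abs_change` across environments gives a rough sense of whether Door is simply noisier, while the smoothed curve shows the underlying trend.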
Thanks for your reply, Yongyuan. My gym and gymnasium-robotics versions are listed below:
- gym 0.19.0
- gymnasium 0.29.1
- gymnasium-robotics 1.2.4
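For anyone reporting the same problem, here is a small sketch that prints the installed versions using only the standard library's `importlib.metadata`. The package list comes from the versions quoted above. `installed_versions` is a hypothetical helper name. It assumes a recent Python (3.10+), where hyphenated distribution names such as `gymnasium-robotics` should resolve. On older versions the name may need to be written as `gymnasium_robotics`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as reported in this thread.
PACKAGES = ("gym", "gymnasium", "gymnasium-robotics")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if the package is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(name, ver if ver is not None else "(not installed)")
```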
Can you show me your hyperparameter settings for the Adroit Hand environment? I use the default settings: `{"policy": "Gaussian", "gamma": 0.99, "tau": 0.005, "lr": 0.0003, "alpha": 0.2, "quantile": 0.9, "automatic_entropy_tuning": True, "batch_size": 512, "updates_per_step": 1, "target_update_interval": 2, "hidden_size": 1024, "msg": "default"}`
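When comparing runs, it can help to diff your own settings against the defaults quoted above so that a silently changed value (for example, one overridden by a command-line flag) is not missed. The following is a minimal Python sketch. `DEFAULT_HPARAMS` copies the values listed in this thread, and `diff_hparams` is a hypothetical helper, not part of the ACE codebase. `msg` is ignored by default because it appears to be a run label rather than a hyperparameter.

```python
# Default hyperparameters exactly as quoted in this thread.
DEFAULT_HPARAMS = {
    "policy": "Gaussian",
    "gamma": 0.99,
    "tau": 0.005,
    "lr": 0.0003,
    "alpha": 0.2,
    "quantile": 0.9,
    "automatic_entropy_tuning": True,
    "batch_size": 512,
    "updates_per_step": 1,
    "target_update_interval": 2,
    "hidden_size": 1024,
    "msg": "default",
}

MISSING = "<missing>"  # placeholder for a key present in only one of the two configs


def diff_hparams(mine, defaults=DEFAULT_HPARAMS, ignore=("msg",)):
    """Return {key: (mine, default)} for every key whose value differs or is absent."""
    diffs = {}
    for key in sorted(set(mine) | set(defaults)):
        if key in ignore:
            continue
        a, b = mine.get(key, MISSING), defaults.get(key, MISSING)
        if a != b:
            diffs[key] = (a, b)
    return diffs


if __name__ == "__main__":
    # Hypothetical run config with one changed value.
    my_run = dict(DEFAULT_HPARAMS, batch_size=256, msg="my-run")
    for key, (mine, default) in diff_hparams(my_run).items():
        print(f"{key}: mine={mine!r} default={default!r}")
```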
Hello Tianying and Yongyuan. When I run ACE with the default parameters in the Adroit Hand environment, it fails to reach results comparable to those reported in the paper, even after several training runs. Should I adjust the parameters, or could there be another underlying issue?