jity16 / ACE-Off-Policy-Actor-Critic-with-Causality-Aware-Entropy-Regularization

Official PyTorch implementation of "ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization"

Question about the experiments #3

Closed · HYeCao closed this issue 3 months ago

HYeCao commented 3 months ago

When reproducing experiments in environments such as walker2d and pick-place-wall, we observed significant discrepancies between our results and those reported in the paper. What could be causing this?

cheryyunl commented 3 months ago

Hi, here is some of our experiment data on pick-place-wall. In some seeds the agent may fail. (DrmCAC = ACE and DrmCBAC = ACE+BEE.)

[image: pick-place-wall experiment curves]

Please check your MetaWorld and Gym versions, as well as the reward and state types. We will also re-check the MetaWorld wrapper on our side.
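For reference, a minimal sanity check along these lines could look like the sketch below. It assumes the MT1 interface from recent MetaWorld releases and state-based (non-pixel) observations; the actual wrapper used in this repo may differ.

```python
# Sketch of a version/setup sanity check before comparing against the paper.
# Assumes a recent MetaWorld release exposing the MT1 benchmark interface and
# state-based observations; adapt to whatever wrapper this repo actually uses.
import gym
import metaworld

print("gym version:", gym.__version__)
print("metaworld version:", getattr(metaworld, "__version__", "unknown"))

# Build pick-place-wall the way MetaWorld's MT1 benchmark does.
mt1 = metaworld.MT1("pick-place-wall-v2")
env = mt1.train_classes["pick-place-wall-v2"]()
env.set_task(mt1.train_tasks[0])

# Older gym returns obs from reset(); newer gym/gymnasium returns (obs, info).
reset_out = env.reset()
obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
print("observation shape:", obs.shape)
print("reward on a random step:", env.step(env.action_space.sample())[1])
```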

jity16 commented 3 months ago

Here is a snapshot of our WandB dashboard for the walker2d-v2 environment, without any smoothing. Some HP suggestions that may improve performance on walker2d-v2: try "batch_size: 256, updates_per_step: 1, target_update_interval: 1, hidden_size: 256" (see the sketch after the figure below).

[image: walker2d-v2 WandB training curves]

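As a rough sketch, the suggested values could be collected into a config like the one below. The key names mirror common SAC-style training scripts and are assumptions rather than this repo's exact CLI.

```python
# Hyperparameters suggested above for walker2d-v2. The keys follow common
# SAC-style training scripts; the actual argument names in this repo's
# training script may differ, so treat this as an assumption-laden sketch.
walker2d_v2_hparams = {
    "batch_size": 256,
    "updates_per_step": 1,
    "target_update_interval": 1,
    "hidden_size": 256,
}

if __name__ == "__main__":
    # Hypothetical command-line invocation (script and flag names assumed):
    #   python main.py --env-name Walker2d-v2 --batch_size 256 \
    #       --updates_per_step 1 --target_update_interval 1 --hidden_size 256
    for name, value in walker2d_v2_hparams.items():
        print(f"--{name} {value}")
```
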
HYeCao commented 3 months ago

Thank you very much for your reply. I will try again with your HPs. Could you provide the complete HP settings for the different environments and tasks?

cheryyunl commented 3 months ago

We have updated the experiment curves.