jayLEE0301 / vq_bet_official

Official code for "Behavior Generation with Latent Actions" (ICML 2024 Spotlight)
https://sjlee.cc/vq-bet/
MIT License

Diffusion Policy in the vqbet paper has lower performance than in the diffusion policy paper #4

Closed: StarCycle closed this issue 1 month ago

StarCycle commented 5 months ago

Hi @jayLEE0301,

Thank you for sharing this work! I have a small question about the result...

Table 2 reports the intersection-of-area IoU in the PushT task (this metric is not the same as the success rate, since the PushT env counts IoU > 0.95 as a success). The performance of Diffusion Policy is quite low here, only 0.66 with DiffPolicy-C (UNet).

[image]
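For reference, the PushT coverage metric is, roughly, the area of intersection between the T-block and the goal region divided by the goal area, with coverage above 0.95 counted as a success. A minimal sketch, where shapely polygons stand in for the real block/goal shapes (the coordinates below are made up for illustration):

```python
from shapely.geometry import Polygon

def coverage(block_poly: Polygon, goal_poly: Polygon) -> float:
    """Fraction of the goal region covered by the block."""
    return block_poly.intersection(goal_poly).area / goal_poly.area

# Toy rectangles standing in for the real T-shaped block and goal region.
goal = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
block = Polygon([(1, 1), (5, 1), (5, 5), (1, 5)])

c = coverage(block, goal)
success = c > 0.95  # PushT counts a success once coverage exceeds this threshold
print(f"coverage={c:.3f}, success={success}")
```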

But if you look at the Diffusion Policy paper, DiffPolicy-C easily achieves an IoU of 80% in the real world:

[image]

Did you use the official implementation? Any clue about what causes the gap?

jayLEE0301 commented 5 months ago

Hello @StarCycle,

Thank you for your interest in our work.

Differing metrics in PushT task: To clarify, there are two main metrics for PushT: (a) success rate, which is counted when the policy achieves a certain coverage at any point during the policy rollout and (b) final coverage, which is measured as the coverage at the end of the policy rollout. We report the final coverage metric as it is more indicative of final performance and less susceptible to arbitrary increases in coverage that a randomized policy can achieve over a long enough horizon.
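For example, given a per-step coverage trace from a single rollout (the trace and function names below are made up for illustration), the two metrics can disagree sharply:

```python
import numpy as np

SUCCESS_THRESHOLD = 0.95  # PushT's coverage threshold for a "success"

def success(coverages: np.ndarray) -> bool:
    """Metric (a): success if coverage crosses the threshold at ANY step."""
    return bool((coverages > SUCCESS_THRESHOLD).any())

def final_coverage(coverages: np.ndarray) -> float:
    """Metric (b): coverage at the END of the rollout (what Table 2 reports)."""
    return float(coverages[-1])

# Hypothetical trace: the policy briefly reaches high coverage, then drifts away.
trace = np.array([0.10, 0.40, 0.96, 0.70, 0.65])
print(success(trace))         # True  -> counted as a success
print(final_coverage(trace))  # 0.65  -> much lower final coverage
```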

The Diffusion Policy values were obtained by running the checkpoints downloaded from the official repo (https://diffusion-policy.cs.columbia.edu/data/experiments/).

StarCycle commented 5 months ago

Hello @jayLEE0301 @notmahi ,

I tested Diffusion Policy using the final IoU (following your setting). I loaded the official checkpoint, evaluated the Diffusion Policy with UNet for 20 episodes, and got an average final IoU of 81.8%. Here is my result:

[image]

The full Colab notebook is diffusion_policy_vision_pusht_demo.ipynb.txt (please remove the .txt extension). Note that I made several modifications to follow your settings:

Perhaps something went wrong on my side... would you like to look into it?

jayLEE0301 commented 5 months ago

Hello @StarCycle

We checked the Colab notebook you sent, but there are some differences between your Colab notebook code and our evaluation setting, which is based on the original Diffusion Policy GitHub repo (https://github.com/real-stanford/diffusion_policy):

Thus, we would like to show how you can run the evaluation using the original GitHub repo and the checkpoints downloaded from here (https://diffusion-policy.cs.columbia.edu/data/experiments/image/pusht/diffusion_policy_cnn/):

  1. git clone https://github.com/real-stanford/diffusion_policy and install all the dependencies.
  2. You will need to make changes to eval.py, pusht_image_runner.py, and pusht_env.py. We created a GitHub gist so that you can change each part easily; check out this revision, which shows which parts of each file you need to modify. (These modifications make it possible to collect coverage arrays from the multi-processed envs; a rough sketch of the idea is shown below.)
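As a rough illustration of the idea (not the actual gist), one could record per-step coverage and reduce it to a final-coverage metric along these lines; the wrapper class and the `info["coverage"]` field are hypothetical stand-ins for the real modifications:

```python
import numpy as np

class CoverageRecorder:
    """Hypothetical wrapper: records per-step coverage reported by the env.

    The real change edits pusht_env.py / pusht_image_runner.py so that each
    (possibly multi-processed) env returns its goal coverage in the step info.
    """

    def __init__(self, env):
        self.env = env
        self.coverages = []

    def reset(self):
        self.coverages = []
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Assumption: the modified env exposes goal coverage in `info`.
        self.coverages.append(info["coverage"])
        return obs, reward, done, info

def mean_final_coverage(episode_coverages):
    """episode_coverages: one per-step coverage array per rollout."""
    return float(np.mean([ep[-1] for ep in episode_coverages]))
```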

The average coverage values we obtained from the three DiffusionPolicy-CNN checkpoints (which are in our manuscript) were as follows: 0.6835, 0.6101, and 0.6957. Today, we ran this experiment again on a new machine and got the following values (we found that there is significant variance across conda envs / machines / seeds): 0.6306, 0.6656, and 0.7990.
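Averaging these per-checkpoint values gives roughly the ~0.66 reported in Table 2; as a quick check:

```python
original = [0.6835, 0.6101, 0.6957]
rerun    = [0.6306, 0.6656, 0.7990]

print(sum(original) / len(original))  # ~0.663, consistent with the ~0.66 in Table 2
print(sum(rerun) / len(rerun))        # ~0.698, noticeably higher on the new machine
```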

We hope this answer is helpful to you. Thank you.

StarCycle commented 5 months ago

Hello @jayLEE0301 ,

Thank you for your response!

[image]

notmahi commented 5 months ago

Hi @StarCycle!