Closed · StarCycle closed this issue 1 month ago
Hello @StarCycle,
Thank you for your interest in our work.
Differing metrics in PushT task: To clarify, there are two main metrics for PushT: (a) success rate, which is counted when the policy achieves a certain coverage at any point during the policy rollout and (b) final coverage, which is measured as the coverage at the end of the policy rollout. We report the final coverage metric as it is more indicative of final performance and less susceptible to arbitrary increases in coverage that a randomized policy can achieve over a long enough horizon.
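For concreteness, the distinction between the two metrics can be sketched like this (a toy illustration, not the actual benchmark code; the 0.95 threshold is the success threshold used by the PushT env):

```python
# Toy sketch of the two PushT metrics described above, computed from the
# per-step coverage values recorded during one policy rollout.
import numpy as np

SUCCESS_THRESHOLD = 0.95  # PushT counts IoU > 0.95 as a success

def success_rate_metric(coverages):
    # (a) success: the threshold is reached at ANY point during the rollout
    return float(np.max(coverages) > SUCCESS_THRESHOLD)

def final_coverage_metric(coverages):
    # (b) final coverage: the coverage at the END of the rollout
    return float(coverages[-1])

# A rollout that briefly crosses the threshold but ends with low coverage
# still counts as a "success", even though its final coverage is poor.
rollout = np.array([0.2, 0.5, 0.96, 0.7, 0.6])
print(success_rate_metric(rollout))    # 1.0
print(final_coverage_metric(rollout))  # 0.6
```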
The Diffusion Policy values were obtained by running the checkpoints downloaded from the official repo (https://diffusion-policy.cs.columbia.edu/data/experiments/).
Hello @jayLEE0301 @notmahi ,
I tested Diffusion Policy using the final IoU (following your settings). I loaded the official checkpoint, evaluated the Diffusion Policy with a UNet for 20 episodes, and got an average final IoU of 81.8%. Here is my result:
The full colab notebook is diffusion_policy_vision_pusht_demo.ipynb.txt (please remove the .txt file extension). Note that I made several modifications to follow your settings:

- Installed JAX with !pip install "jax[cuda12_local]==0.4.23" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
- Changed reward = np.clip(coverage / self.success_threshold, 0, 1) to reward = np.clip(coverage, 0, 1), so it directly returns the IoU.
- The evaluate_single_episode() function returns rewards[-1], i.e., the final IoU in this case.

Perhaps something went wrong on my side... would you like to look into it?
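For reference, the evaluation loop behaves roughly like this (a simplified sketch; the real env and policy come from the notebook, and the step signature here follows the Gymnasium convention, which may differ from the notebook's gym version):

```python
def evaluate_single_episode(env, policy, max_steps=300):
    """Roll out one episode and return the final reward (the final IoU here)."""
    obs, _ = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        # With the modification above, reward == coverage (the raw IoU).
        rewards.append(reward)
        if terminated or truncated:
            break
    # The final IoU, not the best IoU reached at any point during the episode.
    return rewards[-1]
```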
Hello @StarCycle
We checked the colab notebook you sent, but there are some differences between your colab notebook and our evaluation setting, which is based on the original Diffusion Policy GitHub repo (https://github.com/real-stanford/diffusion_policy):

- Since the notebook computes reward = np.clip(coverage / self.success_threshold, 0, 1), it returns the coverage divided by success_threshold, not the coverage itself. This yields a value greater than the coverage.
- The notebook loads ckpt_path = "pusht_vision_100ep.ckpt".

Thus, we would like to show how you can run the evaluation using the original GitHub repo and a checkpoint downloaded from here (https://diffusion-policy.cs.columbia.edu/data/experiments/image/pusht/diffusion_policy_cnn/). The files to modify are eval.py, pusht_image_runner.py, and pusht_env.py. We created a GitHub gist so you can change each part easily; check out this revision to see which parts of each file you need to modify. (These modifications enable getting coverage arrays from multi-processed envs.)

The average coverage values we obtained from the DiffusionPolicy-CNN's three checkpoints (which are in our manuscript) were as follows:
0.6835, 0.6101, 0.6957
Today, we ran this experiment again on a new machine and got the following values (we found that there is significant variance across conda environments, machines, and seeds):
0.6306, 0.6656, 0.7990
We hope this answer is helpful to you. Thank you.
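As a quick numeric check of the scaling difference discussed above (a toy calculation using the PushT success threshold of 0.95):

```python
import numpy as np

success_threshold = 0.95  # PushT success threshold
coverage = 0.66           # e.g. a typical final coverage value

# What the unmodified notebook reward reports:
scaled = float(np.clip(coverage / success_threshold, 0, 1))
# The raw coverage (IoU) itself:
raw = float(np.clip(coverage, 0, 1))

print(round(scaled, 4))  # 0.6947
print(raw)               # 0.66
```

So with the division left in, every reported value is inflated by roughly 5% before clipping, which alone can account for part of the gap between the two sets of numbers.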
Hello @jayLEE0301 ,
Thank you for your response!
Note that I used reward = np.clip(coverage, 0, 1) instead of reward = np.clip(coverage / self.success_threshold, 0, 1) in my test.

Hi @StarCycle!
RuntimeError: Error(s) in loading state_dict for ModuleDict:
Missing key(s) in state_dict: "vision_encoder.conv1.weight", "vision_encoder.bn1.weight", "vision_encoder.bn1.bias", "vision_encoder.layer1.0.conv1.weight", "vision_encoder.layer1.0.bn1.weight", "vision_encoder.layer1.0.bn1.bias", "vision_encoder.layer1.0.conv2.weight", "vision_encoder.layer1.0.bn2.weight", "vision_encoder.layer1.0.bn2.bias", "vision_encoder.layer1.1.conv1.weight", "vision_encoder.layer1.1.bn1.weight", "vision_encoder.layer1.1.bn1.bias", "vision_encoder.layer1.1.conv2.weight", "vision_encoder.layer1.1.bn2.weight", "vision_encoder.layer1.1.bn2.bias", "vision_encoder.layer2.0.conv1.weight", "vision_encoder.layer2.0.bn1.weight", "vision_encoder.layer2.0.bn1.bias", "vision_encoder.layer2.0.conv2.weight", "vision_encoder.layer2.0.bn2.weight", "vision_encoder.layer2.0.bn2.bias", "vision_encoder.layer2.0.downsample.0.weight", "vision_encoder.layer2.0.downsample.1.weight", "vision_encoder.layer2.0.downsample.1.bias", "vision_encoder.layer2.1.conv1.weight", "vision_encoder.layer2.1.bn1.weight", "vision_encoder.layer2.1.bn1.bias", "vision_encoder.layer2.1.conv2.weight", "vision_encoder.layer2.1.bn2.weight", "vision_encoder.layer2.1.bn2.bias", "vision_encoder.layer3.0.conv1.weight", "vision_encoder.layer3.0.bn1.weight", "vision_encoder.layer3.0.bn1.bias", "vision_encoder.layer3.0.conv2.weight", "vision_encoder.layer3.0.bn2.weight", "vision_encoder.layer3.0.bn2.bias", "vision_encoder.layer3.0.downsample.0.weight", "vision_encoder.layer3.0.downsample.1.weight", "visio...
Unexpected key(s) in state_dict: "_dummy_variable", "obs_encoder.obs_nets.image.backbone.nets.0.weight", "obs_encoder.obs_nets.image.backbone.nets.1.weight", "obs_encoder.obs_nets.image.backbone.nets.1.bias", "obs_encoder.obs_nets.image.backbone.nets.4.0.conv1.weight", "obs_encoder.obs_nets.image.backbone.nets.4.0.bn1.weight", "obs_encoder.obs_nets.image.backbone.nets.4.0.bn1.bias", "obs_encoder.obs_nets.image.backbone.nets.4.0.conv2.weight", "obs_encoder.obs_nets.image.backbone.nets.4.0.bn2.weight", "obs_encoder.obs_nets.image.backbone.nets.4.0.bn2.bias", "obs_encoder.obs_nets.image.backbone.nets.4.1.conv1.weight", "obs_encoder.obs_nets.image.backbone.nets.4.1.bn1.weight", "obs_encoder.obs_nets.image.backbone.nets.4.1.bn1.bias", "obs_encoder.obs_nets.image.backbone.nets.4.1.conv2.weight", "obs_encoder.obs_nets.image.backbone.nets.4.1.bn2.weight", "obs_encoder.obs_nets.image.backbone.nets.4.1.bn2.bias", "obs_encoder.obs_nets.image.backbone.nets.5.0.conv1.weight", "obs_encoder.obs_nets.image.backbone.nets.5.0.bn1.weight", "obs_encoder.obs_nets.image.backbone.nets.5.0.bn1.bias", "obs_encoder.obs_nets.image.backbone.nets.5.0.conv2.weight", "obs_encoder.obs_nets.image.backbone.nets.5.0.bn2.weight", "obs_encoder.obs_nets.image.backbone.nets.5.0.bn2.bias", "obs_encoder.obs_nets.image.backbone.nets.5.0.downsample.0.weight", "obs_encoder.obs_nets.image.backbone.nets.5.0.downsample.1.weight", "obs_encoder.obs_nets.image.backbone.nets.5.0.downsample.1.bias", "obs_encoder.obs_nets.i...
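The mismatch above suggests the checkpoint and the model use different module naming schemes (vision_encoder.* vs obs_encoder.obs_nets.image.*). One way to diagnose this before loading is to diff the key sets (a rough helper sketch; the "state_dicts" nesting is an assumption about the checkpoint layout and may need adjusting):

```python
import torch

def diff_state_dicts(model, ckpt_path):
    """Return (missing, unexpected) keys between a model and a checkpoint."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints nest the weights (e.g. under "state_dicts");
    # this lookup is an assumption and may need adjusting per file.
    state = ckpt.get("state_dicts", ckpt) if isinstance(ckpt, dict) else ckpt
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(state.keys())
    missing = sorted(model_keys - ckpt_keys)      # in the model, not the file
    unexpected = sorted(ckpt_keys - model_keys)   # in the file, not the model
    return missing, unexpected
```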
Hi @jayLEE0301,
Thank you for sharing this work! I have a small question about the result...
Table 2 reports the area intersection-over-union (IoU) in the PushT task (this metric is not equal to the success rate, since the PushT env counts IoU > 0.95 as a success). The performance of Diffusion Policy is quite low here: only 0.66 with DiffPolicy-C (UNet).
But the Diffusion Policy paper shows DiffPolicy-C easily achieving an IoU of 80% in the real world.
Did you use the official implementation? Any clue about this discrepancy?