zhouxian / act3d-chained-diffuser

A unified architecture for multimodal multi-task robotic policy learning.

Success rate < 1.0 reported in validate_data_generation.py #8

Closed rakhimovv closed 8 months ago

rakhimovv commented 9 months ago

Hi, thank you for the published code!

I am not very familiar with RLBench and wonder: when the success rate for a variation is less than 1.0, how should we interpret this value?

Does it mean that the episode data is bad, and we need to drop such episodes?

zhouxian commented 9 months ago

It depends on the internal definition of the task-specific success criterion. I think when generating the data, RLBench drops bad episodes on its own. It would be good if you could provide a screenshot of the issue you are seeing.
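For context, a task's success criterion in RLBench is defined inside the task class itself via registered condition objects. A minimal sketch of the pattern (the task and object names below are illustrative, not taken from this repo or from any shipped RLBench task):

```python
from pyrep.objects.proximity_sensor import ProximitySensor
from pyrep.objects.shape import Shape
from rlbench.backend.conditions import DetectedCondition, NothingGrasped
from rlbench.backend.task import Task


class ExampleTask(Task):
    def init_task(self) -> None:
        target = Shape('target_object')        # object the robot must place
        detector = ProximitySensor('success')  # sensor covering the goal region
        # An episode only counts as successful when *all* registered
        # conditions hold, no matter how correct it looks visually.
        self.register_success_conditions([
            DetectedCondition(target, detector),
            NothingGrasped(self.robot.gripper),
        ])
```

So an episode that looks right can still score 0 if, say, the proximity sensor is never triggered.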

rakhimovv commented 9 months ago

@zhouxian

[screenshot: verify_demos output showing a success rate of 0.0]

For example, in this case the success rate is equal to 0.0. However, after checking the sequence of saved images, the actions appear visually correct. I'm just curious what this number 0.0 means in this context.

The number is produced by env.verify_demos.

zhouxian commented 9 months ago

Visually correct does not necessarily count as success. @nickgkan Could you take a look at this case?

QazyBi commented 9 months ago

The verify_demos method checks the rewards of keypoint frames:

https://github.com/zhouxian/act3d-chained-diffuser/blob/12f0a94eb63ec4405c12af3639aa8e168112b769/utils/utils_with_rlbench.py#L811
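In other words, the check amounts to something like the following sketch (this is my reading of the linked code, not a copy of it; the per-demo reward array and keyframe indices are assumptions about the stored .npy layout):

```python
import numpy as np

def demo_passes(rewards: np.ndarray, keyframes: list) -> bool:
    # A demo counts as successful only if at least one of its
    # keypoint frames carries RLBench's sparse success reward of 1.
    return any(rewards[k] >= 1.0 for k in keyframes)

def success_rate(per_demo_rewards, per_demo_keyframes) -> float:
    # The reported "success rate" would then be the fraction of
    # stored demos that pass this per-demo check.
    passed = [demo_passes(r, k)
              for r, k in zip(per_demo_rewards, per_demo_keyframes)]
    return float(np.mean(passed))
```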

However, upon visually inspecting the images extracted from the .npy files, none of the frames has a reward of 1. Here are two example keyframe sequences demonstrating the issue:

[screenshot: example keyframe sequence 1]

[screenshot: example keyframe sequence 2]

I suspect that either the way keyframes are selected should be revised, or that validation of the .npy data should not be based on keyframe rewards. See the sketch below for the selection heuristic I mean.
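For reference, keyframe selection in this family of codebases typically follows the PerAct-style heuristic: a frame becomes a keyframe when the gripper open/close state flips or the arm comes to rest. A minimal sketch, with the velocity threshold and suppression buffer as assumptions:

```python
import numpy as np

def keypoint_discovery(gripper_open: np.ndarray,
                       joint_velocities: np.ndarray,
                       stop_eps: float = 1e-3,
                       buffer_size: int = 4) -> list:
    """Sketch of the PerAct-style heuristic: mark frames where the
    gripper state changes, or where all joint velocities are ~0
    (the arm has stopped), suppressing runs of near-duplicate stops."""
    keyframes = []
    stopped_buffer = 0
    for i in range(1, len(gripper_open)):
        gripper_changed = gripper_open[i] != gripper_open[i - 1]
        stopped = bool(np.allclose(joint_velocities[i], 0.0, atol=stop_eps))
        if gripper_changed or (stopped and stopped_buffer <= 0):
            keyframes.append(i)
            stopped_buffer = buffer_size
        else:
            stopped_buffer -= 1
    return keyframes
```

If the frame where the success conditions first hold falls between two selected keyframes, a demo can look correct yet verify with reward 0, which would be consistent with the sequences above.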

rakhimovv commented 9 months ago

The problem remains even if I run the eval.sh script with ground-truth keyposes and trajectories, e.g. on the close_jar task.

nickgkan commented 8 months ago

Hi, there are two issues in your case.

One is on our side: we were storing demos that may have failed; we fixed that here.

The other is on RLBench's side: we have discovered that the success conditions are buggy for some tasks, such as close_jar, sort_shape, and lamp_on. You need to modify the library internally to fix this.
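For anyone hitting this: the success conditions live in the task classes under rlbench/tasks/ (e.g. rlbench/tasks/close_jar.py). Below is a hedged sketch for probing which registered condition fails to fire during a rollout; the access to the private _task and _success_conditions attributes is an assumption about RLBench's internals, so treat it as a debugging aid only:

```python
from rlbench.action_modes.action_mode import MoveArmThenGripper
from rlbench.action_modes.arm_action_modes import JointVelocity
from rlbench.action_modes.gripper_action_modes import Discrete
from rlbench.environment import Environment
from rlbench.tasks import CloseJar

env = Environment(
    MoveArmThenGripper(JointVelocity(), Discrete()), headless=True)
env.launch()
task = env.get_task(CloseJar)
task.reset()

# ... drive the arm through a (visually) successful episode here ...

# Each condition reports (met, should_terminate); if one of them
# never becomes met even on a correct-looking episode, that is the
# buggy condition to patch inside the library.
backend_task = task._task  # private RLBench attribute (assumption)
for cond in backend_task._success_conditions:
    print(type(cond).__name__, cond.condition_met())
env.shutdown()
```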