Hi, thank you for the published code!
I am not very familiar with RLBench and wonder: when the success rate for a variation is less than 1.0, how should we interpret this value? Does it mean that the episode data is bad and we need to drop such episodes?
It depends on the task-specific success criterion defined internally. I think that when generating the data, RLBench drops bad episodes on its own. It would help if you could provide a screenshot of the issue you are seeing.
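For reference, live demo collection through RLBench's public API looks roughly like the sketch below; as far as I know, `get_demos` only returns episodes that satisfied the task's internal success conditions and retries failed attempts (this is a minimal sketch, not the repo's exact data-generation script):

```python
# Minimal sketch of live demo collection with RLBench's public API
# (not this repo's exact data-generation script). Episodes that fail
# the task's internal success conditions are retried, so the returned
# demos should all be successful.
from rlbench.environment import Environment
from rlbench.action_modes.action_mode import MoveArmThenGripper
from rlbench.action_modes.arm_action_modes import JointVelocity
from rlbench.action_modes.gripper_action_modes import Discrete
from rlbench.observation_config import ObservationConfig
from rlbench.tasks import CloseJar

env = Environment(
    action_mode=MoveArmThenGripper(JointVelocity(), Discrete()),
    obs_config=ObservationConfig(),
    headless=True)
env.launch()
task = env.get_task(CloseJar)
demos = task.get_demos(2, live_demos=True)  # scripted, success-checked demos
env.shutdown()
```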
@zhouxian
For example, in this case the success rate is equal to 0.0. However, after checking the sequence of saved images, the actions look visually correct. I'm just curious what this number 0.0 means in this context.
That number is produced by env.verify_demos.
Looking visually correct is not necessarily the same as success. @nickgkan Could you take a look at this case here?
The verify_demos method checks the rewards of the keypoint frames.
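To make that concrete, here is a hypothetical paraphrase of the idea (not the exact `verify_demos` code; the 0/1 sparse-reward convention is RLBench's, and the function and variable names are made up):

```python
import numpy as np

def verify_demo_success(keyframe_rewards) -> bool:
    # Hypothetical sketch of the idea behind verify_demos: a demo
    # counts as successful only if some keypoint frame carries the
    # sparse task reward of 1.0.
    return bool(np.any(np.asarray(keyframe_rewards) == 1.0))

# Success rate over a batch of stored episodes (illustrative data):
episodes = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 0.0])]
print(np.mean([verify_demo_success(r) for r in episodes]))  # 0.5
```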
However, when I inspect the images extracted from the .npy files, none of the frames carries a reward of 1, even though they look correct. Here are two example keyframe sequences demonstrating the issue:
I suspect that either the keyframe selection logic needs to change, or the validation of the .npy data should not be based on keyframe rewards.
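For context, the keypoint heuristic used by much of the RLBench literature looks roughly like this (a sketch with assumed field names, not necessarily this repo's exact selection code). If every selected keypoint lands before the success condition fires, the stored keyframe rewards are all 0 even for a visually correct episode:

```python
import numpy as np

def keypoint_indices(gripper_open, joint_velocities, atol=1e-2):
    # Common heuristic from the RLBench literature (a sketch, not
    # necessarily this repo's code): a frame is a keypoint when the
    # gripper toggles or the arm comes to rest.
    keypoints = []
    for t in range(1, len(gripper_open)):
        gripper_changed = gripper_open[t] != gripper_open[t - 1]
        arm_stopped = np.allclose(joint_velocities[t], 0.0, atol=atol)
        if gripper_changed or (arm_stopped and
                               (not keypoints or t - keypoints[-1] > 1)):
            keypoints.append(t)
    return keypoints
```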
The problem remains even if I run the eval.sh script with ground-truth keyposes and trajectories, e.g. on the close_jar task.
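Concretely, the replay amounts to something like this sketch (it assumes an end-effector-pose arm action mode and an `Environment` created with a `dataset_root`; the keypose extraction here is simplified):

```python
import numpy as np

# Sketch of replaying stored ground-truth keyposes through RLBench's
# TaskEnvironment API; `task` is an environment handle as set up above.
demos = task.get_demos(1, live_demos=False)   # load one stored demo
descriptions, obs = task.reset_to_demo(demos[0])
reward = 0.0
# Simplified keypose actions: gripper pose (7D) plus open state (1D)
# taken from every stored observation; real code would use keypoints.
actions = [np.concatenate([o.gripper_pose, [o.gripper_open]])
           for o in demos[0]]
for action in actions:
    obs, reward, terminate = task.step(action)
print('success' if reward == 1.0 else 'failure')
```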
Hi, there are two issues in your case.
One is on our side: we were storing demos that may have failed; we fixed that here.
The other is on RLBench's side: we have discovered that the success conditions are buggy for some tasks, such as close_jar, sort_shape and lamp_on. You need to modify the library internally to fix this.
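For anyone who needs to patch this: each RLBench task registers its success conditions inside its task file under rlbench/tasks/. Schematically, the mechanism looks like this (an illustration with placeholder handles, not close_jar's actual code):

```python
from typing import List
from rlbench.backend.task import Task
from rlbench.backend.conditions import DetectedCondition, NothingGrasped

class SomeTask(Task):  # schematic task, not an actual RLBench task
    def init_task(self) -> None:
        self._lid = ...             # placeholder scene object handle
        self._success_sensor = ...  # placeholder proximity sensor

    def init_episode(self, index: int) -> List[str]:
        # The condition list registered here is exactly what demo
        # generation and verification check against; fixing a buggy
        # task means correcting these conditions in its task file.
        self.register_success_conditions([
            DetectedCondition(self._lid, self._success_sensor),
            NothingGrasped(self.robot.gripper),
        ])
        return ['some task description']
```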