Hey, happy to see you're making so much progress!
I used their codebase as a starting point, and made my own (hacky) scripts (scalingup_real_scripts.zip). The main entry point is `scaling_up_eval_real_robot.py`. Most of the structure is identical to that of diffusion policy's, and it uses their multiprocess shared-memory ring buffer stack. If I'm not mistaken, you have an FR5 setup, not a UR5. I would start by making a separate class to support your hardware platform (e.g., change out `RTDEInterpolationController`, etc.), then update `scaling_up_constants.py` and the entire policy/network definition inside `scaling_up_policy.py`.
Debugging: preferably, you'd get an idea of what kind of actions your policy will take before it actually runs on the real robot. You can collect some real-world policy inputs using `scaling_up_generate_test_cases.py`, evaluate these policies offline using `scaling_up_offline_evaluation.py`, and visualize the policy's predicted actions as point clouds using `scaling_up_visualize_actions.py`. Lastly, you can also directly replay an action from simulation (open loop, does not use any learned policy) in the real environment using `scaling_up_replay_real_robot.py`, just to test that your sim actions are indeed reasonable in your real-world setup. This script is a good starting point for making sure you've supported your FR5 setup correctly in the diffusion policy codebase.
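To make the "separate class for your hardware" suggestion concrete, here is a rough sketch of the kind of interface such a class would need to expose so the rest of the eval loop can stay unchanged. This is not the actual `RTDEInterpolationController` API; the class/method names, rates, and the omitted FR5 SDK calls are all placeholders:

```python
# Hypothetical sketch of a minimal FR5 controller playing the role of
# RTDEInterpolationController. Names and the FR5 SDK calls are placeholders.
import time

import numpy as np


class FR5InterpolationController:
    """Streams end-effector pose targets to an FR5 arm at a fixed rate."""

    def __init__(self, robot_ip: str = "192.168.1.10", frequency: float = 125.0):
        self.robot_ip = robot_ip          # placeholder address
        self.frequency = frequency        # control-loop rate in Hz
        self._target_pose = np.zeros(6)   # x, y, z, rx, ry, rz

    def start(self):
        # Connect to the FR5 here (driver/SDK call omitted).
        self._target_pose = self.get_ee_pose()

    def get_ee_pose(self) -> np.ndarray:
        # Query the current 6-DoF end-effector pose from the robot.
        # Replace with the real FR5 SDK call.
        return np.zeros(6)

    def schedule_waypoint(self, pose, target_time: float):
        # The eval loop only needs some version of this: accept a pose plus
        # a timestamp and let the controller interpolate toward it.
        self._target_pose = np.asarray(pose, dtype=np.float64)

    def run(self, n_steps: int = 10):
        dt = 1.0 / self.frequency
        for _ in range(n_steps):
            # Interpolate toward self._target_pose and send the FR5
            # equivalent of a servo command here. Omitted.
            time.sleep(dt)
```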
Hopefully these will still give you a good starting point!
Hi, @huy-ha
Thank you very much for your help! I have successfully run the policy evaluation on our real FR5 robot. The evaluation keeps failing to finish the bin transport task, though; I believe it suffers from a suboptimal policy and the sim2real gap.
By the way, I find that my policy tends to start the transport action without actually grasping an object (e.g., if one finger contacts the other without grasping an object, the robot still continues to move to the target bin). I wonder if this is related to the collision detection setup?
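For reference, the kind of guard I am thinking about adding on my side looks roughly like this (the widths and threshold are made up, just to illustrate the check):

```python
# Sketch of a grasp-success check based on gripper opening after closing.
# Widths/threshold are made-up values; the real ones depend on the gripper
# and the objects.
def grasp_succeeded(gripper_width_m: float,
                    fully_closed_m: float = 0.002,
                    min_object_width_m: float = 0.01) -> bool:
    """Return True if the fingers stopped on something wider than noise."""
    return gripper_width_m > fully_closed_m + min_object_width_m


print(grasp_succeeded(0.001))   # fingers touched each other -> False
print(grasp_succeeded(0.035))   # fingers stopped on an object -> True
```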
Best regards
Hi @wangyan-hlab ,
Happy to see you did the successful real experiments!
I am still struggling with that. In real experiments, the output (action sequences) always points in a weird direction. Could you please advise me on the following questions:
Appreciate your kind reply and help!
Best regards
Hey @wangyan-hlab,
Great to see you've done a first round of evaluations.
- Did you use domain randomization? This should help with visual sim2real a lot.
- You can consider increasing the magnitude of visual augmentations. This should help the vision encoder learn more transferrable representations (see the sketch after this list).
- Did the policy do well in simulation evaluation?
- In your training data, did the policy observe retrying behavior? You can visualize all videos of a data generation process on Weights and Biases. In my experiments, there were plenty of retries, but I just want to eliminate this as a possible cause.
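To be concrete about "increasing the magnitude of visual augmentations", I mean something along these lines. This is only an illustrative torchvision sketch, not the exact augmentation stack in the repo, and the parameter values are examples of "larger magnitude" rather than recommended defaults:

```python
# Illustrative only: a stronger image-augmentation stack for the vision
# encoder. Parameter values are examples of "larger magnitude", not the
# repo's defaults.
import torchvision.transforms as T

strong_aug = T.Compose([
    T.RandomResizedCrop(size=224, scale=(0.7, 1.0)),   # more aggressive crops
    T.ColorJitter(brightness=0.4, contrast=0.4,
                  saturation=0.4, hue=0.1),            # wider color shifts
    T.RandomGrayscale(p=0.1),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),   # simulate focus/blur changes
])
```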
Hey @yellow07200 ,
In my experiments, since I used domain randomization over camera poses, I didn't have to calibrate. I just placed the camera in front of the robot where it roughly matched.
> the output (action sequences) always points in a weird direction
Did you load the action normalization in from the checkpoint? Is it completely off, or is it close to the object but not on the object?
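Concretely, by loading the action normalization I mean mapping the network's normalized outputs back to the real action range using the stats saved at training time. A schematic sketch; the checkpoint key names and the [-1, 1] range here are assumptions, so use whatever your checkpoint actually stores:

```python
# Schematic sketch: un-normalize predicted actions with stats saved at
# training time. Key names ("action_min"/"action_max") are assumptions.
import numpy as np
import torch

ckpt = torch.load("policy.ckpt", map_location="cpu")   # placeholder path
action_min = np.asarray(ckpt["action_min"])            # per-dimension minimum
action_max = np.asarray(ckpt["action_max"])            # per-dimension maximum


def unnormalize(action_pred: np.ndarray) -> np.ndarray:
    """Map a prediction in [-1, 1] back to the robot's action range."""
    return (action_pred + 1.0) / 2.0 * (action_max - action_min) + action_min
```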
Hi @huy-ha,
I didn't set the `env/domain_rand_config` ... I think I need to regenerate the dataset and train the model again.
https://github.com/real-stanford/scalingup/blob/3d2f43c213aed8b2c811e635ac8f3ef39bd210c4/scalingup/config/evaluation/single_env.yaml#L5
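To check my understanding, the effect I expect `env/domain_rand_config` to have is roughly the following (just an illustrative sketch with made-up perturbation ranges, not the actual config):

```python
# Illustrative sketch of camera-pose domain randomization: jitter the nominal
# camera pose every episode so the policy cannot overfit to one exact
# extrinsic calibration. Perturbation ranges are made up.
import numpy as np


def sample_camera_pose(nominal_pos, nominal_rpy, rng=None):
    rng = rng or np.random.default_rng()
    pos = np.asarray(nominal_pos) + rng.uniform(-0.05, 0.05, size=3)   # +/- 5 cm
    rpy = np.asarray(nominal_rpy) + rng.uniform(-0.10, 0.10, size=3)   # +/- ~6 deg
    return pos, rpy


pos, rpy = sample_camera_pose([0.5, 0.0, 0.6], [0.0, 1.1, 3.14])
```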
Thanks for your help!
Best regards
Hi, @huy-ha
Thank you for your reply.
I think I will first try to increase the magnitude of visual augmentations and improve the performance of sim evaluation. If the success rate in sim evaluation rises, I will try the real evaluation again and see what happens.
Best regards
Oh interesting. And this is with your FR5 setup right? Did the policy's accuracy on weights and biases converge yet, and how many datapoints did you use? Did you also try reproducing the transport results with the codebase's UR5, and did that work?
Yes, it is with my FR5 setup. I trained 5,000 steps x 10 epochs and the training loss seems to converge to about 0.003. The terminal output was "Using 68,182 points from 662 trajectories out of 662 (100.0%)", so 68,182 datapoints should have been used. I haven't reproduced the results on a UR5 yet, but I do have access to a UR5e and may try it if possible.
Hi @huy-ha,
Actually, the phenomenon I described in my last post happened in simulation (sorry for the unclear description), so the evaluation result was poor, with an average success rate of 20-30%. The average success rate in data generation was over 70%.
I am facing the same issue when I change my setup to a UR10.
Additionally, after I used domain randomization over camera poses to generate the dataset with the same setup as the original code (UR5), the training success rate is also only around 20%. Do you have any idea why this happens? Many thanks.
Best regards
@yellow07200 Could you share some code to reproduce the UR10 setup? Also, in my case, domain randomization with the original UR5 setup achieves >80%, so this is unexpected as well. Did you install the conda environment exactly as in the provided yaml file?
@wangyan-hlab That loss seems normal to me, but the behavior is very surprising. Can I reproduce this result with the latest commit from #18?
@huy-ha Hi Huy. Yes, I think the result can be reproduced with the latest commit from #18. Please let me know if there's any problem. Thank you very much.
Hey @wangyan-hlab ,
Thanks for your patience. Compute was tight due to the CVPR deadline.
The steps I took include:
- I took `examples/exploration_task_tree.py` and replaced the environment config name with the FR5 bin transport config. Running it showed me that the entire kinematic hierarchy of the robot appeared in the language prompt, which is not useful information for the LLM. In my code before, I hardcoded "UR5" into the `LanguageStateEncoder` to filter out the robot's kinematic chain. I've made the code more generic to the robot name, so now the FR5's kinematic chain doesn't appear in the prompt (https://github.com/real-stanford/scalingup/commit/218a618b8922da5018e824721011dd6727f2f8f9); a schematic version of the change is sketched after this list.
- I then ran data generation and policy training with the default configuration (`common.yaml`).
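Schematically, the fix is just to filter the prompt's body list by the robot's own name instead of a hardcoded "UR5" string (this is a simplified illustration, not the actual `LanguageStateEncoder` code):

```python
# Simplified illustration of the prompt-filtering change: drop bodies that
# belong to the robot's kinematic chain, keyed on the robot's name rather
# than a hardcoded "UR5". Not the actual LanguageStateEncoder implementation.
def filter_robot_links(body_names, robot_name):
    robot_prefix = robot_name.lower()
    return [name for name in body_names if robot_prefix not in name.lower()]


# filter_robot_links(["fr5/upper_arm_link", "bin_a", "apple"], "FR5")
# -> ["bin_a", "apple"]
```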
> I trained 5,000 steps x 10 epochs and the training loss seems to converge to about 0.003.
Using the default configuration (10000 steps x 10 epochs), the diffusion loss at epoch 6 is 0.0038. MSE Loss /mean and /best were 0.0348 and 0.00786 respectively. Below I've attached some visualizations.
https://github.com/real-stanford/scalingup/assets/33562579/4d470a22-8c2c-41e2-85e6-ab3d2ad514c3
https://github.com/real-stanford/scalingup/assets/33562579/14166aca-e08b-4ef3-aa00-ea02b34abf0b
https://github.com/real-stanford/scalingup/assets/33562579/704dd158-fce4-48f2-9b97-7c75498638f4
https://github.com/real-stanford/scalingup/assets/33562579/ed9d8641-6290-4f92-82dd-7bd7d5dcfd38
Qualitatively, the policy does exhibit retrying behavior. Quantitatively, it's plateauing at a 70% success rate, but I think that's mostly due to the policy running out of time.
In summary, our data should have been identical, but I just ran data generation for longer to get more data. Our policy training configuration is identical except for how long I trained for. However, the results above were from epoch 6, so that policy didn't train much longer than yours did.
I'm still surprised by your result. I don't think the difference in the amount of data should have caused such a big gap. I would have guessed that with 600 trajectories yours would have achieved about 60% or so. I'll run another training with roughly a similar amount of data and let you know if it also performs poorly.
Hi Huy @huy-ha,
Thank you very much for reproducing the results on the FR5 robot!
Please allow me to briefly summarize the differences between your reproduction setup and mine:
I'm really happy to see your excellent reproduction results, but I'm also surprised by the differences.
Due to device and time limits, I wasn't able to generate much data for training. But as you say, my success rate with about 600 trajectories was much lower than expected.
I really appreciate your help and look forward to your reply.
Good luck to you and your team at CVPR!
Best regards
Yep! However, I don't think 1) contributed any difference, because the data both you and I generated succeeded around 70% of the time and included retry attempts. 2) and 3) are the significant differences. I'll let you know when I get the results.
Hey @wangyan-hlab ,
Just a quick update.
Not surprisingly, top-down camera views work better for this grasping task than wrist-mounted ones, and more data does better.
You used 662 trajectories / 68,182 points but only got 20-30%. I think it can still reach close to 44% if you just leave it training for longer.
Hope these experiment results help!
Hi Huy, @huy-ha
First, thank you for your kind help so far. I have been pushing the reproduction work forward and now I think I am ready to evaluate the policy on a real robot.
Following your guidance, I found the diffusion policy repo. I have two questions about it:
1. Which policy in the diffusion policy repo should I use (is it `diffusion_unet_hybrid_image_policy`?), and do I need to edit the policy to fit your scalingup policy, or modify some other code?
2. The evaluation script seems to expect a checkpoint containing a 'cfg' key (among other keys), but the checkpoint from my training has a different structure: there isn't a 'cfg' key there, nor the other keys used in the script. Would you please give more information about how to modify the script to fix this issue?
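For reference, this is roughly how I am comparing the two checkpoint layouts (the paths are placeholders):

```python
# Quick comparison of checkpoint layouts: print the top-level keys of each.
# The evaluation script expects a 'cfg' entry, but my scalingup checkpoint
# appears to store a different set of keys. Paths are placeholders.
import torch

for path in ["diffusion_policy.ckpt", "scalingup_policy.ckpt"]:
    payload = torch.load(path, map_location="cpu")
    print(path, "->", list(payload.keys()))
```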
Also, is it possible to load a model from the checkpoint and directly predict the action (i.e., eef position + eef uppermat + gripper command) from a REAL observation? I am trying to extract the code that instantiates a diffusion policy model, provide a fake input, and hope to get some output:
But I haven't found a proper way to get any output. Would you please give some suggestions?
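To make the question more concrete, the shape of the probe I am after is roughly this; the stand-in network, the action dimension, and the observation shape are all placeholders, since the real policy class and its expected inputs are exactly what I am unsure about:

```python
# Rough outline of the offline probe: feed a fake observation to a policy
# and inspect the predicted action. A tiny stand-in network replaces the
# real scalingup policy here; all shapes are placeholders.
import torch
import torch.nn as nn

action_dim = 10   # placeholder: eef position + rotation representation + gripper
stand_in_policy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, action_dim))

fake_rgb = torch.zeros(1, 3, 224, 224)   # fake camera observation
with torch.no_grad():
    fake_action = stand_in_policy(fake_rgb)
print(fake_action.shape)                  # torch.Size([1, 10])
```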
Best Regards
Originally posted by @wangyan-hlab in https://github.com/columbia-ai-robotics/scalingup/issues/1#issuecomment-1706263853