tianhaowuhz / human-assisting-dex-grasp

MIT License

Very solid work! Do you have plans for releasing the code for refinement and visualization? #4

Open sjtuyinjie opened 2 weeks ago

sjtuyinjie commented 2 weeks ago

Very solid work! Do you have plans for releasing the code for refinement and visualization? By the way, I noticed the amazing performance in your real-world experiments. How do you solve the sim2real gap?

tianhaowuhz commented 1 week ago

Thank you. I am currently very busy with another project and hardly have time for refinement, sorry about that. For visualization, what kind of visualization do you want: the environment rendering as in our GIF, or the training process? For sim2real, there are visual and dynamic gaps. On the visual side, we only use the point cloud and robot state in sim, and we use a fused point cloud in the real world, so the gap is small. On the dynamic side, we mainly use position-based control, and since our task is human-assisting, the arm is controlled by the human, so we do not need to consider the arm's dynamics gap.
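The sim2real-friendly interface described above (point-cloud-plus-state observations, position targets instead of torques) could be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual code; the names `build_observation` and `position_control_step` and the joint dimensions are invented for the example.

```python
import numpy as np

def build_observation(point_cloud, joint_pos, joint_vel):
    """Assemble the policy input from sensors that transfer well to real:
    a point cloud (no RGB) plus proprioceptive hand-joint state.

    point_cloud: (N, 3) array; joint_pos, joint_vel: (D,) arrays.
    """
    return {
        "pointcloud": point_cloud.astype(np.float32),
        "robot_state": np.concatenate([joint_pos, joint_vel]).astype(np.float32),
    }

def position_control_step(current_pos, target_pos, max_delta=0.05):
    """Move joints toward the policy's position target, clamping the
    per-step motion; position control hides low-level dynamics mismatch."""
    delta = np.clip(target_pos - current_pos, -max_delta, max_delta)
    return current_pos + delta
```

Because the policy never sees simulator-specific quantities such as contact forces or RGB textures, the same observation builder can run unchanged on real hardware.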

sjtuyinjie commented 1 week ago

Very nice of you to reply so quickly!

  1. Actually, I mostly want the visualization code for the training process. I tried to run with --headless False, but it throws errors. So I wonder how you visualize during training.
  2. You mentioned that the point cloud gap is small, but in my experience the point clouds generated by depth cameras such as the D435i have limited accuracy and are relatively sparse, while the point cloud in Isaac Gym is quite dense and has ground-truth accuracy. So I'm curious how you deal with this gap. Or do you just use basic methods such as domain randomization?
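One common way to narrow the dense-sim vs. sparse-real point cloud gap raised in item 2 is to degrade the simulated cloud during training. This is a generic domain-randomization sketch, not necessarily what the authors did; the function name `degrade_sim_cloud` and all parameter values are illustrative assumptions.

```python
import numpy as np

def degrade_sim_cloud(points, n_keep=1024, noise_std=0.005, dropout_p=0.1, seed=0):
    """Make a dense, clean simulated point cloud look more like sparse,
    noisy depth-camera output.

    points: (N, 3) array of clean simulated points (meters).
    """
    rng = np.random.default_rng(seed)
    # Randomly drop points to mimic depth holes and occlusion artifacts.
    pts = points[rng.random(len(points)) > dropout_p]
    # Subsample to a fixed, sparser size, as a real sensor crop would give.
    idx = rng.choice(len(pts), size=min(n_keep, len(pts)), replace=False)
    pts = pts[idx]
    # Perturb coordinates with Gaussian noise to mimic limited depth accuracy.
    pts = pts + rng.normal(scale=noise_std, size=pts.shape)
    return pts

dense = np.random.rand(8192, 3)   # stand-in for an Isaac Gym point cloud
sparse = degrade_sim_cloud(dense)
print(sparse.shape)               # (1024, 3)
```

A policy trained on clouds degraded this way is less likely to overfit to the simulator's ground-truth density.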

I'm not in a hurry, so please deal with your own project first. If you have time after that, feel free to reply. Respect!

tianhaowuhz commented 1 week ago
  1. We actually visualize by downloading a checkpoint, but I recommend using this for visualization during training: https://github.com/NVlabs/sim-web-visualizer
  2. We use four RealSense cameras in the real-world setup and fuse the four views' point clouds, then clip the fused point cloud to a workspace region to remove obstacles.
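The fuse-then-clip step in item 2 could look roughly like the sketch below, assuming calibrated camera-to-world extrinsics and an axis-aligned workspace box. The function `fuse_and_clip` is a hypothetical NumPy illustration, not the authors' pipeline.

```python
import numpy as np

def fuse_and_clip(clouds, extrinsics, lo, hi):
    """Fuse per-camera point clouds into one world-frame cloud and clip it
    to a workspace box.

    clouds:     list of (N_i, 3) arrays, one per camera, in camera frames.
    extrinsics: list of (4, 4) camera-to-world transforms (calibrated).
    lo, hi:     (3,) min/max corners of the workspace box, world frame.
    """
    fused = []
    for pts, T in zip(clouds, extrinsics):
        # Homogeneous transform of each camera's points into the world frame.
        homo = np.hstack([pts, np.ones((len(pts), 1))])
        fused.append((homo @ T.T)[:, :3])
    fused = np.vstack(fused)
    # Keep only points inside the box; everything outside (table clutter,
    # walls, the human operator's body) is discarded.
    mask = np.all((fused >= lo) & (fused <= hi), axis=1)
    return fused[mask]
```

Fusing several views fills in self-occlusions on the object, which is part of why the resulting cloud can be close enough to the dense simulated one for zero-shot transfer.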
sjtuyinjie commented 1 week ago

Thanks again for your answer. You mention that you fuse point clouds from four cameras and clip them manually. Do you mean you reduced the sim-to-real point cloud gap this way and then zero-shot deployed the trained policy in the real world?

tianhaowuhz commented 1 week ago

Yes, we do not finetune in the real world.