nisutte closed this issue 4 days ago
Hi,
Thanks for your interest in SERL! Contributions are definitely welcome, and support for the UR5 would be quite useful; you could open a PR, and then we'll work with you to get it checked in.
I wonder what types of tasks you have tried on the UR5? Could you send me and the other authors videos if possible?
To answer your questions,
Thank you for your quick answer.
Perfect, I'll send the PR once I'm happy with my controller implementation (so far it's a bit messy). I'll also remember to record some videos and send them to you when I have a working policy.
My task consists of picking up cardboard boxes with the EPick vacuum gripper. I'm using one RealSense D405 camera as a wrist cam and DrQ with the pretrained ResNet10 provided. I've had some success with the method, but the model struggles with boxes it has never seen (different colors / shapes). You can see the code in my fork of the repo under the develop branch.
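For context, DrQ's key ingredient is random-shift image augmentation applied to the camera observations. A minimal numpy sketch of that augmentation (illustrative only, not the SERL implementation):

```python
import numpy as np

def random_shift(img: np.ndarray, pad: int = 4, rng=None) -> np.ndarray:
    """DrQ-style augmentation: replicate-pad the image, then randomly crop
    back to the original size, shifting the content by up to `pad` pixels."""
    rng = rng or np.random.default_rng()
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]
```

In DrQ this is applied independently to each image in the batch before both the actor and critic updates, which regularizes the Q-function without changing the reward or dynamics.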
That's why I'm trying to move towards using depth / point-cloud data and PointNet++ (or similar) for the vision encoder, hoping that the model can generalize better across different types of boxes.
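Moving from RGB to point clouds starts with back-projecting the D405's depth image through the camera intrinsics. A minimal sketch of the standard pinhole deprojection (not code from the fork; `fx`, `fy`, `cx`, `cy` come from the camera's intrinsics):

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (meters) into an (N, 3) point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    v, u = np.indices(depth.shape)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels
```

The resulting (N, 3) array is the natural input for point-based encoders like PointNet++, which consume unordered point sets directly.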
It would be very useful to see it working with a UR5, so definitely send us videos!
In general, if the object is completely unseen, there is no guarantee the policy will succeed consistently; I'd suggest adding some randomization during training.
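One way to realize that randomization is to resample the box's properties at every episode reset. The sketch below is purely illustrative (`sample_box` is not part of SERL); it draws a new size, color, and pose perturbation per episode:

```python
import random

def sample_box(rng: random.Random) -> dict:
    """Hypothetical reset-time randomization for a box-picking task:
    draw random dimensions (meters), an RGB color, and a small yaw
    perturbation, so the policy rarely sees the same box twice."""
    return {
        "size": [rng.uniform(0.05, 0.25) for _ in range(3)],
        "color": [rng.random() for _ in range(3)],
        "yaw_offset": rng.uniform(-0.3, 0.3),  # radians
    }
```

Randomizing exactly the factors that vary at test time (here: color and shape) is usually what buys generalization to unseen boxes.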
We haven't tried using SERL with depth; it would be interesting to see how it works there.
I've since worked on the project and moved towards point-cloud based encoders instead of image data. Some videos of the setup can be found here: Google Drive. The videos of the robot picking the boxes are from an older policy using RGB images from a single camera.
A brief description of the environment:

- The observation space consists of the TCP pose, velocity, force, torque and gripper position (pressure of the vacuum gripper)
- The action space is the relative motion of the pose, same as in your case
- The goal is to grip the box and lift it to the starting position

Some conclusions:

- I wrote a Robotiq impedance controller that can be used for the UR5 robot, see: Controller https://github.com/SchNico24/serl/blob/develop/serl_robot_infra/robot_controllers/robotiq_controller.py
- I'm using the DrQAgent with similar hyperparameters. The most notable change is the use of backup_entropy=True, see this commit https://github.com/SchNico24/serl/commit/4441f1a98eb4c851690aecc1bb933fb5c9b37cee for more info.
- For me, the use of a pre-trained ResNet18 (weights from PyTorch) improved the performance of the vision (RGB) based encoder, while only making training slightly slower.

As of now, I'm experimenting with using two D405 cameras to construct a voxel grid around the gripper. This can be seen in the video found here https://drive.google.com/drive/folders/1sJRo6gE6n1vXutMxakOa1xPUMN5_24LU. I'm hoping that the RL policy can generalize better on point-cloud based representations, since RGB images can be deceiving.
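A voxel grid like the one described can be built by merging the point clouds from both cameras into a common frame and binning them into an occupancy volume around the gripper. A minimal sketch (the grid resolution and 0.20 m extent are assumptions, not values from the fork):

```python
import numpy as np

def voxelize(points: np.ndarray, origin: np.ndarray,
             grid: int = 32, size: float = 0.20) -> np.ndarray:
    """Bin an (N, 3) point cloud into a grid^3 occupancy volume centered
    on `origin` (e.g. the gripper TCP), covering a cube of `size` meters."""
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    idx = np.floor((points - origin + size / 2) / (size / grid)).astype(int)
    keep = np.all((idx >= 0) & (idx < grid), axis=1)  # discard out-of-cube points
    i, j, k = idx[keep].T
    vox[i, j, k] = 1.0
    return vox
```

Because the grid is anchored to the gripper, the representation is invariant to where the box sits in the workspace, which may help generalization.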
I'll keep you updated on the progress. If you have more questions, feel free to ask.
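For reference, the proprioceptive part of the observation space described above might be flattened into a single vector along these lines (field sizes are assumptions; the exact layout in the fork may differ):

```python
import numpy as np

def flat_obs(tcp_pose, tcp_vel, force, torque, gripper_pressure):
    """Concatenate TCP pose (7: xyz + quaternion), velocity (6),
    force (3), torque (3), and gripper pressure (1) into one
    20-dimensional state vector."""
    return np.concatenate(
        [tcp_pose, tcp_vel, force, torque, [gripper_pressure]]
    ).astype(np.float32)
```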
Congrats on the nice progress! In the videos, did the policy use camera inputs?
Yes, the policy in the videos used camera inputs with a pretrained ResNet18 encoder.
That would be super nice! If you would like, could you open a PR to check in the code?
Thanks for open-sourcing your software.
I've worked with the repo for a while now and implemented a controller and environment for a UR5 arm with a Robotiq gripper. Let me know if a contribution to the main repo would be welcome.
During the implementation of my own DrQ agent I've collected some questions about the code, which were not mentioned in the SERL paper:

- `spatial_learned_embeddings` is used instead of the standard average pooling. I've seen it in Google's Rainbow DQN, but it was not used there. Did you notice performance increases with this pooling method, or what was the reason behind the decision?
- `backup_entropy` was implemented without the discount factor (compared to the original `jaxrl_m` implementation). Since it was not used in SERL (see the next question) it does not matter, but I'm still curious about it.
- `backup_entropy` was set to `False` for all the experiments. What led to this decision?

(supervisor: @vhartman)
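For context on the `backup_entropy` questions: the difference being asked about is where the entropy bonus sits relative to the discount in the SAC-style critic target. A minimal sketch of the two variants (plain Python scalars for clarity; this is not the SERL or `jaxrl_m` code):

```python
def target_discounted_entropy(r, q_next, logp, done, gamma=0.99, alpha=0.2):
    """Entropy bonus inside the discounted backup (jaxrl_m-style):
    y = r + gamma * (1 - done) * (min_Q' - alpha * log_pi)."""
    return r + gamma * (1.0 - done) * (q_next - alpha * logp)

def target_undiscounted_entropy(r, q_next, logp, done, gamma=0.99, alpha=0.2):
    """Entropy bonus added outside the discount:
    y = r + gamma * (1 - done) * min_Q' - alpha * log_pi."""
    return r + gamma * (1.0 - done) * q_next - alpha * logp
```

With `backup_entropy=False`, the `- alpha * logp` term is dropped from the target entirely, so the two variants coincide; that is why the distinction only matters once the flag is enabled.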
Thank you in advance for your time and assistance. I look forward to your response.