rail-berkeley / serl

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
https://serl-robot.github.io/
MIT License
381 stars · 44 forks

General questions about the code and implementation #51

Closed nisutte closed 4 days ago

nisutte commented 6 months ago

Thanks for open-sourcing your software.

I've worked with the repo for a while now and implemented a controller and environment for a UR5 arm with a Robotiq gripper. Let me know if a contribution to the main repo would be welcome.

During the implementation of my own DRQ agent, I've collected some questions about the code that were not addressed in the SERL paper:

  1. For the DRQ examples, batches are sampled 50/50: half from the demo buffer and half from online experience (see the sketch after this list). Is there a reason for not using a single shared replay buffer?
  2. When using the pretrained ResNet model, the pooling method spatial_learned_embeddings is used instead of the standard average pooling. I've seen it in Google's Rainbow DQN codebase, but it was not used there. Did you notice performance increases with this pooling method, or what was the reason behind the decision?
  3. In the soft actor critic implementation, the step labeled backup_entropy was implemented without the discount factor (compared to the original jaxrl_m implementation). Since it is not used in SERL (see question 4) it does not matter, but I'm still curious about it.
  4. Following up on the last question, the parameter backup_entropy was set to False for all experiments. What led to this decision?
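
For question 1, here is a minimal sketch of the 50/50 batch mixing I mean, assuming buffers with a `sample(n)` method that returns dicts of arrays (a hypothetical interface for illustration, not the actual SERL buffer API):

```python
import numpy as np

def sample_mixed_batch(demo_buffer, online_buffer, batch_size):
    """Draw half of each batch from the demo buffer and half from the
    online buffer, then concatenate the two halves field-wise."""
    demo = demo_buffer.sample(batch_size // 2)
    online = online_buffer.sample(batch_size - batch_size // 2)
    return {k: np.concatenate([demo[k], online[k]], axis=0) for k in demo}
```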

(supervisor: @vhartman)

Thank you in advance for your time and assistance. I look forward to your response.

jianlanluo commented 5 months ago

Hi,

Thanks for the interest in SERL! Contributions are definitely welcome, and support for the UR5 would be quite useful; you could open a PR, and we'll work with you to get it checked in.

I wonder what types of tasks you have tried on the UR5? Could you send me and the other authors videos if possible?

To answer your questions,

  1. Initially, sampling more from demos helps the policy learn faster; later on, when the data becomes more off-policy, the demos become less helpful. However, as long as the learner can train fast enough, the buffer split is not an issue. You can also use one shared buffer; that shouldn't make much of a difference.
  2. We use the learned spatial embedding since most of the tasks considered require good pose information.
  3./4. backup_entropy is not used because we address the exploration problem with demonstrations; the missing discount factor is probably a typo.
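
For readers, a minimal sketch in Flax of the learned spatial embedding pooling discussed in point 2 (parameter names and the default number of kernels are illustrative; see the vision encoder code in the repo for the actual module):

```python
import flax.linen as nn
import jax.numpy as jnp

class SpatialLearnedEmbeddings(nn.Module):
    """Pool a (H, W, C) feature map with learned spatial kernels
    instead of global average pooling, preserving pose information."""
    num_features: int = 8  # learned kernels per channel (assumed default)

    @nn.compact
    def __call__(self, features):  # features: (batch, H, W, C)
        h, w, c = features.shape[-3:]
        kernel = self.param(
            "kernel", nn.initializers.lecun_normal(), (h, w, c, self.num_features)
        )
        # Weighted sum over the spatial dimensions -> (batch, C, num_features),
        # then flatten to (batch, C * num_features).
        pooled = jnp.sum(features[..., None] * kernel[None], axis=(1, 2))
        return pooled.reshape(*pooled.shape[:-2], -1)
```

And for point 3/4, a hedged sketch of where the discount factor belongs in the SAC critic target when backup_entropy is enabled (variable names are illustrative, following the jaxrl_m convention):

```python
# Bootstrapped target; `masks` is 0 at terminal transitions.
target_q = rewards + discount * masks * next_q
if backup_entropy:
    # jaxrl_m also discounts the entropy bonus; omitting `discount *`
    # here is the typo discussed above.
    target_q -= discount * masks * temperature * next_log_probs
```
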
nisutte commented 5 months ago

Thank you for your quick answer.

Perfect, I'll send the PR once I'm happy with my controller implementation (so far it's a bit messy). I'll also record some videos and send them to you once I have a working policy.

My task consists of picking up cardboard boxes with the EPick vacuum gripper. I'm using one RealSense D405 camera as a wrist cam and DRQ with the provided pretrained ResNet10. I've had some success with the method, but the model struggles with boxes it has never seen (different colors/shapes). You can see the code in my fork of the repo under the develop branch.

That's why I'm trying to move towards using depth/point-cloud data and PointNet++ (or similar) for the vision encoder, hoping that the model can generalize better across different types of boxes.
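
To illustrate the direction, here is a minimal PointNet-style encoder in Flax (a shared per-point MLP followed by a permutation-invariant max pool; PointNet++ adds hierarchical grouping on top of this). This is a sketch of the idea, not my actual implementation:

```python
import flax.linen as nn
import jax.numpy as jnp

class PointNetEncoder(nn.Module):
    """Minimal PointNet-style encoder: shared per-point MLP + max pooling."""
    hidden_dims: tuple = (64, 128, 256)

    @nn.compact
    def __call__(self, points):  # points: (batch, num_points, 3)
        x = points
        for dim in self.hidden_dims:
            x = nn.relu(nn.Dense(dim)(x))  # applied independently to each point
        # Max over the point dimension makes the embedding order-invariant.
        return jnp.max(x, axis=-2)  # (batch, hidden_dims[-1])
```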

jianlanluo commented 5 months ago

It would be very useful to see it working with a UR5, so definitely send us videos!

In general, if an object is completely unseen, there is no guarantee the policy will succeed consistently; I'd suggest adding some randomization during training.

We haven't tried using SERL with depth; it would be interesting to see how it works there.

nisutte commented 4 months ago

I've since worked on the project and moved towards point-cloud based encoders instead of image data. Some videos of the setup can be found here: Google Drive (https://drive.google.com/drive/folders/1sJRo6gE6n1vXutMxakOa1xPUMN5_24LU?usp=sharing). The videos of the robot picking the boxes are from an older policy using RGB images from a single camera.

A brief description of the environment:

  • The observation space consists of TCP pose, velocity, force, torque, and gripper position (pressure of the vacuum gripper)
  • The action space is the relative motion of the pose, same as in your case
  • The goal is to grip the box and lift it to the starting position

Some conclusions:

As for now, I'm experimenting with using 2 D405 cameras to construct a voxel grid around the gripper; this can be seen in the video found here: https://drive.google.com/drive/folders/1sJRo6gE6n1vXutMxakOa1xPUMN5_24LU. I'm hoping that the RL policy can generalize better on point-cloud based representations, since RGB images can be deceiving.
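
As a rough sketch of the voxelization step (after merging the calibrated clouds from both cameras; the function name, grid size, and extent here are illustrative, not my exact code):

```python
import numpy as np

def voxelize(points, center, extent=0.2, resolution=32):
    """Bin a merged point cloud (N, 3), expressed in the gripper frame,
    into a binary occupancy grid of resolution^3 cells spanning `extent` meters."""
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    # Shift the cube around `center` to the origin and scale to grid indices.
    idx = ((points - center + extent / 2) / extent * resolution).astype(int)
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    grid[tuple(idx[inside].T)] = 1.0
    return grid
```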

I'll keep you updated on the progress. If you have more questions, feel free to ask.

jianlanluo commented 4 months ago

Congrats on the nice progress! In the videos, did the policy use camera inputs?

nisutte commented 4 months ago

Yes, the policy in the video used camera inputs with a pretrained ResNet18 encoder.

jianlanluo commented 3 months ago

That would be super nice. If you would like, could you open a PR to check in the code?