Closed SumeetBatra closed 3 days ago
Sorry, we have not tuned SAC at the moment, only PPO with some proprioception data plus one RGB camera. There is some example code with state-based SAC; a simple vision-based version will come eventually. TD-MPC2 is already integrated and supports learning from pixels, and does not need much tuning.
If there's a lot of value in testing algorithms with visual-only inputs, we can try to help set it up in the future; we have some DM Control environments benchmarked with PPO, with an option to use visual-only inputs.
I see, thanks for letting me know! I think having some baselines of end-to-end pixel to action policies would be useful. I am currently using SAC for my project but may also try out other algos in the future.
Is GPU parallelization important in your case? Or are you working more on e.g. sample efficiency? I can have some members of the team look into tuning an RGB/RGBD SAC version.
It's not important, but if it makes policy convergence faster I'm all for GPU parallelization. Sample efficiency is not an issue atm. I appreciate you all looking into this!
Hey! Just wanted to check in and see if this is in the pipeline and, if so, whether you guys have an expected release date for it. Thanks!
Currently working on it! Fixing up the SAC state and RGBD implementations now. Will provide a baseline for PickCube and maybe a few other tasks.
Ok @SumeetBatra, new baseline uploaded. I only checked that it works for PushCube and PickCube from pixels. The suggested script to run:
```bash
python sac_rgbd.py --env_id="PickCube-v1" --obs_mode="rgb" \
  --num_envs=32 --utd=0.5 --buffer_size=300_000 \
  --control-mode="pd_ee_delta_pos" --camera_width=64 --camera_height=64 \
  --total_timesteps=1_000_000 --eval_freq=10_000
```
This command was tested and converged after about 1 to 1.5 hours on a 4090. The SAC code can run faster if I add torch.compile/CUDA graphs support and some shared-memory optimization for observation storage, but that will be done in the future.
https://github.com/user-attachments/assets/f04be6f4-1f7f-4519-9247-a22d3156a880
The tiny 64x64 image in each corner is what the policy sees. The policy also sees any relevant state information (like the goal position for the cube and the agent's joint positions).
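To make the observation setup above concrete, here is a minimal sketch of splitting a pixel-plus-state observation into inputs for a visual encoder. The dict keys `"rgb"` and `"state"` and the helper name `preprocess_obs` are hypothetical illustrations, not ManiSkill's actual observation layout (which nests cameras and agent state differently).

```python
import numpy as np

def preprocess_obs(obs: dict) -> tuple[np.ndarray, np.ndarray]:
    """Split a pixel+state observation into image and state inputs.

    Hypothetical keys: "rgb" is an (H, W, 3) uint8 camera image and
    "state" is a flat float vector (e.g. goal position + joint positions).
    """
    rgb = obs["rgb"].astype(np.float32) / 255.0   # scale pixels to [0, 1]
    rgb = np.transpose(rgb, (2, 0, 1))            # HWC -> CHW for a conv encoder
    state = obs["state"].astype(np.float32)
    return rgb, state

# Example with a 64x64 camera and a 13-dim state vector (sizes illustrative)
obs = {
    "rgb": np.zeros((64, 64, 3), dtype=np.uint8),
    "state": np.zeros(13, dtype=np.float32),
}
rgb, state = preprocess_obs(obs)
print(rgb.shape, state.shape)  # (3, 64, 64) (13,)
```

The image branch would typically feed a small CNN while the state vector is concatenated with the CNN features before the actor/critic heads.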
See the SAC baseline readme: https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/sac/README.md
I'm fairly sure the other tasks will work with the same hyperparameters as the PickCube training, provided they are trained long enough and an appropriate controller is used.
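As a small sketch of reusing those hyperparameters on other tasks, the snippet below just prints one command line per env id with the flags from the verified PickCube run. Only PushCube-v1 and PickCube-v1 were checked above; any other env id you add to the list is an untested assumption.

```python
# Hypothetical helper: emit the PickCube-tuned command for a list of env ids.
BASE_FLAGS = (
    '--obs_mode="rgb" --num_envs=32 --utd=0.5 --buffer_size=300_000 '
    '--control-mode="pd_ee_delta_pos" --camera_width=64 --camera_height=64 '
    '--total_timesteps=1_000_000 --eval_freq=10_000'
)

def make_command(env_id: str) -> str:
    """Build the sac_rgbd.py invocation for a given task id."""
    return f'python sac_rgbd.py --env_id="{env_id}" {BASE_FLAGS}'

# Extend this list with other tabletop tasks to try (untested beyond these two).
for env_id in ["PushCube-v1", "PickCube-v1"]:
    print(make_command(env_id))
```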
@StoneT2000 Thank you so much!! I'll take a look and follow up if I have any questions.
Hey! I wanted to see if you guys had any reference code / hyperparameters for SAC solving any of the tabletop tasks using RGB(D) data only, with no proprioceptive state information. Thanks!