Closed SumeetBatra closed 3 days ago
Sorry, we have not tuned SAC at the moment, only PPO with some proprioception data plus one RGB camera. There is some example code with state-based SAC; a simple vision-based version will come eventually. TD-MPC2 is already integrated and supports learning from pixels, and does not need much tuning.
If there's a lot of value in testing algorithms with visual-only inputs, we can try to help set it up in the future; we have some DM Control environments benchmarked with PPO, with an option to use visual-only inputs.
I see, thanks for letting me know! I think having some baselines of end-to-end pixel to action policies would be useful. I am currently using SAC for my project but may also try out other algos in the future.
Is GPU parallelization important in your case? Or are you working more on e.g. sample efficiency? I can have some members of the team look into tuning an RGB/RGBD SAC version.
It's not important, but if it makes policy convergence faster I'm all for GPU parallelization. Sample efficiency is not an issue atm. I appreciate you all looking into this!
Hey! Just wanted to check in and see if this is in the pipeline and, if so, whether you guys have an expected release date for it. Thanks!
Currently working on it! Fixing up the SAC state and RGBD implementations now. Will provide a baseline for PickCube and maybe a few other tasks.
Ok @SumeetBatra, new baseline uploaded. I only checked that it works for PushCube and PickCube from pixels. The suggested script to run:
```bash
python sac_rgbd.py --env_id="PickCube-v1" --obs_mode="rgb" \
  --num_envs=32 --utd=0.5 --buffer_size=300_000 \
  --control-mode="pd_ee_delta_pos" --camera_width=64 --camera_height=64 \
  --total_timesteps=1_000_000 --eval_freq=10_000
```
This command was tested and converged after about 1 to 1.5 hours on a 4090. The SAC code can run faster if I add torch.compile/CUDA graphs support and some shared-memory optimization for observation storage, but that will be done in the future.
https://github.com/user-attachments/assets/f04be6f4-1f7f-4519-9247-a22d3156a880
The tiny 64x64 image in each corner is what the policy sees. The policy also sees any relevant state information (like the goal position for the cube and the agent's joint positions).
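To make the observation setup above concrete, here is a minimal sketch of splitting a pixel-plus-state observation into inputs for a visual encoder. The dict keys `"rgb"` and `"state"` and the helper name `preprocess_obs` are hypothetical illustrations, not ManiSkill's actual observation layout (which nests cameras and agent state differently).

```python
import numpy as np

def preprocess_obs(obs: dict) -> tuple[np.ndarray, np.ndarray]:
    """Split a pixel+state observation into image and state inputs.

    Hypothetical keys: "rgb" is an (H, W, 3) uint8 camera image and
    "state" is a flat float vector (e.g. goal position + joint positions).
    """
    rgb = obs["rgb"].astype(np.float32) / 255.0   # scale pixels to [0, 1]
    rgb = np.transpose(rgb, (2, 0, 1))            # HWC -> CHW for a conv encoder
    state = obs["state"].astype(np.float32)
    return rgb, state

# Example with a 64x64 camera and a 13-dim state vector (sizes illustrative)
obs = {
    "rgb": np.zeros((64, 64, 3), dtype=np.uint8),
    "state": np.zeros(13, dtype=np.float32),
}
rgb, state = preprocess_obs(obs)
print(rgb.shape, state.shape)  # (3, 64, 64) (13,)
```

The image branch would typically feed a small CNN while the state vector is concatenated with the CNN features before the actor/critic heads.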
See the SAC baseline readme: https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/sac/README.md
I'm fairly sure the other tasks will work with the same hyperparameters as the PickCube training, provided they are trained long enough and an appropriate controller is used.
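As a small sketch of reusing those hyperparameters on other tasks, the snippet below just prints one command line per env id with the flags from the verified PickCube run. Only PushCube-v1 and PickCube-v1 were checked above; any other env id you add to the list is an untested assumption.

```python
# Hypothetical helper: emit the PickCube-tuned command for a list of env ids.
BASE_FLAGS = (
    '--obs_mode="rgb" --num_envs=32 --utd=0.5 --buffer_size=300_000 '
    '--control-mode="pd_ee_delta_pos" --camera_width=64 --camera_height=64 '
    '--total_timesteps=1_000_000 --eval_freq=10_000'
)

def make_command(env_id: str) -> str:
    """Build the sac_rgbd.py invocation for a given task id."""
    return f'python sac_rgbd.py --env_id="{env_id}" {BASE_FLAGS}'

# Extend this list with other tabletop tasks to try (untested beyond these two).
for env_id in ["PushCube-v1", "PickCube-v1"]:
    print(make_command(env_id))
```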
@StoneT2000 Thank you so much!! I'll take a look and follow up if I have any questions.
Hey! I wanted to see if you guys had any reference code / hyperparameters for SAC solving any of the tabletop tasks using RGB(D) data only, with no proprioceptive state information. Thanks!