zeroth-robotics / sim

Training in simulation
MIT License

Set up task system #28

Status: Open. henri123lemoine opened this issue 5 days ago.

henri123lemoine commented 5 days ago

The vision is essentially to have "tasks" that are each defined by (1) their reward scales, (2) their starting states/starting environment, and (3) their termination states. During training, the environment then combines these tasks (e.g. by presenting the robot with a randomly sampled task, or with a more sophisticated schedule) so that a single policy learns to perform any of these tasks depending on the command it receives.
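A minimal plain-Python sketch of the idea, assuming a task is just a bundle of named reward scales, a starting-state sampler and a termination check. The environment samples one task uniformly per episode (the "random task" option) and appends a one-hot task command to the observation so that a single policy can condition on it. All names here (`Task`, `MultiTaskEnv`, `evaluate`, the reward terms, the `walk`/`stand_up` tasks and their thresholds) are hypothetical placeholders, not anything from the repo; a real simulator would batch this across many parallel environments with tensors.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical robot state: named scalar quantities (a real env would use tensors).
State = Dict[str, float]


@dataclass
class Task:
    """A task is (1) reward scales, (2) a starting state, (3) a termination condition."""
    name: str
    reward_scales: Dict[str, float]                 # weight per named reward term
    sample_start: Callable[[random.Random], State]  # draws a starting state
    is_terminal: Callable[[State], bool]            # episode-ending condition

    def reward(self, terms: Dict[str, float]) -> float:
        # Weighted sum over the terms this task scales; other terms are ignored.
        return sum(w * terms.get(k, 0.0) for k, w in self.reward_scales.items())


class MultiTaskEnv:
    """Samples one task per episode and exposes it to the policy as a one-hot command."""

    def __init__(self, tasks: List[Task], seed: int = 0):
        self.tasks = tasks
        self.rng = random.Random(seed)
        self.task_id = 0
        self.state: State = {}

    def reset(self) -> List[float]:
        # Uniform sampling; a curriculum or weighted sampler could replace this line.
        self.task_id = self.rng.randrange(len(self.tasks))
        self.state = self.tasks[self.task_id].sample_start(self.rng)
        return self.observation()

    def command(self) -> List[float]:
        return [1.0 if i == self.task_id else 0.0 for i in range(len(self.tasks))]

    def observation(self) -> List[float]:
        # The shared policy sees the robot state plus which task it is asked to do.
        return list(self.state.values()) + self.command()

    def evaluate(self, new_state: State, terms: Dict[str, float]) -> Tuple[float, bool]:
        # Called after the simulator advances: score with the active task's scales.
        self.state = new_state
        task = self.tasks[self.task_id]
        return task.reward(terms), task.is_terminal(new_state)


# Hypothetical example tasks (names, terms and thresholds are placeholders).
WALK = Task(
    "walk",
    {"forward_velocity": 1.0, "energy": -0.01},
    lambda rng: {"base_height": 0.9, "vx": 0.0},
    lambda s: s["base_height"] < 0.5,  # fell over
)
STAND_UP = Task(
    "stand_up",
    {"base_height": 2.0, "energy": -0.01},
    lambda rng: {"base_height": rng.uniform(0.1, 0.3), "vx": 0.0},  # start lying down
    lambda s: s["base_height"] > 0.85,  # reached standing height
)
```

With this layout, adding a task means constructing one more `Task` and passing it to `MultiTaskEnv([WALK, STAND_UP, ...])`; the policy's command input grows by one entry.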

The main point of this sort of system would be to make adding new tasks easy and to provide reasonable defaults, so that new tasks are relatively likely to work out of the box. More generally, the goal is to (roughly) democratize "training a robot for a task". Another possible benefit is that a single policy that works on all of these tasks might be faster to fine-tune for a new task, though it is unclear how well this would work at these network sizes.
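One way the "reasonable defaults" could look, sketched under the assumption that every task shares a base set of reward scales and a new task only declares what differs. `DEFAULT_REWARD_SCALES`, `TaskConfig`, `register_task`, `TASK_REGISTRY` and all term names and values are hypothetical placeholders, not tuned numbers or the repo's actual config.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical default reward scales shared by every task (placeholder values).
DEFAULT_REWARD_SCALES: Dict[str, float] = {
    "alive": 0.5,
    "torque_penalty": -1e-4,
    "action_rate_penalty": -0.01,
}

TASK_REGISTRY: Dict[str, "TaskConfig"] = {}


@dataclass
class TaskConfig:
    name: str
    reward_overrides: Dict[str, float] = field(default_factory=dict)
    episode_length_s: float = 20.0  # another default a task may override

    @property
    def reward_scales(self) -> Dict[str, float]:
        # Defaults first; task-specific entries override or extend them.
        return {**DEFAULT_REWARD_SCALES, **self.reward_overrides}


def register_task(cfg: TaskConfig) -> TaskConfig:
    # Reject duplicate names so two tasks cannot silently shadow each other.
    if cfg.name in TASK_REGISTRY:
        raise ValueError(f"task {cfg.name!r} already registered")
    TASK_REGISTRY[cfg.name] = cfg
    return cfg


# Adding a new task only requires specifying what differs from the defaults.
register_task(TaskConfig("walk_forward", {"tracking_lin_vel": 1.0}))
register_task(TaskConfig("stand_still", {"tracking_lin_vel": 0.0, "alive": 1.0}))
```

The merge order (`{**defaults, **overrides}`) is the design choice doing the work here: shared regularization terms come for free, and a task author only has to think about the terms specific to their task.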

Examples of tasks:

*among other things

henri123lemoine commented 5 days ago

(One policy per task may be preferable, though, since it makes outcomes easier to interpret.)