Pytha11 opened 2 days ago
I never actually expected anyone to try the rocket landing environment, and I have only gotten it to work in some very rare edge cases.
So, TL;DR: it's actually not solved. I would be very happy if you'd like to take over designing a reward structure/sensor scheme for the environment. I can put you on the webpage once it is solved and also acknowledge your contributions in the paper when it is published.
I would like to contribute to the code, but I need more time to understand the physical dynamics, which I still find challenging. For example, when I only allow the rocket to use booster ignition and throttle, e.g. action = [0, 0, 0, 1, 1, 0, 0], I still get velocity along x and y, e.g. linear velocity = [0.3, 0.4, 0.5] instead of [0, 0, 0.5]. I couldn't find details on this in your published paper.
By default, you will not be able to land the rocket using only the vertical actuators (throttle and ignition), since the rocket starts in an arbitrarily rotated orientation and there is also output noise. You can disable both here:
There may still be some level of noise in the system, but disabling those settings should make things much easier.
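To illustrate why ignition and throttle alone still produce x and y velocity, here is a toy sketch. It is not the environment's actual physics: it assumes the thrust acts along the rocket's body +z axis, world +z is up, the orientation is a roll about x followed by a pitch about y, the attitude stays constant during the burn, and there is no drag or noise. The function names and the thrust/mass values are hypothetical. The point is simply that any initial tilt rotates part of the "vertical" thrust into the horizontal plane.

```python
import math


def thrust_in_world_frame(thrust, roll, pitch):
    """Rotate a thrust vector along the body +z axis into the world frame.

    Assumes R = Ry(pitch) @ Rx(roll), angles in radians, world +z up.
    """
    return (
        thrust * math.sin(pitch) * math.cos(roll),
        -thrust * math.sin(roll),
        thrust * math.cos(pitch) * math.cos(roll),
    )


def simulate_vertical_burn(thrust, mass, roll, pitch, dt=0.01, steps=100, g=9.81):
    """Euler-integrate velocity under constant body-axis thrust plus gravity.

    Attitude is held fixed (no rotational dynamics), which is a simplification.
    """
    fx, fy, fz = thrust_in_world_frame(thrust, roll, pitch)
    vx = vy = vz = 0.0
    for _ in range(steps):
        vx += fx / mass * dt
        vy += fy / mass * dt
        vz += (fz / mass - g) * dt
    return vx, vy, vz


if __name__ == "__main__":
    # Upright rocket: all thrust goes into z, so vx and vy stay at zero.
    print(simulate_vertical_burn(thrust=20.0, mass=1.0, roll=0.0, pitch=0.0))
    # Slightly tilted rocket: the same actuators now also push sideways.
    print(simulate_vertical_burn(
        thrust=20.0, mass=1.0, roll=math.radians(10), pitch=math.radians(5)
    ))
```

With zero tilt the horizontal components stay at zero; with even a few degrees of roll or pitch, horizontal velocity grows steadily over the burn. This is consistent with the behaviour described above when the rocket starts rotated.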
If you'd like to start tweaking the reward structure, it is defined here.
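As one possible starting point for that tweaking, here is a minimal sketch of potential-based reward shaping, a standard technique in which the dense reward is the discounted change in a potential function plus a sparse terminal bonus or penalty. This is not the reward the repository defines. The LandingState fields, the weights, and the ±100 terminal values are hypothetical and would need to be mapped onto the environment's real observations and tuned.

```python
import math
from dataclasses import dataclass


@dataclass
class LandingState:
    # Hypothetical observation layout; the real environment's sensors may differ.
    position: tuple  # (x, y, z) in metres, landing pad assumed at the origin
    velocity: tuple  # (vx, vy, vz) in m/s
    tilt: float  # angle between body axis and world vertical, radians


def potential(s, w_dist=1.0, w_speed=0.5, w_tilt=2.0):
    """Higher (less negative) when the rocket is close, slow and upright."""
    dist = math.sqrt(sum(c * c for c in s.position))
    speed = math.sqrt(sum(c * c for c in s.velocity))
    return -(w_dist * dist + w_speed * speed + w_tilt * s.tilt)


def shaped_reward(prev, curr, landed, crashed, gamma=0.99):
    """Potential-based shaping term plus a sparse terminal bonus/penalty."""
    r = gamma * potential(curr) - potential(prev)
    if landed:
        r += 100.0
    elif crashed:
        r -= 100.0
    return r


if __name__ == "__main__":
    prev = LandingState((0.0, 0.0, 10.0), (0.0, 0.0, -2.0), 0.1)
    curr = LandingState((0.0, 0.0, 9.8), (0.0, 0.0, -1.9), 0.08)
    # Moving closer, slower and more upright yields a positive shaping reward.
    print(shaped_reward(prev, curr, landed=False, crashed=False))
```

Shaping of the form gamma * potential(next) - potential(current) is known not to change which policy is optimal, so it can densify a sparse landing signal without rewarding hovering or other loopholes in the way ad hoc per-step bonuses sometimes do.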
I have tried several RL algorithms, but none of them succeeded in landing the rocket. Do you have a successful example, e.g. PPO with some specific hyperparameters?