Closed Swk6666 closed 1 month ago
Both are feasible in the sense that, technically, you could write a hard-coded policy that solves each task. However, Stack has a highly sparse reward, and Flip requires very precise control. Both tasks are very challenging, but possible.
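To illustrate why a sparse reward makes a task hard to learn, here is a minimal sketch in plain Python. It is not the library's actual code: the function names, the 0.1 distance threshold, and the example positions are assumptions chosen for illustration. The idea is that a sparse reward only tells the agent whether the goal was reached, while a dense reward also tells it how far away it is.

```python
import math


def sparse_reward(achieved, desired, threshold=0.1):
    """Return 0.0 if `achieved` is within `threshold` of `desired`, else -1.0.

    The threshold value is a hypothetical choice for this sketch.
    """
    distance = math.dist(achieved, desired)
    return 0.0 if distance <= threshold else -1.0


def dense_reward(achieved, desired):
    """Return the negative Euclidean distance to the goal."""
    return -math.dist(achieved, desired)


if __name__ == "__main__":
    goal = (0.0, 0.0, 0.1)  # hypothetical target position
    # Three hypothetical object positions, each closer to the goal.
    positions = [(0.30, 0.0, 0.1), (0.12, 0.0, 0.1), (0.05, 0.0, 0.1)]
    for pos in positions:
        print(pos, sparse_reward(pos, goal), round(dense_reward(pos, goal), 3))
```

With the sparse version, the first two positions receive the same reward of -1.0 even though the second is much closer to the goal, so random exploration gets no signal until it happens to succeed. This is one reason goal-relabeling methods such as hindsight experience replay are often paired with sparse-reward manipulation tasks.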
Thank you for your reply. Are there any algorithms/hyperparameters that have already been successful, or do I need to implement them myself?
@Swk6666 I haven't tried PandaStack-v3, but I was able to solve PandaFlip-v3 with reasonable accuracy using the DDPG implementation in stable-baselines3. Appendix G in this paper lists the hyperparameters I used.
Hi. Are there any feasible results for the PandaStack-v3 and PandaFlip-v3 environments? I saw in the paper that the success rate for the stack task is almost 0.