PaParaZz1 opened this issue 2 years ago
Hi, this is a nice project for hybrid action spaces, and I see you mention PDQN/HPPO in README.md. Do you have any experimental results for these algorithms in this environment? If not, we would like to invite you to implement the related algorithms and benchmarks together in our repo DI-engine, and we will offer the corresponding support. Would you be willing to build a hybrid-action-space RL benchmark with us? Other comments are also welcome.
The repository maintainer replied:

Thank you very much for your feedback! Unfortunately I am very busy these days and cannot take care of it. I did implement P-QLearning in my q-learning-algorithms in the past, but I do not remember whether it converged or what score it reached.

Note: algorithms now use architectures that need to know which parameters belong to which action (e.g. MP-DQN), so I think it may be better to change the way the action space is handled. I am not completely sure yet what the best way to do it is. Even though it would definitely future-proof the repository, it would also break any agent that has used this env... gym-platform uses one tuple of spaces per parameter-action pair, and I have not tested how inconvenient it is to have an empty tuple (e.g. for the break action).
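For reference, here is a minimal sketch of the gym-platform-style layout described above, using `gym.spaces`; the action names, bounds, and the empty-`Tuple` placeholder for the parameterless action are illustrative assumptions, not code from either repository.

```python
import numpy as np
from gym import spaces

# Illustrative parameterized (hybrid) action space: one discrete choice plus
# one parameter space per discrete action, in the style of gym-platform.
action_space = spaces.Tuple((
    spaces.Discrete(3),  # 0: accelerate, 1: turn, 2: break (names are made up)
    spaces.Tuple((
        spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),   # accelerate amount
        spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32),  # turn angle (normalized)
        spaces.Tuple(()),  # break takes no parameter -> the "empty tuple" in question
    )),
))

# An algorithm such as MP-DQN can then read off the parameters that belong to
# the chosen discrete action:
discrete_id, all_params = action_space.sample()
params_for_chosen_action = all_params[discrete_id]
```

Whether downstream agents handle the empty per-action entry cleanly is exactly the open question raised above.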