Closed: hiraphor closed this issue 3 years ago
See action_probability in the documentation. This should be what you are looking for.
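For reference, a minimal sketch of querying action probabilities after training, assuming the library here is stable-baselines (where action_probability is a method on trained models); PPO2 and CartPole-v1 are just placeholders:

```python
from stable_baselines import PPO2

# Train a small model on any discrete-action environment
# (CartPole-v1 is only a placeholder here).
model = PPO2("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(total_timesteps=10000)

obs = model.env.reset()
# action_probability returns the policy's probability distribution
# over the discrete actions for the given observation, rather than
# a sampled action.
probs = model.action_probability(obs)
print(probs)
```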
Action probability can't be called during training, can it? I.e., I'd have to write my own learn method/function to incorporate it.
Ah, my bad! Now I think I understand: instead of discrete actions, you want the probabilities of each choice as the action itself. You could treat this as a continuous action space instead (spaces.Box), where each choice gets a continuous number from -1 to 1 (symmetric spaces are nicer for agents), and inside the environment's step function you normalize the values so that they sum to one. I cannot guarantee that this will work out well, as there is no distribution specific to your case (a simplex).
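A minimal sketch of that workaround, assuming a gym-style environment; the observation, reward, and episode structure are placeholders, and softmax is just one way to map the symmetric Box outputs onto a simplex:

```python
import gym
import numpy as np
from gym import spaces

class AllocationEnv(gym.Env):
    """Toy resource-allocation environment (illustrative only)."""

    def __init__(self, n_choices=3):
        super().__init__()
        self.n_choices = n_choices
        # Symmetric continuous action space, one value per choice.
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=(n_choices,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(n_choices,), dtype=np.float32)

    def reset(self):
        return np.zeros(self.n_choices, dtype=np.float32)

    def step(self, action):
        # Normalize the raw action onto a simplex so the allocation
        # fractions are non-negative and sum to one. Softmax is one
        # option; shifting to non-negative values and dividing by
        # their sum also works.
        exp = np.exp(action - np.max(action))
        allocation = exp / exp.sum()
        # Placeholder reward: replace with your real allocation objective.
        reward = float(-np.square(allocation - 1.0 / self.n_choices).sum())
        obs = allocation.astype(np.float32)
        done = True  # single-step episode, for illustration only
        return obs, reward, done, {}
```

With this setup any continuous-action algorithm (e.g. PPO or SAC) can be trained on the environment directly, since the simplex constraint is enforced inside step rather than by the policy's output distribution.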
No worries! That was the workaround I used in the end, haha. I guess I'll see if I can make it work. Thanks for the reply :)
Hi, awesome library! My use case is using RL for resource allocation, so ideally the input to the environment's step method would be a probability distribution corresponding to the % of the resource being allocated, summing to 1 (i.e. my action space is MultiDiscrete([a, b]) and I'd like the probabilities over a and b fed into my step method during learning). Is there a way to have the policy network output these probabilities instead of selecting an action?
Thanks!