hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Get probability distribution over actions for discrete action space! #1047

Closed: hiraphor closed this issue 3 years ago

hiraphor commented 3 years ago

Hi, awesome library! My use case is using RL for resource allocation. Ideally, the input to the environment's "step" method would be a probability distribution corresponding to the % of the resource allocated to each option, summing to 1 (i.e. my action space is MultiDiscrete[a,b] and I'd like the probabilities over a and b passed to my step method for learning). Is there a way to have the policy network output these probabilities instead of selecting an action?

Thanks!

Miffyli commented 3 years ago

See action_probability in the documentation. This should be what you are looking for.
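A minimal sketch of how it is typically used (assuming a discrete action space; PPO2 and CartPole-v1 are just placeholders here):

```python
import gym
from stable_baselines import PPO2

# Placeholder environment with a discrete action space
env = gym.make("CartPole-v1")
model = PPO2("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10000)

obs = env.reset()
# Probability the current policy assigns to each discrete action
# for this observation
probs = model.action_probability(obs)
print(probs)
```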

hiraphor commented 3 years ago

action_probability can't be called during training, can it? I.e. I'd have to write my own learn method/function to incorporate it.

Miffyli commented 3 years ago

Ah, my bad! Now I understand: instead of discrete actions, you want the probabilities of each choice as the action itself. You could treat this as a continuous action space (spaces.Box) instead, with one continuous value in [-1, 1] per choice (symmetric spaces are nicer for agents), and normalize inside the environment's step function so the values sum to one. I can't guarantee this will work out well, since there is no distribution implemented for your exact case (a simplex).
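A rough sketch of such an environment (the class name, reward, and the softmax normalization are placeholders; any mapping from [-1, 1] values to a distribution summing to one would do):

```python
import gym
import numpy as np
from gym import spaces


class ResourceAllocationEnv(gym.Env):
    """Sketch: agent outputs one continuous value per allocation bucket,
    step() converts them into a probability distribution."""

    def __init__(self, n_buckets=4):
        super(ResourceAllocationEnv, self).__init__()
        self.n_buckets = n_buckets
        # Symmetric continuous action space, one entry per bucket
        self.action_space = spaces.Box(low=-1.0, high=1.0,
                                       shape=(n_buckets,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(n_buckets,), dtype=np.float32)

    def reset(self):
        return np.zeros(self.n_buckets, dtype=np.float32)

    def step(self, action):
        # Map the [-1, 1] action vector to a distribution summing to 1.
        # Softmax is one option; shifting to [0, 2] and dividing by the
        # sum is another.
        exp = np.exp(action - np.max(action))
        allocation = exp / exp.sum()

        reward = 0.0  # placeholder: score the allocation here
        obs = allocation.astype(np.float32)
        done = True
        return obs, reward, done, {}
```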

hiraphor commented 3 years ago

No worries! That was the workaround I used in the end, haha. I guess I'll see if I can make it work. Thanks for the reply :)