hpi-sam / Robust-Multi-Agent-Reinforcement-Learning-for-SAS

Research project on robust multi-agent reinforcement learning (MARL) for self-adaptive systems (SAS)

Future Work - How to consider uncertainty in the utility when ranking actions (probability that this component is correct) #43

Open ulibath opened 2 years ago

ulibath commented 2 years ago

The utility of a component and of a shop is stochastic, i.e., the utility values are described by a probability distribution. This distribution is produced by a data generation process (DGP). Although this process is unknown to the agents, the DGP can be approximated by collecting data and fitting a predictive model. To simulate the DGP, one has to select models that can also be executed in "reverse", i.e., that recover the latent inputs given the observed outcomes. Bayesian models are capable of this \cite{gelman2013bayesian}.
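A minimal sketch of the idea, under assumptions not fixed in this issue: the per-component utility is modeled with a conjugate Normal-Normal Bayesian model (known observation noise `sigma`, hypothetical observed utilities). The posterior over the latent mean utility is the "reverse" direction (outcomes to inputs), and posterior-predictive samples simulate the DGP forward, so an agent can rank actions by expected or risk-aware utility instead of a point estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed utility values for one component/shop (illustrative only).
observed_utilities = np.array([0.62, 0.71, 0.58, 0.66, 0.69])

# Prior belief about the latent mean utility: Normal(mu0, tau0^2) (assumed values).
mu0, tau0 = 0.5, 0.2
# Assumed known observation noise of the utility signal.
sigma = 0.1

# "Reverse" direction: conjugate posterior over the latent mean utility
# given the observed outcomes.
n = len(observed_utilities)
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + observed_utilities.sum() / sigma**2)

# Forward direction: posterior-predictive samples act as a stochastic
# stand-in for the unknown DGP; an agent can rank actions by their mean
# predictive utility or by a lower quantile for a risk-aware ranking.
predictive = rng.normal(post_mean, np.sqrt(post_var + sigma**2), size=10_000)
print(f"posterior mean utility: {post_mean:.3f} +/- {np.sqrt(post_var):.3f}")
print(f"5th percentile of predictive utility: {np.quantile(predictive, 0.05):.3f}")
```

In a MARL setting this would be repeated per component, and the resulting predictive distributions (rather than scalar utilities) would feed the action-ranking step.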

@book{gelman2013bayesian,
  title     = {Bayesian Data Analysis},
  author    = {Gelman, Andrew and Carlin, John B. and Stern, Hal S. and Rubin, Donald B.},
  year      = {2013},
  publisher = {Chapman and Hall/CRC}
}