hpi-sam / Robust-Multi-Agent-Reinforcement-Learning-for-SAS

Research project on robust multi-agent reinforcement learning (MARL) for self-adaptive systems (SAS).

Future Work - allow for non-stationary utility #66

Open · christianadriano opened this issue 2 years ago

christianadriano commented 2 years ago

Currently, the utility produced by mRubis is stochastic but stationary. In the future we might want to relax this stationarity assumption.
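
For illustration, a minimal sketch of the distinction: in both cases the utility is stochastic, but under non-stationarity its mean drifts over time. The class names and the `drift_per_step` parameter are hypothetical, not part of mRubis:

```python
import random

class UtilityModel:
    """Stochastic but stationary utility: noisy samples around a fixed mean."""
    def __init__(self, mean: float = 100.0, noise_std: float = 5.0):
        self.mean = mean
        self.noise_std = noise_std

    def sample(self) -> float:
        return random.gauss(self.mean, self.noise_std)

class NonStationaryUtility(UtilityModel):
    """Same noise model, but the mean drifts each step (non-stationary)."""
    def __init__(self, mean: float = 100.0, noise_std: float = 5.0,
                 drift_per_step: float = -0.1):
        super().__init__(mean, noise_std)
        self.drift_per_step = drift_per_step

    def sample(self) -> float:
        self.mean += self.drift_per_step  # the distribution changes over time
        return super().sample()
```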

jocodeone commented 2 years ago

Since the agents use the difference between two consecutive steps' utilities to compute the reward, a change in the utility distribution would affect learning. Because the critic is used to predict the value of a given state, its predictions could become systematically wrong after a drift; all weights would have to be adapted (retrained) before the predictions are correct again. Therefore I'm not sure whether the current implementation is capable of handling such a change.
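
A small self-contained sketch of that concern, assuming the reward is the utility difference between consecutive steps as described above: once the utility mean starts drifting, the mean reward acquires a bias of roughly the drift per step, so the value targets the critic was trained on no longer match. The numbers and names here are purely illustrative:

```python
import random

def reward(prev_utility: float, curr_utility: float) -> float:
    # Reward as the difference between two consecutive steps' utilities.
    return curr_utility - prev_utility

def mean_reward(drift_per_step: float, steps: int = 10_000) -> float:
    """Average reward when the utility mean drifts by drift_per_step each step."""
    mean = 100.0
    prev = random.gauss(mean, 5.0)
    total = 0.0
    for _ in range(steps):
        mean += drift_per_step
        curr = random.gauss(mean, 5.0)
        total += reward(prev, curr)
        prev = curr
    return total / steps

print(mean_reward(0.0))   # stationary: mean reward ~ 0
print(mean_reward(-0.5))  # drifting: mean reward ~ -0.5, shifting the critic's targets
```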

@christianadriano Where can I find information on how the previous project has solved this?