christianadriano opened this issue 2 years ago
Since the agents use the difference between two consecutive steps' utility as the reward, a change in the utility function would directly affect learning. The critic predicts the value of a given state, so its entire prediction could become wrong: all of its weights would have to be adapted (retrained) to predict correctly again after a drift in the utility. Therefore, I'm not sure whether the current implementation can handle such a change.
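A minimal sketch of what I mean (the function names are hypothetical, not the project's actual API): the per-step reward is the utility difference between consecutive steps, and the critic's value estimates are fit to the discounted sum of such rewards, so a drift in the utility invalidates those estimates.

```python
# Illustrative sketch only -- names are hypothetical, not the project's code.

def step_reward(prev_utility: float, curr_utility: float) -> float:
    """Reward = change in system utility from the previous step to the current one."""
    return curr_utility - prev_utility

# The critic learns V(s) ~ expected discounted sum of these rewards.
# If the utility function drifts (becomes non-stationary), the targets
# r_t + gamma * V(s_{t+1}) shift as well, so the critic's weights are
# fit to outdated values and would need to be retrained / adapted.
```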
@christianadriano Where can I find information on how the previous project solved this?
Currently, the utility produced by mRubis is stochastic but stationary. In the future, we might want to relax this stationarity assumption.
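To illustrate the distinction (a toy sketch, not mRubis code): a stationary stochastic utility fluctuates around a fixed mean, while in the non-stationary case the mean itself drifts over time, which is the situation discussed above.

```python
import random

# Toy illustration only -- not the mRubis utility model.

def stationary_utility(step: int, mean: float = 100.0, noise: float = 5.0) -> float:
    # Distribution does not depend on the step: stochastic but stationary.
    return random.gauss(mean, noise)

def drifting_utility(step: int, mean: float = 100.0, noise: float = 5.0,
                     drift_per_step: float = 0.1) -> float:
    # Mean shifts with the step: stochastic and non-stationary (utility drift).
    return random.gauss(mean + drift_per_step * step, noise)
```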