hpi-sam / Robust-Multi-Agent-Reinforcement-Learning-for-SAS

Research project on robust multi-agent reinforcement learning (MARL) for self-adaptive systems (SAS)
MIT License

Write in future work section about cumulative propagation #20

Open ulibath opened 2 years ago

christianadriano commented 2 years ago

How do we detect the need to learn a new fix utility model (FUM) versus a new failure propagation model (FPM)? For the FPM, mRubis returns a negative reward for a wrong fix. How many unsuccessful fixes (or how large a negative cumulative reward) from the same Agent should it take before that Agent decides to learn a new FPM?
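One way to make the FPM question concrete is a sliding-window trigger over the rewards an Agent receives for its fixes. The sketch below is only an illustration: the class `FpmRetrainTrigger`, its parameters (`window`, `max_failures`, `min_cumulative_reward`), and the default values are all assumptions, not part of mRubis or this repository. It flags relearning when either the count of negative rewards or the cumulative reward in the window crosses a threshold.

```python
from collections import deque


class FpmRetrainTrigger:
    """Hypothetical helper: decides whether an Agent should learn a new FPM.

    All thresholds are illustrative assumptions and would need tuning.
    """

    def __init__(self, window=10, max_failures=3, min_cumulative_reward=-5.0):
        # Keep only the most recent `window` rewards for this Agent.
        self.rewards = deque(maxlen=window)
        self.max_failures = max_failures
        self.min_cumulative_reward = min_cumulative_reward

    def observe(self, reward):
        """Record the reward returned by the environment for one attempted fix."""
        self.rewards.append(reward)

    def should_relearn(self):
        """True if recent fixes failed too often or the cumulative reward is too low."""
        failures = sum(1 for r in self.rewards if r < 0)
        cumulative = sum(self.rewards)
        return (failures >= self.max_failures
                or cumulative <= self.min_cumulative_reward)
```

Usage would be one trigger per Agent: call `observe(reward)` after each fix and check `should_relearn()` before choosing the next action. Whether a failure count or a cumulative-reward bound is the better signal is exactly the open question above.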

Concerning the FUM, how much discrepancy between the predicted and the actual utility would warrant learning a new utility model for a given Shop? There must be a trade-off between the impact of prediction errors and the cost of training.
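The FUM trade-off could be sketched as a simple cost comparison: retrain when the expected loss from prediction errors over some horizon exceeds the cost of training. Everything in the snippet below is an assumption made for illustration, including the function name `should_retrain_fum`, the use of mean absolute error as the discrepancy measure, and the parameters `cost_per_unit_error`, `expected_future_fixes`, and `training_cost`, which would all have to be expressed in the same unit as the utility.

```python
def should_retrain_fum(predicted, actual, cost_per_unit_error,
                       expected_future_fixes, training_cost):
    """Hypothetical rule: retrain a Shop's FUM if prediction errors cost more than training.

    predicted / actual: utilities for the Shop's recent fixes (same length).
    cost_per_unit_error: assumed loss per unit of absolute utility error.
    expected_future_fixes: assumed number of upcoming fixes that would use the model.
    training_cost: assumed cost of learning a new FUM.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return False  # no evidence yet, so keep the current model

    # Mean absolute discrepancy between predicted and observed utility.
    mean_abs_error = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

    # Projected loss if the current model keeps being used.
    expected_loss = mean_abs_error * cost_per_unit_error * expected_future_fixes
    return expected_loss > training_cost
```

A relative error or a statistical drift test could replace the mean absolute error here; the point is only that the threshold follows from the two costs rather than being chosen in isolation.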