Write in future work section about cumulative propagation

How can we detect whether an Agent needs to learn a new fix utility model (FUM) or a new failure propagation model (FPM)? For the FPM, mRubis returns a negative reward for a wrong fix. How many unsuccessful fixes (or how large a negative cumulative reward) should the same Agent accumulate before it decides to learn a new FPM?
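One possible way to frame the FPM question, as a rough sketch only: each Agent keeps a sliding window of the rewards it received for its recent fixes and flags the FPM for relearning once either the number of failed fixes or the cumulative reward in that window crosses a threshold. The class name `FpmRetrainTrigger`, its parameters, and all default values below are hypothetical and not taken from the repository; choosing the actual thresholds is exactly the open question.

```python
from collections import deque


class FpmRetrainTrigger:
    """Hypothetical per-Agent monitor; names and thresholds are assumptions."""

    def __init__(self, window=20, max_failed_fixes=5, min_cumulative_reward=-10.0):
        # Only the most recent `window` fix rewards are considered.
        self.rewards = deque(maxlen=window)
        self.max_failed_fixes = max_failed_fixes
        self.min_cumulative_reward = min_cumulative_reward

    def record(self, reward):
        """Store the reward of one attempted fix; return True if a new FPM seems needed."""
        self.rewards.append(reward)
        failed = sum(1 for r in self.rewards if r < 0)  # wrong fixes yield negative reward
        cumulative = sum(self.rewards)
        return (failed >= self.max_failed_fixes
                or cumulative <= self.min_cumulative_reward)

    def reset(self):
        """Clear the history, e.g. after a new FPM has been learned."""
        self.rewards.clear()
```

Using a bounded window rather than the full history is a design choice: it lets old failures expire, so an Agent is not pushed to relearn because of mistakes made long ago.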

Concerning the FUM, how much discrepancy between the predicted and actual utility would warrant learning a new utility model for a given Shop? There must be a trade-off between the impact of prediction errors and the cost of training.
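The trade-off for the FUM could be made explicit along these lines. This is a minimal sketch under strong assumptions: the discrepancy is measured as the mean absolute error over recent predictions for one Shop, and the impact of that error is assumed to scale linearly with a hypothetical `loss_per_unit_error` over a hypothetical `horizon` of future fixes. None of these names or the cost model come from the project.

```python
def should_retrain_fum(predicted, actual, training_cost,
                       loss_per_unit_error=1.0, horizon=100):
    """Return True if the estimated cost of keeping the current FUM exceeds training_cost.

    All parameters are illustrative assumptions, not project settings.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equally long, non-empty utility lists")
    # Mean absolute error between predicted and observed utility for one Shop.
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
    # Assumed linear impact of the error over the next `horizon` fixes.
    expected_loss = mae * loss_per_unit_error * horizon
    return expected_loss > training_cost
```

For example, with `training_cost=100.0` and the defaults above, a mean absolute error of 0.5 gives an expected loss of 50 and no retraining, while an error of 4.0 gives 400 and triggers it.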

hpi-sam / Robust-Multi-Agent-Reinforcement-Learning-for-SAS

Write in future work section about cumulative propagation #20