hpi-sam / Robust-Multi-Agent-Reinforcement-Learning-for-SAS

Research project on robust multi-agent reinforcement learning (MARL) for self-adaptive systems (SAS)
MIT License

Summarize discussion of transfer learning between different agents #16

Open jocodeone opened 2 years ago

christianadriano commented 2 years ago

Transfer learning becomes necessary when a change in the dependencies between components affects the failure propagation, and hence the probability of success of a fix given an observation trace. Transfer would happen across agents because, within each agent, the shops share the same models (HMM, Utility Prediction Model, and Policy). Initially, we envisage perturbations only on the HMM, for instance changing certain dependencies to force the agent to retrain the HMM and to notify the other agents about the change.
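To make the idea concrete, here is a minimal sketch of detecting a perturbation on a shared HMM and notifying the other agents. All names (`handle_dependency_change`, the dict-based agents, the drift threshold) are assumptions for illustration, not part of the actual codebase:

```python
import numpy as np

# Assumed tolerance on how much the HMM transition matrix may drift
# before we treat the change as a perturbation that warrants retraining.
PERTURBATION_THRESHOLD = 0.1

def transition_drift(old_T, new_T):
    """Mean absolute difference between two HMM transition matrices."""
    return float(np.mean(np.abs(np.asarray(old_T) - np.asarray(new_T))))

def handle_dependency_change(agent, new_T, peers):
    """If the drift is large, retrain (stubbed here as replacing the
    matrix) and broadcast the change to the other agents' inboxes."""
    drift = transition_drift(agent["hmm_T"], new_T)
    if drift > PERTURBATION_THRESHOLD:
        agent["hmm_T"] = new_T  # stands in for retraining the HMM
        for peer in peers:
            peer["inbox"].append(("hmm_changed", agent["name"], drift))
    return drift
```

The drift metric and the broadcast-to-inbox mechanism are placeholders; any divergence measure on the retrained model would fit the same pattern.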

In future work, several of the aspects involved could be investigated. How to determine whether a perturbation warrants retraining? How to determine which agents might be affected by a certain change in a given prediction model (which may depend on how the state space is partitioned among agents)? How to partition the state space (allocate agents to shops) so that the entire system is more robust to perturbations?
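The second question above (which agents a change might affect) reduces, under a simple assumption, to intersecting the changed components with each agent's partition of the state space. A hypothetical sketch, where `partition` maps agent names to their assigned components:

```python
def affected_agents(changed_components, partition):
    """Return the agents whose assigned components overlap the change.
    `partition`: dict mapping agent name -> set of component/shop ids.
    This is an illustrative assumption, not the project's actual API."""
    changed = set(changed_components)
    return [agent for agent, comps in partition.items() if comps & changed]
```

A real implementation would also have to account for transitive effects through failure propagation, which this set intersection ignores.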

jocodeone commented 2 years ago

Currently, we communicate a dependency change via the MultiAgentController, which asks the other agents to help the underperforming agent. As soon as an agent achieves a good result on this challenge, it takes over the underperforming shops. If no agent is found that performs well (e.g., solving a challenge of fixing a failing shop, which is the approach we follow at the moment, or meeting the defined alarm definition), the agent is retrained. What do you mean by communicating the change to the other agents?
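The controller logic described above can be sketched as a single decision function: rank the agents on the challenge, hand the shops to the best one if it clears the alarm threshold, otherwise retrain. The function name, score dict, and default threshold are all assumptions for illustration:

```python
def reassign_or_retrain(challenge_results, alarm=0.5):
    """Sketch of the MultiAgentController decision described above.
    `challenge_results`: dict mapping agent name -> score on the
    challenge of fixing the failing shop (higher is better).
    Returns ("takeover", best_agent) if some agent clears the assumed
    alarm threshold, else ("retrain", None)."""
    best = max(challenge_results, key=challenge_results.get)
    if challenge_results[best] >= alarm:
        return ("takeover", best)
    return ("retrain", None)
```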

In a future step, the retraining could also happen on a new agent that is initialized with the weights of the underperforming agent. In this way, we can retire the old agent and keep it as a snapshot that could be reactivated in the future (e.g., after a rollback of the system's HMM, this retired agent may be the best available solver for that state). The retired agent could also be asked to solve challenges for other agents with underperforming shops, to check whether its old state still underperforms after a certain change in the system.
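A minimal sketch of this retire-and-clone step, assuming agents are simple weight-carrying records and a registry keeps the archived snapshots; every name here is hypothetical:

```python
import copy

def retire_and_clone(agent, registry):
    """Archive the underperforming agent as a reactivatable snapshot
    and return a fresh agent initialized with its weights, as described
    in the comment above. Data layout is an illustrative assumption."""
    snapshot = copy.deepcopy(agent)
    registry.setdefault("retired", []).append(snapshot)
    new_agent = {
        "name": agent["name"] + "_v2",      # naming scheme is assumed
        "weights": copy.deepcopy(agent["weights"]),  # warm start
    }
    return new_agent
```

The deep copies ensure that later training of the new agent cannot mutate the archived snapshot, which is what makes the rollback scenario possible.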