nilscrm / stackelberg-ml


How MBRL Approaches Fit Into Gerstgrasser Framework #8

Open YanickZengaffinen opened 6 months ago

YanickZengaffinen commented 6 months ago

Could be interesting to discuss whether the Game Theoretic MBRL approach fits into the Gerstgrasser Framework (it has an inner/outer loop structure, but we would have to look more closely at how querying is done).

We could potentially also take a quick look at other SOTA approaches and whether they fit into the Gerstgrasser Framework.

nilscrm commented 5 months ago

Actually, I changed my mind about this, and you might be correct with your hunch that the Game Theoretic MBRL approach might not fit into the Gerstgrasser Framework. I think all assumptions of Lemma 1 of the Gerstgrasser paper are satisfied, and thus the optimum of the learning problem is actually the Stackelberg equilibrium (that is what I was always referring to). However, even though the follower is implemented as a query oracle (I think you can see RL as one type of oracle implementation), we don't show these queries (the learning process of the follower) to the leader. The paper states:

If the follower oracle is implemented using RL, i.e., both leader and followers use RL, then the initial segment is simply one or more episodes of M where the followers are learning, and the final segment is one episode from M where the followers have converged.

Note the initial segment of episodes of M where the followers are learning. That means we also need to show the leader the trajectories we use to train the follower. So one trajectory of the leader consists of many trajectories of the follower (with no rewards) followed by one trajectory with rewards (see the sketch below). This is not done by the implementation of the MBRL paper.
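To make that structure concrete, here is a minimal sketch of how a single leader trajectory would be assembled under the Gerstgrasser Framework. Everything here is assumed for illustration only: the `rollout` and `follower_update` callables and the transition format are hypothetical, not this repo's or the paper's API.

```python
from typing import Callable, List, Tuple

# (state, action, reward, next_state) -- deliberately generic transition type
Transition = Tuple[object, object, float, object]

def gerstgrasser_leader_episode(
    rollout: Callable[[], List[Transition]],            # assumed: one follower episode vs. the current leader
    follower_update: Callable[[List[Transition]], None],  # assumed: one RL update of the follower
    n_learning_episodes: int,
) -> List[Transition]:
    """One leader 'episode': the follower's learning episodes (rewards masked)
    followed by one episode with the (approximately) converged follower."""
    leader_trajectory: List[Transition] = []

    # Initial segment: the follower is still learning. The leader observes
    # these transitions but receives no reward signal from them.
    for _ in range(n_learning_episodes):
        episode = rollout()
        follower_update(episode)
        leader_trajectory += [(s, a, 0.0, s2) for (s, a, _, s2) in episode]

    # Final segment: the follower has converged to a best response;
    # this episode carries the leader's actual reward.
    leader_trajectory += rollout()
    return leader_trajectory
```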

Since we don't include the training process, we effectively have an immediately-best-responding follower, which can diverge.
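For contrast, a sketch of the immediately-best-responding setup described above (again with the same assumed, hypothetical helpers): the follower is trained to approximate convergence out of the leader's sight, and the leader only ever sees the final episode.

```python
def immediate_best_response_leader_episode(rollout, follower_update, n_learning_episodes):
    """Leader 'episode' when the follower is treated as immediately best-responding:
    the follower's training happens out of the leader's sight."""
    # The follower trains against the fixed leader policy, but none of these
    # learning transitions are ever shown to the leader.
    for _ in range(n_learning_episodes):
        follower_update(rollout())

    # The leader only observes (and learns from) the final, converged episode.
    return rollout()
```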