nilscrm / stackelberg-ml


Adversarial Setting #12

Open YanickZengaffinen opened 4 months ago

YanickZengaffinen commented 4 months ago

Performance of the methods so far is mediocre => we could try an adversarial follower-leader setting.

nilscrm commented 4 months ago

What we ideally want to do is encourage the policy to go to states where the model is inaccurate (which is adversarial). The problem is that we train the policy on models and not on the real environment. That means that for a transition in the model we don't really know how far it is from the real environment (unless we "cheat" and just look at the actual transition probabilities, which you would usually assume you cannot look at).
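To make the "cheating" variant concrete, here is a minimal sketch, assuming a tabular setting where both the model and the real environment expose next-state probabilities per (state, action). The function names, the dict layout, and the choice of total variation distance as the inaccuracy measure are all assumptions for illustration, not something from the repo.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions (lists of probabilities)."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def adversarial_bonus(model_probs, true_probs):
    """Per-(state, action) bonus that grows where the model disagrees with the real env.

    Both arguments are hypothetical dicts mapping (state, action) to a list of
    next-state probabilities.
    """
    return {sa: total_variation(model_probs[sa], true_probs[sa]) for sa in model_probs}


# Toy example: two actions in state 0, two possible next states.
true_probs = {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5]}
model_probs = {(0, 0): [0.9, 0.1], (0, 1): [0.8, 0.2]}  # wrong for action 1

bonus = adversarial_bonus(model_probs, true_probs)
print(bonus)
```

The bonus is zero where the model matches the real transitions and positive where it does not, so adding it to the policy's reward would push the policy toward the poorly modelled (state, action) pairs.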

YanickZengaffinen commented 4 months ago

Technically, sampling from the real environment during pretraining is not prohibited in Gerstgrasser's framework. If we need it to estimate properties of the real environment, and that in turn helps us speed up pretraining, I don't see why we shouldn't do it.
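A rough sketch of what such an estimate could look like, assuming a small discrete environment that we can query for single transitions. `estimate_transitions`, `toy_step`, and the sample counts are hypothetical names and values chosen for illustration; the toy environment just stands in for the real one.

```python
import random
from collections import Counter


def estimate_transitions(sample_step, state_action_pairs, n_states, n_samples=1000):
    """Estimate next-state probabilities by repeatedly sampling the real environment.

    sample_step(state, action) is assumed to return one sampled next state.
    """
    estimates = {}
    for s, a in state_action_pairs:
        counts = Counter(sample_step(s, a) for _ in range(n_samples))
        estimates[(s, a)] = [counts[s2] / n_samples for s2 in range(n_states)]
    return estimates


# Toy stand-in for the real environment: action 0 mostly stays, action 1 is a coin flip.
rng = random.Random(0)


def toy_step(state, action):
    p_stay = 0.9 if action == 0 else 0.5
    return state if rng.random() < p_stay else 1 - state


est = estimate_transitions(toy_step, [(0, 0), (0, 1)], n_states=2, n_samples=2000)
print(est)
```

The empirical probabilities can then be compared against the model's transition probabilities without reading the true ones directly, at the cost of extra real-environment samples during pretraining.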