microsoft / IBAC-SNI

Code to reproduce the NeurIPS 2019 paper "Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck" by Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin and Katja Hofmann.
https://arxiv.org/abs/1910.12911

Diverging results from those in the paper #5

Closed mahenning closed 4 years ago

mahenning commented 4 years ago

Hi, I ran the commands for IBAC, IBAC-SNI and NoReg from the Readme twice, for 100 million steps each, and I got the results below. Do you have an idea why (a) my return for IBAC is so much higher than in the paper and (b) the return for IBAC-SNI is so low?

I know that only two runs isn't much, but at least for the two IBAC variants the variance was almost non-existent.

Sincerely, Marko

maximilianigl commented 4 years ago

Hi Marko,

Unfortunately, the fully observable Multiroom environment I'm using here is extremely noisy and the results are bi-modal, i.e. either it learns or it does not (at all). In the paper I'm therefore averaging over 30 seeds and plotting the standard error, i.e. std/sqrt(30). With just two random seeds, I'd say it's quite possible that you were just lucky for IBAC and unlucky for IBAC-SNI.

When I was working on it, I used about 5-8 seeds and plotted the median & max. That's still very noisy, but I found it a decent intermediate solution for faster iteration speed with somewhat meaningful results.

Best, Max
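The two aggregation schemes Max describes (mean ± standard error over 30 seeds for the paper; median & max over 5-8 seeds during development) can be sketched as below. This is a minimal illustration, not code from the repo; the function name `aggregate_runs` and the array layout (one learning curve per seed) are assumptions for the example.

```python
import numpy as np

def aggregate_runs(returns, mode="stderr"):
    """Aggregate per-seed learning curves.

    returns: array of shape [n_seeds, n_steps], one curve per random seed.
    mode:    "stderr"     -> (mean, std/sqrt(n_seeds)) as in the paper's plots
             "median_max" -> (median, max), the cheaper development-time view
    """
    returns = np.asarray(returns, dtype=float)
    n_seeds = returns.shape[0]
    if mode == "stderr":
        mean = returns.mean(axis=0)
        stderr = returns.std(axis=0) / np.sqrt(n_seeds)
        return mean, stderr
    if mode == "median_max":
        return np.median(returns, axis=0), returns.max(axis=0)
    raise ValueError(f"unknown mode: {mode}")

# Toy bi-modal outcome: one seed "learns", one does not, as in Multiroom.
curves = [[0.0, 0.0, 0.0],   # unlucky seed: never takes off
          [0.2, 0.6, 1.0]]   # lucky seed: learns
mean, stderr = aggregate_runs(curves, mode="stderr")
median, best = aggregate_runs(curves, mode="median_max")
```

With bi-modal outcomes like this, the mean of two seeds sits between the two modes and the standard error is huge, which is why a handful of seeds can look "all good" or "all bad" purely by chance.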

maximilianigl commented 4 years ago

I'm closing this for now, but please let me know if you have further questions.