Diverging results from those in the paper

mahenning commented 4 years ago

Hi, I ran the commands for IBAC, IBAC-SNI and NoReg from the README twice each for 100 million steps and got the results below. Do you have an idea why (a) my return for IBAC is so much higher than in the paper and (b) the return for IBAC-SNI is so low?

I know that two training runs per method aren't much, but at least for the two IBAC variants the variance between runs was almost nonexistent. Sincerely, Marko

maximilianigl commented 4 years ago

Hi Marko,

Unfortunately, the fully observable Multiroom environment I'm using here is extremely noisy and the results are bimodal, i.e. a run either learns or does not learn at all. In the paper I'm therefore averaging over 30 seeds and plotting the standard error, i.e. std/sqrt(30). With just two random seeds, I'd say it's quite possible that you were simply lucky for IBAC and unlucky for IBAC-SNI. When I was working on it, I was using about 5-8 seeds and plotting the median and max. That is still very noisy, but I found it to be a decent compromise between fast iteration and somewhat meaningful results. Best, Max
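For reference, the aggregation described above could be sketched roughly as follows. This is only an illustration, not code from the repository: the function name `summarize_seeds` and the assumed array layout (one row per seed, one column per evaluation point) are hypothetical, and whether the paper's standard deviation uses the sample (`ddof=1`) or population form is not stated, so the sample form here is an assumption.

```python
import numpy as np


def summarize_seeds(returns):
    """Aggregate per-seed returns into summary curves.

    `returns` is assumed to have shape (n_seeds, n_points):
    one row per random seed, one column per evaluation point.
    """
    returns = np.asarray(returns, dtype=float)
    n_seeds = returns.shape[0]

    mean = returns.mean(axis=0)
    if n_seeds > 1:
        # Sample standard deviation (ddof=1); this choice is an assumption.
        std = returns.std(axis=0, ddof=1)
    else:
        # A single seed carries no spread information.
        std = np.zeros_like(mean)

    return {
        "mean": mean,
        # Standard error of the mean: std / sqrt(number of seeds).
        "stderr": std / np.sqrt(n_seeds),
        # Median and max are less misleading than the mean when runs are
        # bimodal and only a handful of seeds are available.
        "median": np.median(returns, axis=0),
        "max": returns.max(axis=0),
    }
```

With 30 seeds the mean and standard error match what the paper reports; with 5-8 seeds the `median` and `max` entries correspond to the quicker check used during development.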

maximilianigl commented 4 years ago

I'm closing this for now, but please let me know if you have further questions.

microsoft / IBAC-SNI
