mila-iqia / babyai

BabyAI platform. A testbed for training agents to understand and execute language commands.
BSD 3-Clause "New" or "Revised" License

How to measure RL sample efficiency #74

Open rickyloynd-microsoft opened 5 years ago

rickyloynd-microsoft commented 5 years ago

Hi,

My question relates to the RL results in Table 3 of the paper. I'm trying to use the iclr19 branch to generate at least 10 such results for each level, to get a stable mean and variance. The train_rl.py script seems to do almost everything required. However, at the bottom of that file, after the success rate is computed over the evaluation episodes (512 by default), it is never actually logged; the mean return is logged instead.
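To make the difference between the two logged quantities concrete, here is a minimal sketch. It assumes an episode counts as a success when its return is positive, since BabyAI levels give a positive reward only when the mission is completed. `summarize_episodes` is a hypothetical helper, not a function from the repository.

```python
from statistics import mean


def summarize_episodes(returns_per_episode):
    """Return (mean_return, success_rate) for a batch of evaluation episodes.

    Assumption: an episode is a success when its return is positive, because
    BabyAI only rewards an agent (with a positive value) for completing the mission.
    """
    if not returns_per_episode:
        raise ValueError("need at least one episode")
    mean_return = mean(returns_per_episode)
    # Fraction of episodes with a positive return.
    success_rate = mean(1.0 if r > 0 else 0.0 for r in returns_per_episode)
    return mean_return, success_rate


# Hypothetical returns from four evaluation episodes, one of which failed.
mean_return, success_rate = summarize_episodes([0.9, 0.8, 0.0, 0.7])
print("Mean return {:.4f}, success rate {:.4f}".format(mean_return, success_rate))
```

The mean return also penalizes slow successes, so it can stay well below 1 even when every episode succeeds. That is why it cannot stand in for the 99% success-rate criterion.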

Adding the following line (right after the calculation of success_rate) seems to log the missing number:

```python
logger.info("Success rate {:.4f} reached after {} training episodes".format(success_rate, status['num_episodes']))
```

Also, it seems that the default save_interval of 1000 is too large for some of the easier levels. For instance, to get sufficiently frequent tests on GoToRedBallGrey, I call the script like this:

```
python scripts/train_rl.py --env BabyAI-GoToRedBallGrey-v0 --save-interval 10
```
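To collect the 10 or more runs per level mentioned above, one option is to launch one run per seed. This is a minimal sketch that assumes train_rl.py accepts a `--seed` flag. The issue refers to seed=1, but the exact flag name is an assumption. `build_commands` is a hypothetical helper.

```python
import shlex


def build_commands(env, seeds, save_interval=10, script="scripts/train_rl.py"):
    """Build one train_rl.py invocation (as an argv list) per seed.

    --env and --save-interval are the flags used in the command above;
    --seed is an assumed flag name for the random seed.
    """
    return [
        ["python", script,
         "--env", env,
         "--save-interval", str(save_interval),
         "--seed", str(seed)]  # assumption: the script's seed flag
        for seed in seeds
    ]


for argv in build_commands("BabyAI-GoToRedBallGrey-v0", range(1, 11)):
    # Only prints the commands; pass argv to subprocess.run(argv, check=True) to launch.
    print(shlex.join(argv))
```

Depending on the script's defaults, each run may also need a distinct model name so that logs from different seeds do not end up in the same directory.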

Then, to obtain the sample efficiency, I look in the log for the first success rate above 0.99 and take the number of training episodes up to that point. For seed=1, it happens on this line:

```
main: 2019-06-17 01:27:36,671: Success rate 0.9922 reached after 30769 training episodes
```
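The manual log search described above can be automated. This is a minimal sketch that assumes the log contains lines in the format produced by the `logger.info` line suggested earlier. `episodes_to_threshold` and `summarize_seeds` are hypothetical helpers, not part of the repository.

```python
import re
from statistics import mean, stdev

# Matches lines written by the suggested logger.info call.
SUCCESS_RE = re.compile(
    r"Success rate\s+([0-9.]+) reached after (\d+) training episodes")


def episodes_to_threshold(log_lines, threshold=0.99):
    """Episode count at the first evaluation whose success rate exceeds
    `threshold`, or None if the threshold is never exceeded."""
    for line in log_lines:
        match = SUCCESS_RE.search(line)
        if match and float(match.group(1)) > threshold:
            return int(match.group(2))
    return None


def summarize_seeds(per_seed_episodes):
    """Mean and sample standard deviation of episodes-to-threshold over the
    seeds that reached the threshold, plus the number of seeds that did not."""
    reached = [n for n in per_seed_episodes if n is not None]
    if len(reached) < 2:
        raise ValueError("need at least two seeds that reached the threshold")
    return mean(reached), stdev(reached), len(per_seed_episodes) - len(reached)


# The log line quoted in this issue (seed=1).
example_log = [
    "main: 2019-06-17 01:27:36,671: Success rate 0.9922 "
    "reached after 30769 training episodes",
]
print(episodes_to_threshold(example_log))
```

For a real run, pass an open log file (`with open(path) as f: episodes_to_threshold(f)`), collect one value per seed, and feed the list to `summarize_seeds`. Seeds that never cross the threshold are counted rather than silently dropped.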

Is this the right way to generate more RL results like those in Table 3? Or is there an easier way?

Thank you for this excellent environment!

rizar commented 5 years ago

Thanks for your question. We actually use the .csv logs to compute when the 99% success rate is reached. There is a PR underway that automates the process; you can try the rl_dataeff.py script from it:
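For illustration only, here is a minimal sketch of reading a CSV training log and finding the first row where the success rate reaches 99%. It is not the rl_dataeff.py script from the PR. The column names `success_rate` and `episodes` are assumptions, so check the header of your own log. `first_crossing` is a hypothetical helper.

```python
import csv
import io


def first_crossing(csv_file, threshold=0.99,
                   rate_column="success_rate", count_column="episodes"):
    """Value of `count_column` in the first row whose `rate_column` is at
    least `threshold`, or None if no row qualifies.

    Column names are assumptions about the log layout. Rows with an empty or
    missing rate (e.g. updates without an evaluation) are skipped.
    """
    for row in csv.DictReader(csv_file):
        rate = row.get(rate_column)
        if rate not in (None, "") and float(rate) >= threshold:
            return int(float(row[count_column]))
    return None


# Hypothetical CSV contents, for illustration only.
sample = io.StringIO(
    "update,episodes,success_rate\n"
    "10,12000,0.41\n"
    "20,24000,\n"
    "30,36000,0.995\n"
)
print(first_crossing(sample))
```

With a real log, call `first_crossing(open(path, newline=""))`. The result is only as fine-grained as the evaluation interval, which is why a smaller save interval gives a tighter estimate on easy levels.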

https://github.com/mila-iqia/babyai/pull/72/files#diff-96cf5347366f1902ca3259f6a094df51

I will be back from vacation on Thursday and will try to provide more help then.