Open jbloomAus opened 1 year ago
A better version of this might be to write a script which takes the training data and tests the predictions of the RL policies against those of the agent simulator. We can then closely investigate examples with significant divergence and the mechanisms underlying them.
https://docs.google.com/document/d/1N1lVOXS5bLKYiXfoEeQoxxtI_0EfROi-JXcs-eYTCSA/edit?usp=sharing
I think this could be very valuable from the perspective of measuring the agent simulator's proclivity for modelling different agents in its training distribution.
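A rough sketch of what the comparison script could look like. Everything here is an assumption for illustration: `policy_probs` and `simulator_probs` are hypothetical callables that return action probabilities for an observation, and KL divergence is just one possible choice of divergence measure, not something settled in this issue.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: maps an observation to a distribution over actions.
ActionDist = Callable[[Sequence[float]], Sequence[float]]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) over discrete action probabilities, smoothed by eps."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def rank_by_divergence(
    observations: Sequence[Sequence[float]],
    policy_probs: ActionDist,
    simulator_probs: ActionDist,
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (index, divergence) pairs for the top_k most divergent examples."""
    scored = []
    for idx, obs in enumerate(observations):
        p = policy_probs(obs)      # what the RL policy actually does
        q = simulator_probs(obs)   # what the agent simulator predicts
        scored.append((idx, kl_divergence(p, q)))
    # Largest divergence first, so the most interesting cases come out on top.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The returned indices point back into the training data, so the high-divergence examples can be pulled out and inspected by hand. Running this once per RL policy would give a per-agent picture of where the simulator's predictions break down.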