Save (pickle) agent with highest reward rating

Save (pickle) agent with highest reward rating in addition to the regular one at the 'end of training'. Thereby we could investigate questions such as if the reward (even as mean over past episodes) can give a qualitative indication to the agent's performance. It might be of interest to see how the 'best agent' competes against the one from the (arbitrary) end of the training in a test environment (one without learning or exploration). This would probably be done in a qualitative relatively subjective way where we could try to examine complexity of strategies or the similarity to human-strategies (e.g. by playing it on our own with play.py)

laermannjan / nip-deeprl-project

Save (pickle) agent with highest reward rating #7