$ python
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.array([2, 0]) / np.array([1, 0])
__main__:1: RuntimeWarning: invalid value encountered in true_divide
array([ 2., nan])
Thinking of replacing this:
N = defaultdict(lambda: np.zeros(env.action_space.n))
with this:
N = defaultdict(lambda: np.ones(env.action_space.n))
This implementation is from the cheatsheet. The textbook does not have this issue, as it only mentions average(Returns()).
In the Monte Carlo Solution Notebook and the assignment notebook, the count dictionary (N) uses a default value of zeros. Not every action at a given state gets updated (especially in my case, using First-Visit MC Prediction), so dividing by N yields 0/0 = nan for the actions that were never visited. I think it is better to use a default value of ones.
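A minimal sketch of the two defaults side by side, in case it helps the discussion. It does not use the notebook's environment: `n_actions` is a hypothetical stand-in for `env.action_space.n`, and the names `returns_sum` and `q_row` are my own assumptions, not taken from the cheatsheet.

```python
from collections import defaultdict

import numpy as np

n_actions = 2  # hypothetical stand-in for env.action_space.n


def q_row(n_default):
    """Average returns for one state in which only action 0 was visited."""
    # returns_sum is an assumed name for the per-action sum of returns.
    returns_sum = defaultdict(lambda: np.zeros(n_actions))
    N = defaultdict(lambda: n_default(n_actions))

    state = "s0"                   # hypothetical state key
    returns_sum[state][0] += 2.0   # one episode returned 2.0 for action 0
    N[state][0] += 1               # count that single visit

    # Silence the RuntimeWarning so both cases can be compared quietly.
    with np.errstate(invalid="ignore"):
        return returns_sum[state] / N[state]


q_zeros = q_row(np.zeros)  # unvisited action divides 0 by 0
q_ones = q_row(np.ones)    # no nan, but every count starts one too high

print(q_zeros)
print(q_ones)
```

With the zeros default the unvisited action comes out as nan, matching the interpreter session. With the ones default the nan disappears, but the visited action is divided by 2 instead of 1, so its estimate after a single visit is 1.0 rather than 2.0. That offset shrinks as the visit count grows.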
To replicate the issue, run the interpreter session at the top of this post.