udacity / deep-reinforcement-learning

Repo for the Deep Reinforcement Learning Nanodegree program
https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893
MIT License
4.9k stars 2.34k forks

Use Pseudocount of Ones to Avoid Divide by Zero #37

Closed ekaakurniawan closed 2 years ago

ekaakurniawan commented 5 years ago

In the Monte Carlo solution notebook and the assignment notebook, the count dictionary (N) uses a default value of zeros. Since not every action in a given state gets updated (especially in my case of using First-Visit MC Prediction), some counts stay at zero, and dividing by them produces nan. It would be better to use a default value of ones.

To replicate the issue:

$ python
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.array([2, 0]) / np.array([1, 0])
__main__:1: RuntimeWarning: invalid value encountered in true_divide
array([ 2., nan])
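
For context, here is a minimal sketch of how the same nan could arise with the defaultdict-based counts. It assumes the action values are computed by dividing the whole per-state array of summed returns by the whole per-state array of counts. The names n_actions, returns_sum, state and Q are hypothetical stand-ins and are not taken from the notebook; n_actions replaces env.action_space.n so the snippet runs without an environment.

```python
from collections import defaultdict

import numpy as np

n_actions = 2  # hypothetical stand-in for env.action_space.n

# Summed returns and visit counts per state, both defaulting to zeros.
returns_sum = defaultdict(lambda: np.zeros(n_actions))
N = defaultdict(lambda: np.zeros(n_actions))

state = "s0"  # hypothetical state key

# Only action 0 is visited once, with a return of 2.0.
returns_sum[state][0] += 2.0
N[state][0] += 1.0

# Action 1 was never visited, so its entry is 0.0 / 0.0.
# errstate only silences the RuntimeWarning; the result is still nan.
with np.errstate(invalid="ignore"):
    Q = returns_sum[state] / N[state]

print(Q)  # [ 2. nan]
```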

I am thinking of replacing this:

N = defaultdict(lambda: np.zeros(env.action_space.n))

with this:

N = defaultdict(lambda: np.ones(env.action_space.n))

The zeros default comes from the cheatsheet. The textbook does not have this issue, as it only mentions average(Returns()).
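
A rough sketch of what the proposed ones default does, under the same assumption that the per-state sum array is divided by the per-state count array. The names n_actions, returns_sum, state, counts, Q_ones and Q_masked are hypothetical. The second half shows a different technique, not the one proposed above: keeping zero counts and using the where argument of np.divide so that only visited actions are divided.

```python
from collections import defaultdict

import numpy as np

n_actions = 2  # hypothetical stand-in for env.action_space.n

returns_sum = defaultdict(lambda: np.zeros(n_actions))
N = defaultdict(lambda: np.ones(n_actions))  # proposed pseudocount of ones

state = "s0"  # hypothetical state key

# Action 0 is visited once with a return of 2.0; action 1 is never visited.
returns_sum[state][0] += 2.0
N[state][0] += 1.0  # count becomes 2 because it started at 1

# No division by zero, but the visited action's average is 2.0 / 2 = 1.0.
Q_ones = returns_sum[state] / N[state]
print(Q_ones)  # [1. 0.]

# Alternative (an assumption, not the proposal above): keep true counts
# and divide only where the count is positive, leaving 0.0 elsewhere.
counts = np.array([1.0, 0.0])
Q_masked = np.divide(
    returns_sum[state],
    counts,
    out=np.zeros(n_actions),
    where=counts > 0,
)
print(Q_masked)  # [2. 0.]
```

With the pseudocount, the nan disappears, but every visited action's estimate is divided by one more than its actual visit count. Whether that trade-off is acceptable depends on how many visits each state-action pair receives.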