Open WorksWellWithOthers opened 3 years ago
@WorksWellWithOthers This is indeed a form of reward engineering, and it is specific to CartPole: it turns the returned state into a numeric reward. Other environments would not need this step, and many already return a distinct reward of their own.
It would also break in any environment whose state has more or fewer than 4 values, because the unpacking assumes exactly 4.
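
To make the failure mode concrete, here is a minimal sketch of this kind of state-based reward. It is not the exact code from the repo: the function names `shaped_reward` and `try_shaped_reward` and the reward formula are hypothetical, and the two thresholds are assumed to match Gym's CartPole defaults.

```python
import math

# Assumed CartPole limits (Gym's defaults); illustrative only.
X_THRESHOLD = 2.4                           # cart position limit
THETA_THRESHOLD = 12 * 2 * math.pi / 360    # pole angle limit, in radians


def shaped_reward(state):
    """Hypothetical reward derived from a 4-value CartPole state."""
    # This unpacking is the CartPole-specific part: it requires exactly 4 values.
    x, x_dot, theta, theta_dot = state
    # Each term is 1.0 when centered/upright and falls to 0.0 at the limit.
    r_position = (X_THRESHOLD - abs(x)) / X_THRESHOLD
    r_angle = (THETA_THRESHOLD - abs(theta)) / THETA_THRESHOLD
    return r_position + r_angle


def try_shaped_reward(state):
    """Return the shaped reward, or None if the state is not 4 values long."""
    try:
        return shaped_reward(state)
    except ValueError:
        # Raised by the tuple unpacking when len(state) != 4.
        return None


print(shaped_reward([0.0, 0.0, 0.0, 0.0]))   # centered cart, upright pole
print(try_shaped_reward([0.0, 0.0]))          # 2-value state: unpacking fails
```

For an environment with a different state shape, the simplest option is probably to skip this step and use the reward the environment returns from `step`.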