sjunges closed this 4 months ago.
~This PR tracks bughunting for a bug~
Fixes the initial policy computation. The previous implementation did not correctly handle choices leading to states with infinite reward.
I hijacked this PR to push a fix for #555 (I wanted to keep the assertions introduced by @sjunges).
Great