Closed conorheins closed 1 year ago
Addressed this here, now closing.
This has since been addressed more thoroughly by @AleMuzzi in this pull request, which deals specifically with the case where only a subset of the actions have equal probability. Deterministic selection still needs to sample from among those tied actions.
When `action_selection == "deterministic"` in `control.sample_action()`, but all the action probabilities are equal, we should sample rather than deterministically choose the first action (which is the default behavior of `np.argmax`).
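
For illustration, here is a minimal sketch of the tie-breaking behavior described above. It is not the code from the pull request: the helper name `argmax_with_random_ties` and its `atol` tolerance are hypothetical, and it only assumes NumPy. The idea is to find every index whose probability matches the maximum and pick one of them uniformly at random, so a unique maximum is still chosen deterministically.

```python
import numpy as np

def argmax_with_random_ties(probs, rng=None, atol=1e-12):
    """Return the index of the largest entry, breaking ties at random.

    Hypothetical helper for illustration only; `atol` is an assumed
    tolerance for treating two probabilities as equal.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    # Indices of all entries that are (numerically) equal to the maximum.
    tied = np.flatnonzero(np.isclose(probs, probs.max(), rtol=0.0, atol=atol))
    # With a single maximum, `tied` has one element and the choice is deterministic.
    return int(rng.choice(tied))

uniform = np.full(4, 0.25)
# np.argmax returns the first maximal index, so a uniform vector always gives 0.
first_index = int(np.argmax(uniform))
# The tie-aware version can return any of the tied indices.
rng = np.random.default_rng(0)
draws = {argmax_with_random_ties(uniform, rng) for _ in range(200)}
```

The same helper covers the partial-tie case: for `[0.1, 0.45, 0.45]` it only ever returns index 1 or 2, never 0.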