Open raharth opened 3 years ago
We could simply use the upper x-sigma bound of each probability instead of the probability itself.
Any other weighting would also work, such as re-normalizing the probabilities by their uncertainty.
This should improve exploration of unknown states, since options with a highly uncertain estimate get selected more often.
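
A minimal sketch of the idea, assuming we already have a mean and a standard deviation for each probability. How the uncertainty is estimated is left open here; `upper_bound_policy`, `mean_probs`, `std_probs` and `x` are hypothetical names used only for illustration.

```python
import numpy as np


def upper_bound_policy(mean_probs, std_probs, x=2.0):
    """Re-weight probabilities by their upper x-sigma bound (illustrative only)."""
    mean_probs = np.asarray(mean_probs, dtype=float)
    std_probs = np.asarray(std_probs, dtype=float)
    # Optimistic estimate: mean plus x standard deviations, kept inside [0, 1].
    upper = np.clip(mean_probs + x * std_probs, 0.0, 1.0)
    # Re-normalize so the result is a valid distribution again.
    return upper / upper.sum(axis=-1, keepdims=True)


# The third option has a low mean but a high uncertainty, so its weight grows.
probs = upper_bound_policy([0.7, 0.2, 0.1], [0.0, 0.0, 0.2], x=2.0)
```

With `x=0` this falls back to the original probabilities, so `x` acts as a knob for how strongly uncertainty drives exploration.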