Favoring uncertain states over certain ones when sampling actions

raharth / PyMatch

A collection of different PyTorch wrappers for training neural networks and reinforcement algorithms

Open: raharth opened this issue 3 years ago

raharth commented 3 years ago

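When sampling actions, we could simply use the upper x-sigma bound of each action probability (its mean plus x standard deviations) instead of the probability itself.

Alternatively, any other uncertainty-based weighting would work, such as re-normalizing the probabilities by their uncertainty.

This should lead to improved exploration of unknown states, since actions whose probabilities are uncertain would be sampled more often.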
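A minimal sketch of the upper-bound idea, in plain Python rather than PyMatch code. It assumes the uncertainty comes from several probability vectors for the same state (for example from an ensemble or repeated stochastic forward passes), which the issue does not specify. The names `upper_bound_policy`, `sample_action`, and `prob_samples` are hypothetical.

```python
import random
from statistics import mean, pstdev


def upper_bound_policy(prob_samples, x=1.0):
    """Turn several probability estimates into one sampling distribution.

    prob_samples: list of probability vectors for the same state, one per
                  ensemble member or stochastic forward pass (an assumption).
    x:            number of standard deviations added to the mean.
    """
    # Group the estimates per action: one tuple of samples for each action.
    per_action = list(zip(*prob_samples))
    # Upper x-sigma bound of each action probability.
    upper = [mean(p) + x * pstdev(p) for p in per_action]
    # Re-normalize so the bounds form a valid distribution again.
    total = sum(upper)
    return [u / total for u in upper]


def sample_action(prob_samples, x=1.0, rng=random):
    """Sample an action index from the uncertainty-adjusted distribution."""
    weights = upper_bound_policy(prob_samples, x)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    # Action 0 is estimated consistently; actions 1 and 2 are uncertain.
    samples = [[0.5, 0.1, 0.4], [0.5, 0.4, 0.1]]
    print(upper_bound_policy(samples, x=1.0))
    print(sample_action(samples, x=1.0))
```

With `x=0` this reduces to sampling from the mean probabilities; larger values of `x` shift probability mass toward actions whose estimates disagree. How large `x` should be, and whether it should decay during training, is left open here.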