The prioritized replay buffer keeps sampling the same small set of experiences, with very high probability, over and over within a window of samples. So far the Bellman errors look fine, and so does the modified agent pipeline, so the issue should be in the priority buffer itself. It might come from how priorities are assigned (on updates and additions).
Fixed by 9cddfe4: modifies the SegmentedTree so that it stores floats instead of ints. It was really silly: the buffers were initialized as np.ndarrays of type int, because the dtype was taken from the neutral element.
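For anyone hitting the same symptom, here is a minimal sketch of how an int-typed buffer can produce it. This is not the actual SegmentedTree code: `make_tree`, `set_priority`, and the example priorities are hypothetical names and values, and the tree is assumed to be a plain array-backed sum tree. The point is only that NumPy infers the dtype from the fill value and silently truncates floats written into an int array.

```python
import numpy as np

def make_tree(capacity, neutral_element):
    # Hypothetical helper: the dtype is inferred from the neutral element,
    # so 0 gives an integer array and 0.0 gives a float array.
    return np.full(2 * capacity, neutral_element)

def set_priority(tree, capacity, idx, priority):
    # Write the leaf, then refresh the sums on the path up to the root (index 1).
    pos = idx + capacity
    tree[pos] = priority  # truncated toward zero if the tree has an int dtype
    pos //= 2
    while pos >= 1:
        tree[pos] = tree[2 * pos] + tree[2 * pos + 1]
        pos //= 2

capacity = 4
priorities = [0.3, 0.9, 1.7, 0.2]  # made-up priorities for illustration

int_tree = make_tree(capacity, 0)      # neutral element as int
float_tree = make_tree(capacity, 0.0)  # neutral element as float
for i, p in enumerate(priorities):
    set_priority(int_tree, capacity, i, p)
    set_priority(float_tree, capacity, i, p)

int_leaves = int_tree[capacity:]
float_leaves = float_tree[capacity:]
print(int_tree.dtype, int_leaves)
print(float_tree.dtype, float_leaves)
```

In the int version every priority below 1 collapses to 0, so all the sampling mass lands on the few experiences whose priority happens to be 1 or more, which matches the behaviour described above.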