thebes2 / RL


Enhance performance #13

Open thebes2 opened 2 years ago

thebes2 commented 2 years ago

Most standard RL algorithms appear to struggle with long-horizon goals: they perform well in the short term, since every episode provides training signal for near-term behavior, but poorly in the long term, since many episodes terminate before the distant goal is ever reached. Instead of always unrolling from the starting distribution, try choosing an arbitrary state visited in the past and unrolling from there. This will be difficult to implement for envs where we have no control over the internal state (like gym), and the new experience will have to be reweighted so as not to bias the effective starting distribution. A rough sketch of the idea is below.
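
A minimal sketch of one possible implementation, assuming the env is a pure-Python gym env that can be deep-copied to snapshot its state (this does not hold for envs backed by external simulators). The class name, `buffer_size`, and `restart_prob` are hypothetical illustrative choices, and the reweighting itself is left as a flag for downstream code:

```python
import copy
import random


class RestartFromVisitedStates:
    """Sketch: occasionally reset rollouts from a previously visited
    state instead of the true starting distribution.

    Assumes `env` is deep-copyable. Keeps a reservoir-sampled buffer
    so restart states are (approximately) uniform over all visited
    states seen so far.
    """

    def __init__(self, env, buffer_size=1000, restart_prob=0.5):
        self.env = env
        self.buffer_size = buffer_size
        self.restart_prob = restart_prob
        self.buffer = []    # (env snapshot, observation) pairs
        self.n_seen = 0     # total states offered to the buffer
        self.restored = False

    def reset(self):
        if self.buffer and random.random() < self.restart_prob:
            # Restore a snapshot of an arbitrary past state. Deep-copy
            # again so future steps don't mutate the stored snapshot.
            env_copy, obs = random.choice(self.buffer)
            self.env = copy.deepcopy(env_copy)
            # Downstream code should use this flag to importance-weight
            # the resulting experience, so values aren't biased toward
            # the artificial starting distribution.
            self.restored = True
            return obs
        self.restored = False
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Reservoir sampling: each visited state is kept with equal
        # probability regardless of when it was seen.
        self.n_seen += 1
        snapshot = (copy.deepcopy(self.env), obs)
        if len(self.buffer) < self.buffer_size:
            self.buffer.append(snapshot)
        elif random.random() < self.buffer_size / self.n_seen:
            self.buffer[random.randrange(self.buffer_size)] = snapshot
        return obs, reward, done, info
```

Snapshotting via `copy.deepcopy` sidesteps gym's lack of a set-state API, at the cost of memory and of only working for deep-copyable envs; a real version would probably snapshot states more sparsely than every step.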