thebes2 / RL


Fix pretraining callback #11

Closed: thebes2 closed this issue 2 years ago

thebes2 commented 2 years ago

The pretraining callback uses AM-softmax to make the embeddings of consecutive frames as similar as possible. Currently, this is also applied across stochastic transitions (for example, eating an apple in Snake, after which the new apple respawns at a random location), which is not desirable.
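One possible way to address this is to exclude stochastic transitions from the loss. The sketch below is not the repository's callback; it is a minimal NumPy illustration that assumes each pair of consecutive frames comes with a boolean flag marking whether the transition was stochastic. The function name `masked_am_softmax_loss`, the `stochastic` mask, and the `scale` and `margin` defaults are all hypothetical. How the flag is produced (for instance, set whenever the snake eats an apple) is also an assumption.

```python
import numpy as np

def masked_am_softmax_loss(curr, nxt, stochastic, scale=30.0, margin=0.35):
    """AM-softmax over consecutive-frame pairs, skipping stochastic transitions.

    curr, nxt:   (N, D) embeddings of frame t and frame t+1 (hypothetical inputs).
    stochastic:  (N,) bool, True where the transition t -> t+1 involved randomness.
    """
    # L2-normalise so that dot products are cosine similarities.
    curr = curr / np.linalg.norm(curr, axis=1, keepdims=True)
    nxt = nxt / np.linalg.norm(nxt, axis=1, keepdims=True)
    cos = curr @ nxt.T  # (N, N); the diagonal holds the positive pairs

    # Additive margin: subtract `margin` from the positive-pair cosine only.
    logits = scale * (cos - margin * np.eye(cos.shape[0]))

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_pair = -np.diag(log_prob)

    # Average only over deterministic transitions.
    keep = ~np.asarray(stochastic, dtype=bool)
    if not keep.any():
        return 0.0
    return float(per_pair[keep].mean())
```

In this sketch a masked pair contributes no positive term, but its next-frame embedding still appears as a negative for the other pairs in the batch. Whether that is desirable here is a judgment call; dropping the masked rows from both `curr` and `nxt` before calling the function would remove them entirely.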