Fix pretraining callback

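The pretraining callback uses an AM-softmax loss to make the embeddings of consecutive frames as similar as possible. Currently, this loss is also applied across stochastic transitions. For example, in Snake, when the snake eats an apple, the new apple respawns at a random location, so the next frame cannot be predicted from the current one. Pulling the embeddings of those two frames together is not desirable.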
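A minimal sketch of one possible fix, in Python/NumPy. It assumes the callback can obtain a per-transition boolean flag that is `False` for stochastic transitions (e.g. an apple was eaten). The names `masked_am_softmax_loss` and `deterministic` are hypothetical, and so is the use of in-batch negatives. The scale `s=30` and margin `m=0.35` are common AM-softmax defaults, not values taken from this repo. The loss treats frame t+1 of the same transition as the positive for frame t, treats the other next-frames in the batch as negatives, and drops stochastic transitions from the average.

```python
import numpy as np


def masked_am_softmax_loss(z_t, z_next, deterministic, s=30.0, m=0.35):
    """AM-softmax between frame t and frame t+1 embeddings, skipping stochastic transitions.

    z_t, z_next: (B, D) embeddings of frames t and t+1 for B transitions.
    deterministic: (B,) bool; hypothetical flag, False where t -> t+1 was stochastic.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = z_t / np.linalg.norm(z_t, axis=1, keepdims=True)
    b = z_next / np.linalg.norm(z_next, axis=1, keepdims=True)
    cos = a @ b.T  # cos[i, j]: similarity of frame i to next-frame j

    # Additive margin on the positive pairs (the diagonal), then scale.
    logits = s * (cos - m * np.eye(len(cos)))

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching next-frame as the target class.
    per_pair = -np.diag(log_probs)

    # Only deterministic transitions contribute to the loss.
    mask = np.asarray(deterministic, dtype=bool)
    if not mask.any():
        return 0.0
    return float(per_pair[mask].mean())


# Usage: the third transition is stochastic, so it is excluded.
rng = np.random.default_rng(0)
z_t = rng.normal(size=(4, 8))
z_next = rng.normal(size=(4, 8))
flags = np.array([True, True, False, True])
loss = masked_am_softmax_loss(z_t, z_next, flags)
```

In this sketch, stochastic transitions are removed only as anchors. Their next-frame embeddings still serve as negatives for the other rows. Whether they should also be removed from the negatives is a separate design choice.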

thebes2/RL #11