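The pretraining callback uses an AM-softmax loss to make the embeddings of consecutive frames as similar as possible. Currently, this is also applied across stochastic transitions (for example, eating an apple in Snake, after which the new apple respawns at a random location). This is undesirable, because the frame after such a transition is partly random and should not be pulled toward the frame before it.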
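One possible fix is to drop stochastic transitions from the loss. The sketch below is not the callback's actual code; it is a minimal pure-Python illustration that assumes each transition carries a boolean flag marking it as stochastic. The function name `masked_am_softmax_loss`, the `is_stochastic` flag, and the `scale` and `margin` defaults are all hypothetical. It treats the embedding of frame t as the anchor, the embedding of frame t+1 as its positive, and the other next-frame embeddings in the batch as negatives.

```python
import math


def masked_am_softmax_loss(anchors, positives, is_stochastic,
                           scale=30.0, margin=0.35):
    """AM-softmax over (frame t, frame t+1) pairs, skipping flagged pairs.

    anchors[i] / positives[i]: embeddings of frame t and frame t+1 (lists of floats).
    is_stochastic[i]: hypothetical flag, True if transition i is stochastic.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    losses = []
    for i, a_i in enumerate(a):
        if is_stochastic[i]:
            continue  # do not pull embeddings together across this transition
        logits = []
        for j, p_j in enumerate(p):
            cos = sum(x * y for x, y in zip(a_i, p_j))
            # The additive margin is subtracted from the target (positive) logit only.
            logits.append(scale * (cos - margin if j == i else cos))
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(lse - logits[i])

    # Mean over the kept pairs; 0.0 if every pair was masked out.
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch a masked pair contributes no loss term of its own, but its next-frame embedding still serves as a negative for the other anchors. Whether that is desirable is a design choice; excluding masked frames from the negatives as well would be an equally simple variant.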