Sorry about the late response.
The cost is basically the sum of a classification negative log-likelihood and a reward-weighted negative log-probability of the sampled glimpse locations.
So the model not only learns to make better classification predictions, but also learns to produce glimpse sequences that lead to higher rewards, which usually means looking at relevant locations. For MNIST, that typically means looking at the white digit instead of focusing on the black background.
Cost is not to be confused with accuracy, which is always >= 0.
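To make that concrete, here is a minimal sketch of such a hybrid cost for a single example. The function name `hybrid_cost` and the toy numbers are my own for illustration; the real implementation also subtracts a baseline and averages over a batch, which I've left out for clarity:

```python
import numpy as np

def hybrid_cost(class_log_probs, true_label, location_log_probs, reward):
    """Hybrid cost sketch for a recurrent attention model.

    class_log_probs: log-probabilities over classes at the final step.
    location_log_probs: log-probabilities of the sampled glimpse locations.
    reward: 1.0 if the final prediction was correct, else 0.0.
    """
    # Classification term: negative log-likelihood of the true label.
    nll = -class_log_probs[true_label]
    # REINFORCE term: reward-weighted negative log-probability of the
    # glimpse sequence, so glimpses that led to a correct prediction
    # are made more likely.
    reinforce = -reward * np.sum(location_log_probs)
    return nll + reinforce

# Toy example: 3 classes, 2 glimpses, correct prediction (reward = 1).
class_log_probs = np.log(np.array([0.1, 0.7, 0.2]))
loc_log_probs = np.log(np.array([0.5, 0.4]))
cost = hybrid_cost(class_log_probs, true_label=1,
                   location_log_probs=loc_log_probs, reward=1.0)
```

When the prediction is wrong (reward 0), the location term vanishes and only the classification term drives learning for that example.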