Currently, to calculate calibration, we discretize the model's output probability density onto a grid that tiles the arena floor. However, the way we currently label gerbils that rear up at the edges of the arena can place labels well outside the bounds of this grid. This leads to situations like the following, where the model may be outputting something reasonable, but the sample isn't counted by the calibration calculation because its label falls outside the arena bounds.
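To make the failure mode concrete, here is a minimal sketch that assumes a unit-square arena, an isotropic Gaussian output density, and a highest-density-region score per sample. All function names here (`discretize_density`, `label_bin`, `hpd_mass_at_label`) are hypothetical, and the HPD-based score is an assumption about how the calibration is computed, not a description of the actual code. The sketch evaluates the density at bin centers and scales by bin area. Any label that lands off the grid has no bin, so it cannot contribute to the calibration statistic.

```python
import numpy as np

def discretize_density(pdf, x_bounds, y_bounds, n_bins):
    """Approximate per-bin probability mass on an n_bins x n_bins grid
    tiling the arena rectangle (midpoint rule: pdf at centers * bin area)."""
    x_edges = np.linspace(*x_bounds, n_bins + 1)
    y_edges = np.linspace(*y_bounds, n_bins + 1)
    xc = (x_edges[:-1] + x_edges[1:]) / 2
    yc = (y_edges[:-1] + y_edges[1:]) / 2
    xx, yy = np.meshgrid(xc, yc, indexing="ij")
    area = (x_edges[1] - x_edges[0]) * (y_edges[1] - y_edges[0])
    return pdf(xx, yy) * area, x_edges, y_edges

def label_bin(label, x_edges, y_edges):
    """Return the (i, j) bin containing `label`, or None if it is off the grid."""
    x, y = label
    if not (x_edges[0] <= x <= x_edges[-1] and y_edges[0] <= y <= y_edges[-1]):
        return None
    # side="right" then -1 gives the bin index; clamp so the far edge maps to the last bin
    i = min(np.searchsorted(x_edges, x, side="right") - 1, len(x_edges) - 2)
    j = min(np.searchsorted(y_edges, y, side="right") - 1, len(y_edges) - 2)
    return int(i), int(j)

def hpd_mass_at_label(masses, idx):
    """Mass of the smallest highest-density region that includes the label's bin."""
    if idx is None:
        return None  # label outside the arena grid: the sample is silently dropped
    return float(masses[masses >= masses[idx]].sum())

# Hypothetical example: the model predicts a gerbil rearing against the wall at x = 1.
SIGMA = 0.1

def wall_pdf(x, y):
    return np.exp(-((x - 1.0) ** 2 + (y - 0.5) ** 2) / (2 * SIGMA**2)) / (2 * np.pi * SIGMA**2)

masses, x_edges, y_edges = discretize_density(wall_pdf, (0.0, 1.0), (0.0, 1.0), 100)
outside = label_bin((1.15, 0.5), x_edges, y_edges)  # head labeled past the wall
print(outside, hpd_mass_at_label(masses, outside))  # None None
print(round(float(masses.sum()), 2))  # roughly half the density lies beyond the wall
```

In this toy setup the prediction is reasonable, since the label sits 1.5 standard deviations from the mean. Even so, the sample produces no calibration score. About half of the predicted mass also falls outside the grid, so the in-bounds bins do not sum to 1.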