I've checked, and I'm pretty sure we're doing the right thing, now.
In commit 1c2d8133483c8d5b88b755cb66d01ff966e1bde0, I added a mask for the TimeTransformer (we've been masking the SatelliteTransformer for a while, now).
We're now masking the attn input on a per-element basis (both for the SatelliteTransformer and the TimeTransformer), and we're masking the loss on a per-element basis.
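To illustrate what "per-element" masking means for both the attention input and the loss, here is a minimal, framework-free sketch in plain Python. It is not the actual SatelliteTransformer/TimeTransformer code. The function names `masked_attention` and `masked_mse` are hypothetical, and so is the mask convention (`True` = valid element). Note that some libraries use the opposite convention: PyTorch's `nn.MultiheadAttention`, for example, treats `True` in `key_padding_mask` as "ignore this key".

```python
import math

# Hypothetical helpers illustrating per-element masking; not the project's code.
# Mask convention (assumed): mask[i] is True when element i is valid.


def masked_attention(query, keys, values, mask):
    """Scaled dot-product attention for one query; masked elements get zero weight."""
    if not any(mask):
        raise ValueError("every element is masked")
    scale = math.sqrt(len(query))
    # Masked elements get a score of -inf so softmax assigns them zero weight.
    # Their keys are never read, so NaN padding there cannot leak in.
    scores = [
        sum(q * k_j for q, k_j in zip(query, key)) / scale if valid else -math.inf
        for key, valid in zip(keys, mask)
    ]
    max_score = max(scores)  # finite, because at least one element is valid
    exps = [math.exp(s - max_score) for s in scores]  # exp(-inf) == 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    # Sum only over valid elements: 0.0 * nan is nan, so skipping them matters.
    out = [
        sum(w * value[d] for w, value, valid in zip(weights, values, mask) if valid)
        for d in range(len(values[0]))
    ]
    return out, weights


def masked_mse(predictions, targets, mask):
    """Mean squared error computed over valid elements only."""
    sq_errors = [
        (p - t) ** 2 for p, t, valid in zip(predictions, targets, mask) if valid
    ]
    if not sq_errors:
        raise ValueError("every element is masked")
    return sum(sq_errors) / len(sq_errors)


if __name__ == "__main__":
    nan = float("nan")
    # The third element is padding (NaN) and is masked out everywhere.
    out, weights = masked_attention(
        query=[0.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0], [nan, nan]],
        values=[[1.0], [3.0], [nan]],
        mask=[True, True, False],
    )
    print(out, weights)  # [2.0] [0.5, 0.5, 0.0]
    print(masked_mse([1.0, 2.0, 100.0], [1.0, 4.0, nan], [True, True, False]))  # 2.0
```

The point of masking in both places is that a padded or missing element contributes nothing: it gets zero attention weight, and it is excluded from the loss average (so the denominator counts only valid elements, rather than diluting the loss with zeros).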