Applying masking to the CNN: padded residue embeddings are now masked out, which improves reproducibility and avoids differing predictions between batched and single inputs.
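A minimal sketch of the idea, assuming a per-residue 1D CNN that takes embeddings of shape `(batch, seq_len, embed_dim)` and a boolean mask that is `True` for real residues. The class name `MaskedCNN` and all layer sizes are hypothetical, not the project's actual model. Zeroing padded positions before each convolution makes padding look exactly like the convolution's own zero padding, so real positions see the same inputs whether the sequence is batched or alone. The re-mask after `conv1` matters because the bias and kernel overlap leak non-zero values into padded positions.

```python
import torch
import torch.nn as nn


class MaskedCNN(nn.Module):
    """Hypothetical per-residue CNN; sizes are illustrative assumptions."""

    def __init__(self, embed_dim: int = 16, hidden: int = 8,
                 n_classes: int = 3, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2  # "same" length output with zero padding
        self.conv1 = nn.Conv1d(embed_dim, hidden, kernel_size, padding=pad)
        self.conv2 = nn.Conv1d(hidden, n_classes, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim); mask: (batch, seq_len), True = real residue
        m = mask.unsqueeze(1).to(x.dtype)       # (batch, 1, seq_len)
        h = x.transpose(1, 2) * m               # mask padded residues before convolving
        h = torch.relu(self.conv1(h)) * m       # re-mask: conv output leaks into padding
        out = self.conv2(h) * m
        return out.transpose(1, 2)              # (batch, seq_len, n_classes)
```

Because every convolution input is masked, even non-zero garbage in padded positions cannot affect the predictions for real residues.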
Improving masking in the LightAttention model: the mask is now applied before the attention convolution, and `-float('inf')` is used instead of `-1e9`, which seems to improve reproducibility and avoids differing predictions between batched and single inputs.
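A simplified sketch, loosely modeled on a LightAttention-style head (a feature convolution and an attention convolution, followed by softmax-weighted pooling and max pooling over the sequence). The class `MaskedLightAttention`, its sizes, and the omission of dropout and extra layers are assumptions for illustration only. It shows the two changes described above: padded positions are zeroed before the convolutions, and masked attention logits are filled with `-inf` so they receive exactly zero weight after the softmax. Masking the max pooling as well is an extra assumption of this sketch.

```python
import torch
import torch.nn as nn


class MaskedLightAttention(nn.Module):
    """Simplified, hypothetical LightAttention-style per-sequence classifier."""

    def __init__(self, embed_dim: int = 16, n_classes: int = 3, kernel_size: int = 9):
        super().__init__()
        pad = kernel_size // 2
        self.feature_conv = nn.Conv1d(embed_dim, embed_dim, kernel_size, padding=pad)
        self.attention_conv = nn.Conv1d(embed_dim, embed_dim, kernel_size, padding=pad)
        self.output = nn.Linear(2 * embed_dim, n_classes)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim); mask: (batch, seq_len), True = real residue
        m = mask.unsqueeze(1)                          # (batch, 1, seq_len)
        x = x.transpose(1, 2).masked_fill(~m, 0.0)     # mask BEFORE both convolutions
        features = self.feature_conv(x)
        logits = self.attention_conv(x)
        # -inf makes padded positions contribute exactly zero after softmax
        logits = logits.masked_fill(~m, float("-inf"))
        weights = torch.softmax(logits, dim=-1)        # softmax over the sequence
        attended = (features * weights).sum(dim=-1)    # (batch, embed_dim)
        # keep padded positions out of max pooling too
        pooled_max = features.masked_fill(~m, float("-inf")).amax(dim=-1)
        return self.output(torch.cat([attended, pooled_max], dim=-1))
```

One caveat of `-inf`: a sequence whose mask is entirely `False` would produce `NaN` from the softmax, so this sketch assumes every sequence has at least one residue.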
Adding an inference unit test that checks whether batch predictions match single predictions.
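A sketch of what such a check could look like, assuming a per-sequence model called as `model(x, mask)` with zero-padded input of shape `(batch, seq_len, dim)` and a boolean mask. The helper `batch_matches_single` and the two toy models are hypothetical and only illustrate the comparison; they are not the project's actual test. The unmasked model averages over padding, so its batched predictions drift from the single-input ones.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence


def batch_matches_single(model, embeddings, atol=1e-6):
    """Hypothetical helper: compare batched vs. one-by-one per-sequence predictions."""
    model.eval()
    with torch.no_grad():
        lengths = torch.tensor([e.shape[0] for e in embeddings])
        batch = pad_sequence(embeddings, batch_first=True)  # zero-padded
        mask = torch.arange(batch.shape[1]).unsqueeze(0) < lengths.unsqueeze(1)
        batched = model(batch, mask)
        for i, emb in enumerate(embeddings):
            single_mask = torch.ones(1, emb.shape[0], dtype=torch.bool)
            single = model(emb.unsqueeze(0), single_mask)
            if not torch.allclose(batched[i : i + 1], single, atol=atol):
                return False
    return True


class MaskedMeanPool(nn.Module):
    """Toy model that averages only over real residues."""

    def __init__(self, dim: int = 8, n_classes: int = 2):
        super().__init__()
        self.linear = nn.Linear(dim, n_classes)

    def forward(self, x, mask):
        m = mask.unsqueeze(-1).to(x.dtype)             # (batch, seq_len, 1)
        pooled = (x * m).sum(dim=1) / m.sum(dim=1)
        return self.linear(pooled)


class UnmaskedMeanPool(MaskedMeanPool):
    """Same toy model without masking: padding leaks into the average."""

    def forward(self, x, mask):
        return self.linear(x.mean(dim=1))


torch.manual_seed(0)
embs = [torch.randn(7, 8), torch.randn(4, 8)]
print(batch_matches_single(MaskedMeanPool(), embs))    # True
print(batch_matches_single(UnmaskedMeanPool(), embs))  # False
```

Using `torch.allclose` with a small tolerance rather than exact equality allows for harmless floating-point differences between batched and single-input kernels.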
Closes #100.