Improving masking mechanisms in CNN and LightAttention models

Applying masking to CNN: Padded residue embeddings are now masked out, which improves reproducibility and prevents predictions from differing between batched and single inputs
Improving masking in LightAttention model: The mask is now applied before the attention convolution, and -float('inf') is used instead of -1e9, which seems to improve reproducibility and to prevent predictions from differing between batched and single inputs
Adding an inference unit test that checks whether batch predictions match single predictions
Updating models and related inference unit tests
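The CNN masking change can be illustrated with a small pure-Python sketch. It assumes a hypothetical model and is not biotrainer's code. The model has one output channel, a width-3 kernel with implicit zero "same" padding, and mean pooling over residues. The padding value is deliberately nonzero. The helper names (conv1d_same, predict_masked, predict_unmasked) are likewise hypothetical.

Two things happen once padded embeddings are zeroed. First, every convolution window centred on a real residue sees the same zeros it would see at the edge of an unpadded input. Second, pooling only over real residues keeps the outputs at padded positions out of the prediction.

```python
import math
from typing import List

Embedding = List[float]


def conv1d_same(seq: List[Embedding], kernel: List[Embedding], bias: float) -> List[float]:
    """Hypothetical one-channel 1D convolution with implicit zero padding.

    seq is L x D, kernel is K x D with K odd, so the output has length L."""
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = bias
        for j, weights in enumerate(kernel):
            pos = i + j - half
            if 0 <= pos < len(seq):  # positions outside the tensor count as zeros
                acc += sum(w * x for w, x in zip(weights, seq[pos]))
        out.append(acc)
    return out


def predict_masked(seq: List[Embedding], mask: List[bool],
                   kernel: List[Embedding], bias: float) -> float:
    # Mask out padded residue embeddings before the convolution
    zeroed = [x if m else [0.0] * len(x) for x, m in zip(seq, mask)]
    feats = conv1d_same(zeroed, kernel, bias)
    # Pool over real residues only
    real = [f for f, m in zip(feats, mask) if m]
    return sum(real) / len(real)


def predict_unmasked(seq: List[Embedding], kernel: List[Embedding], bias: float) -> float:
    # No masking: padding leaks into both the convolution and the pooling
    feats = conv1d_same(seq, kernel, bias)
    return sum(feats) / len(feats)


kernel = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]]
bias = 0.1
protein = [[1.0, 0.5], [-0.5, 2.0], [0.25, -1.0]]  # 3 residues, D = 2
pad = [[1.0, 1.0], [1.0, 1.0]]  # arbitrary padding up to a batch length of 5

single = predict_masked(protein, [True] * 3, kernel, bias)
batched = predict_masked(protein + pad, [True] * 3 + [False] * 2, kernel, bias)
naive = predict_unmasked(protein + pad, kernel, bias)
print(math.isclose(single, batched), math.isclose(single, naive))  # → True False
```

The masked prediction for the padded input equals the single-input prediction exactly. The unmasked one drifts because of the padding.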
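The LightAttention change concerns the softmax over sequence positions. The sketch below is a simplified, hypothetical stand-in and not biotrainer's implementation. It is pure Python with a single feature channel. The attention logits and values are given directly rather than produced by the two convolutions.

Filling padded logits with -inf guarantees that exp() returns exactly 0.0 for them, regardless of how large the real logits are. A finite sentinel such as -1e9 instead relies on underflow. The function names masked_softmax and attention_pool are illustrative assumptions.

```python
import math
from typing import List


def masked_softmax(logits: List[float], mask: List[bool]) -> List[float]:
    # Padded positions are filled with -inf, so exp() maps them to exactly 0.0
    filled = [x if m else -math.inf for x, m in zip(logits, mask)]
    top = max(filled)  # finite as long as at least one position is a real residue
    exps = [math.exp(x - top) for x in filled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(values: List[float], logits: List[float], mask: List[bool]) -> float:
    # Attention-weighted sum of per-residue values over the sequence dimension
    weights = masked_softmax(logits, mask)
    return sum(w * v for w, v in zip(weights, values))


values = [0.3, -1.2, 0.8]
logits = [1.0, 2.5, -0.5]
batch_mask = [True] * 3 + [False] * 2
# Padding to length 5 with arbitrary, deliberately large logits
single = attention_pool(values, logits, [True] * 3)
batched = attention_pool(values + [7.0, 7.0], logits + [50.0, 50.0], batch_mask)
print(math.isclose(single, batched))  # → True
print(masked_softmax(logits + [50.0, 50.0], batch_mask)[3:])  # → [0.0, 0.0]
```

One caveat of -inf: for a row that is entirely padding, max() returns -inf, and -inf - (-inf) is NaN. The approach therefore relies on every sequence containing at least one residue.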

Closes #100.

sacdallago / biotrainer