Something that would be really nice to have is regression testing for model outputs. In short, whenever we refactor something in the models we want them to still be able to load checkpoints (or well, see #48 ) and produce exactly the same output when fed the same data.
One way to achieve this could be to:

1. Check out the main branch.
2. Run some example data through the model and save the predictions (potentially also some internal representation tensors, but that is likely unnecessary and hard to do in practice).
3. Check out the PR branch.
4. Run the same example data through the model and compare the outputs to the saved predictions (a rough sketch of such a test follows below).
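To make the idea concrete, here is a minimal pytest sketch of what the comparison could look like. The model, batch, and paths are hypothetical stand-ins, not our actual API; the real test would load one of our models from a checkpoint and use a fixed example batch. Running it with `UPDATE_REFERENCE=1` on main would save the predictions, and running it normally on the PR branch would compare against them:

```python
import os
from pathlib import Path

import pytest
import torch

# Placeholder path; the real test would version a reference file in the repo.
REFERENCE_PATH = Path("tests/references/model_output.pt")


def _load_model_and_batch():
    # Hypothetical stand-in: a tiny deterministic model and a fixed input.
    # In practice: load the repo's model from a checkpoint + example data.
    torch.manual_seed(0)
    model = torch.nn.Linear(4, 2)
    batch = torch.arange(8, dtype=torch.float32).reshape(2, 4)
    return model, batch


def test_model_output_regression():
    model, batch = _load_model_and_batch()
    model.eval()
    with torch.no_grad():
        output = model(batch)

    if os.environ.get("UPDATE_REFERENCE"):
        # Run once on main to (re)generate the saved predictions.
        REFERENCE_PATH.parent.mkdir(parents=True, exist_ok=True)
        torch.save(output, REFERENCE_PATH)
        pytest.skip("reference regenerated; nothing to compare against")

    reference = torch.load(REFERENCE_PATH)
    # Exact equality, since the point is bit-identical outputs after a
    # refactor; torch.testing.assert_close with tight tolerances would be
    # a fallback if cross-hardware float noise makes this too strict.
    assert torch.equal(output, reference)
```

In CI this could be two jobs (or two steps in one job): one that checks out main and runs the test with `UPDATE_REFERENCE=1`, and one that checks out the PR and runs it normally against the saved file.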
I'm not familiar enough with pytest and the GitHub workflows to know all the details of how to do this. @SimonKamuk, @leifdenby do you think something like this is doable? Or are there better ways to achieve this?