Implement LR schedules in the PyTorch training code. Add MambaLayer and make it configurable through the parameter config files and available for HPO.
Also contains the following changes:
--load now takes the path to a saved checkpoint from which to start a new training. I.e., the training starts from epoch 1, with LR schedules starting from the beginning; the only thing loaded from the checkpoint is the pre-trained model weights.
--resume-training is a new command-line argument that takes the path to a training directory containing an unfinished training and attempts to restore it, continuing the training from the last saved checkpoint.
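The distinction between the two flags can be sketched with standard PyTorch checkpointing. This is a minimal illustration, not the project's actual implementation; the checkpoint key names ("model", "optimizer", "scheduler") are assumptions:

```python
# Sketch: --load restores only model weights (fresh LR schedule),
# while --resume-training also restores optimizer and scheduler state
# so the LR curve continues where it left off. Key names are assumed.
import io
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

# Simulate a few training steps, then checkpoint the full training state.
for _ in range(5):
    optimizer.step()
    scheduler.step()
buf = io.BytesIO()
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}, buf)
buf.seek(0)
ckpt = torch.load(buf)

# --load: only the pre-trained weights; the schedule starts over at step 0.
fresh_model = nn.Linear(4, 2)
fresh_model.load_state_dict(ckpt["model"])
fresh_opt = torch.optim.SGD(fresh_model.parameters(), lr=0.1)
fresh_sched = torch.optim.lr_scheduler.CosineAnnealingLR(fresh_opt, T_max=10)
print(fresh_sched.last_epoch)  # 0: LR schedule restarts from the beginning

# --resume-training: restore optimizer and scheduler state as well.
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
print(scheduler.last_epoch)  # 5: LR schedule continues mid-curve
```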
Learning rate versus training step for a training that used the cosinedecay LR schedule, was interrupted halfway through, and was then continued.
Learning rate versus training step for a training that used the onecycle LR schedule, was interrupted halfway through, and was then continued.
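The two plotted schedules plausibly map to PyTorch's built-in schedulers. A hedged sketch of that mapping; the `make_scheduler` helper and the hyperparameters are assumptions, not the project's actual config interface:

```python
# Sketch: mapping the "cosinedecay" and "onecycle" config names to
# PyTorch's built-in LR schedulers. Helper name and hyperparameters
# are illustrative assumptions.
import torch
import torch.nn as nn

def make_scheduler(name, optimizer, total_steps):
    if name == "cosinedecay":
        # LR decays from the base LR toward 0 along a half cosine wave.
        return torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=total_steps)
    if name == "onecycle":
        # LR warms up to max_lr, then anneals to a very small value.
        return torch.optim.lr_scheduler.OneCycleLR(
            optimizer, max_lr=0.1, total_steps=total_steps)
    raise ValueError(f"unknown LR schedule: {name}")

opt = torch.optim.SGD(nn.Linear(4, 2).parameters(), lr=0.1)
sched = make_scheduler("cosinedecay", opt, total_steps=100)
lrs = []
for _ in range(100):
    opt.step()
    sched.step()
    lrs.append(sched.get_last_lr()[0])
# Cosine decay: LR falls monotonically from the base LR toward ~0.

opt2 = torch.optim.SGD(nn.Linear(4, 2).parameters(), lr=0.1)
sched2 = make_scheduler("onecycle", opt2, total_steps=100)
lrs2 = []
for _ in range(100):
    opt2.step()
    sched2.step()
    lrs2.append(sched2.get_last_lr()[0])
# One-cycle: LR rises to max_lr, then anneals to near zero.
```

Both schedulers expose `state_dict()`/`load_state_dict()`, which is what makes the interrupted-and-continued LR curves in the plots line up after a resume.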