Hello!
This PR lets `fw` apply different hyperparameters when loading from an initial model.
The current version freezes hyperparameters when loading from a pre-trained model (`-i`): if, for example, `--learning_rate` is passed while fine-tuning the model on new data, it is ignored and the defaults are always used. This PR adapts `persistence.rs` to check the command-line parameters and overwrite the loaded model's hyperparameter values (only those specified as constants at the beginning of `persistence.rs`), so hyperparameters can be varied between incremental updates in an online learning setting.
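To illustrate the idea, here is a minimal sketch of the override step. It is not the code in this PR: `Hyperparams`, `OVERRIDABLE_HYPERPARAMS`, and `apply_cli_overrides` are hypothetical names, the second field (`power_t`) is chosen purely for illustration, and the command line is modelled as a plain map rather than the parser `fw` actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical whitelist, standing in for the constants at the top of persistence.rs.
const OVERRIDABLE_HYPERPARAMS: [&str; 2] = ["learning_rate", "power_t"];

/// Hypothetical container for the hyperparameters stored in a saved model.
#[derive(Debug, Clone, PartialEq)]
struct Hyperparams {
    learning_rate: f32,
    power_t: f32,
}

/// Return a copy of `loaded` in which every whitelisted hyperparameter that
/// was given on the command line replaces the value stored in the model.
/// `cli` maps a flag name (without leading dashes) to its raw string value.
fn apply_cli_overrides(
    loaded: &Hyperparams,
    cli: &HashMap<String, String>,
) -> Result<Hyperparams, String> {
    let mut out = loaded.clone();
    for name in OVERRIDABLE_HYPERPARAMS {
        // Flags that were not passed leave the loaded value untouched.
        if let Some(raw) = cli.get(name) {
            let value: f32 = raw
                .parse()
                .map_err(|e| format!("invalid value for --{name}: {e}"))?;
            match name {
                "learning_rate" => out.learning_rate = value,
                "power_t" => out.power_t = value,
                _ => unreachable!("whitelist and match arms must stay in sync"),
            }
        }
    }
    Ok(out)
}
```

With a loaded model whose `learning_rate` is 0.5, passing only `learning_rate = "0.01"` in the map yields a copy with `learning_rate` 0.01 and every other field unchanged; an empty map returns the loaded values as they were.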
Tested by simulating a pre-trained model and varying `--learning_rate`. As expected, this resulted in substantially smaller or larger weight updates (the update size is directly related to the learning rate), and the prediction distributions differ as well.
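The relationship between learning rate and update size can be sketched with a single plain SGD step on a squared-error loss. This is an assumption made for illustration only: it is not `fw`'s actual optimizer, and `sgd_step` and `update_size` are hypothetical helpers.

```rust
/// One plain SGD step for a linear model with squared-error loss.
/// The gradient with respect to each weight is `error * feature`.
fn sgd_step(weights: &[f32], features: &[f32], label: f32, learning_rate: f32) -> Vec<f32> {
    let prediction: f32 = weights.iter().zip(features).map(|(w, x)| w * x).sum();
    let error = prediction - label;
    weights
        .iter()
        .zip(features)
        .map(|(w, x)| w - learning_rate * error * x)
        .collect()
}

/// Sum of absolute weight changes between two weight vectors.
fn update_size(before: &[f32], after: &[f32]) -> f32 {
    before.iter().zip(after).map(|(b, a)| (a - b).abs()).sum()
}
```

Running the same example through `sgd_step` with two learning rates and comparing `update_size` shows the change in weights scaling with the learning rate, which is the kind of difference the test above looked for.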
Additionally, I applied the default `cargo fmt`, which seems to enforce generic formatting. To enforce this during development, `rustfmt.toml` was also added as one of the configs; this can easily be reverted if necessary.