@robeson1010 Hi, sorry for the late response.
To restart training, you need to override the set_model function, for example:
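def set_model(self):
    encoder = self.architecture_config['model_params']['encoder']
    if encoder == 'from_scratch':
        # plain U-Net built from the model params in the config
        self.model = UNet(**self.architecture_config['model_params'])
    else:
        # one of the pretrained-encoder (ResNet) architectures
        config = PRETRAINED_NETWORKS[encoder]
        self.model = config['model'](**config['model_config'])
    # replace the weight initializer with a no-op so the loaded weights are kept
    self._initialize_model_weights = lambda: None
    # load the weights saved by the interrupted run
    self.load('YOUR_FILEPATH_TO_MODEL')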
This lets you load a model you trained earlier, including one that uses one of those ResNet architectures.
It is important to replace self._initialize_model_weights with a no-op function (lambda: None); otherwise it would overwrite your loaded weights with randomly initialized ones.
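To see why the no-op matters, here is a framework-free sketch. It assumes, as a simplification, that the training code calls _initialize_model_weights() at the start of fit(); the class names, the load behavior and the file name are hypothetical and only illustrate the mechanism.

```python
class BaseModel:
    """Hypothetical stand-in for a trainer whose fit() re-initializes weights."""

    def __init__(self):
        self.weights = None
        self.set_model()

    def set_model(self):
        self.weights = "random"

    def _initialize_model_weights(self):
        # Overwrites whatever weights are currently loaded.
        self.weights = "random"

    def load(self, filepath):
        # Hypothetical loader: pretend we read trained weights from filepath.
        self.weights = f"loaded from {filepath}"

    def fit(self):
        # Assumption: fit() calls the initializer before training starts.
        self._initialize_model_weights()
        return self.weights


class ResumedModel(BaseModel):
    def set_model(self):
        # The instance attribute shadows the class method, making it a no-op.
        self._initialize_model_weights = lambda: None
        self.load("epoch_54.pth")


print(BaseModel().fit())     # weights are re-initialized
print(ResumedModel().fit())  # loaded weights survive
```

Assigning None instead of lambda: None would make the call inside fit() raise a TypeError, which is why a function that does nothing is used rather than None itself.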
When you restart, the epoch counter will start from 0 (though your weights from epoch 54 will be used). If you were using some form of learning-rate decay, I would suggest restarting with a smaller learning rate. As of now, we are not checkpointing the optimizer state, so it will be difficult to restore the exact state of your training at epoch 54, but restarting with a fresh optimizer usually gets the job done.
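The suggestion to use a smaller learning rate can be made concrete. Assuming an exponential decay schedule lr_e = lr_0 * gamma**e (the schedule and the values lr_0 = 1e-4, gamma = 0.95 below are made up for illustration, not taken from the pipeline config), restarting the schedule at epoch 0 would jump back to a learning rate roughly 16 times larger than the one in use at epoch 54.

```python
def decayed_lr(initial_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate after `epoch` epochs of exponential decay: lr0 * gamma**epoch."""
    return initial_lr * gamma ** epoch


# Hypothetical schedule values, not the pipeline's actual config.
initial_lr = 1e-4
gamma = 0.95

fresh = decayed_lr(initial_lr, gamma, 0)
resumed = decayed_lr(initial_lr, gamma, 54)
print(f"lr on a fresh start: {fresh:.2e}")
print(f"lr at epoch 54:      {resumed:.2e}")
print(f"ratio:               {fresh / resumed:.1f}x")
```

Passing a value close to the decayed one as the new starting learning rate keeps the update size comparable to where the interrupted run left off.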
I hope this helps.
@jakubczakon Thanks a lot!
@jakubczakon
"As of now we are not checkpointing the optimizer state so it will be difficult to restore the exact state of your training"
Is this still the case? I was hoping to run the training 5-10 epochs at a time and keep checking on the model's progress. Then I'd like to add some new classes, but that's a different problem. Basically, I don't want to pay for the full 100 epochs and then find out that something went wrong, or pay for 100 when 50 might suffice.
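One way to run training a few epochs at a time is to save the epoch counter, the model weights and the optimizer state together after every chunk, and reload all three at the start of the next run. The sketch below is framework-free and is not how this pipeline currently checkpoints: train_one_epoch, the checkpoint layout and the file name are hypothetical, and the dicts stand in for what model.state_dict() and optimizer.state_dict() return in PyTorch, where you would save with torch.save and restore with load_state_dict.

```python
import os
import pickle
import tempfile


def train_one_epoch(model_state, optimizer_state):
    """Hypothetical stand-in for one epoch of training."""
    model_state = {"steps": model_state["steps"] + 1}
    optimizer_state = {"momentum": optimizer_state["momentum"] * 0.9 + 1.0}
    return model_state, optimizer_state


def train_chunk(checkpoint_path, epochs_per_run, total_epochs):
    """Resume from checkpoint_path if present, train up to epochs_per_run more epochs, save."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            ckpt = pickle.load(f)
    else:
        ckpt = {"epoch": 0, "model": {"steps": 0}, "optimizer": {"momentum": 0.0}}

    start = ckpt["epoch"]
    end = min(start + epochs_per_run, total_epochs)
    model, optimizer = ckpt["model"], ckpt["optimizer"]
    for _ in range(start, end):
        model, optimizer = train_one_epoch(model, optimizer)

    # Save the epoch counter, model state and optimizer state together.
    ckpt = {"epoch": end, "model": model, "optimizer": optimizer}
    with open(checkpoint_path, "wb") as f:
        pickle.dump(ckpt, f)
    return end


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
    finished = 0
    while finished < 100:
        # Each call stands for a separate 10-epoch run; check metrics in between.
        finished = train_chunk(path, epochs_per_run=10, total_epochs=100)
        print(f"checkpoint saved after epoch {finished}")
```

Because the optimizer state travels with the weights, ten 10-epoch runs end in the same state as one 100-epoch run in this toy. A real run would also need the learning-rate scheduler and random-number-generator state for an exact match.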
I trained the model for 3 days, but unfortunately the process was interrupted. I ran 'python main.py -- train --pipeline_name unet_weighted', but it started training from epoch 0 again. How can I resume training from where I left off (54 epochs already completed)?