Closed: cdeepakroy closed this issue 4 years ago
Hi @cdeepakroy, thanks for the suggestions! We're looking into a way to provide a better default logging level when training the label model. As far as optimization goes: we've found SGD with the provided LR schedulers to be a reliable approach in most cases. We've also seen some gains from using Adam in certain cases. Unfortunately, there's no one-size-fits-all here, so we recommend trying out different settings (as it sounds like you're doing already!).
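For anyone landing on this thread, here is a rough sketch of what trying out different optimizer settings can look like. The optimizer and lr_scheduler keyword arguments, the seed keyword, and the dummy L_train are assumptions made for the sake of a runnable example, so double-check them against the fit() train config in your Snorkel version.

```python
import numpy as np
from snorkel.labeling.model import LabelModel

# Dummy (num_examples x num_LFs) label matrix with -1 for abstains, just so
# the snippet runs end to end; use the matrix produced by your labeling functions.
L_train = np.array([[0, 1, -1], [1, 1, 0], [0, -1, 0], [1, 0, 1]] * 25)

label_model = LabelModel(cardinality=2, verbose=True)

# Baseline-style setup: SGD with one of the built-in LR schedulers.
label_model.fit(L_train, n_epochs=500, lr=0.01,
                optimizer="sgd", lr_scheduler="exponential", seed=123)

# Alternative worth comparing: Adam with a constant LR.
label_model.fit(L_train, n_epochs=500, lr=0.01,
                optimizer="adam", lr_scheduler="constant", seed=123)
```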
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Is your feature request related to a problem? Please describe.
I was working on a text classification problem and had been calling LabelModel.fit() with the default parameters. Since the fit() method does not print the loss by default, I did not realize that the training loss was not converging. While debugging the labels snorkel assigned to some samples, I wanted to look at the loss curve. I dug into the fit() method and noticed that the loss is logged with logging.info() (see here and here). I had to import logging and set the level to INFO to get the losses printed. This showed me that the loss had not yet converged, and I had to increase the number of epochs and play with the learning rate a bit to get it to converge.
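For reference, this is roughly the workaround I used: raise the root logger level to INFO before calling fit(). The dummy L_train and the log_freq keyword (how often the loss line is emitted) are assumptions made so the sketch runs as-is; the logging setup itself is just the standard library.

```python
import logging

import numpy as np
from snorkel.labeling.model import LabelModel

# Enable INFO-level logging so the loss logged via logging.info() during
# LabelModel.fit() actually shows up on the console.
logging.basicConfig(level=logging.INFO)

# Tiny dummy label matrix (3 labeling functions, -1 = abstain) just so the
# snippet runs; replace with your real L_train.
L_train = np.array([[0, 1, -1], [1, 1, 0], [0, -1, 0], [1, 0, 1]] * 25)

label_model = LabelModel(cardinality=2, verbose=True)

# n_epochs and lr are standard fit() arguments; log_freq (how often the loss
# is logged) and seed are my best guess at the keyword names -- check the docs.
label_model.fit(L_train, n_epochs=500, lr=0.01, log_freq=20, seed=123)
```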
I suggest printing the losses by default (or adding a verbose parameter to control it), with some reasonable default value for the log frequency. That way the user will be able to see whether the loss converged and take appropriate action.
I would also suggest returning the loss history to the user in the form of an array to enable plotting of the loss curve. Looking at the loss curve gives a sense of whether the learning rate is too low, too high, or just right.
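Until something like that exists, one workaround (a sketch only; it assumes the losses come through the standard logging module and that each log line contains a float after "loss=" or "loss:", which may not match the exact message format) is to attach a custom logging handler and collect the values for plotting:

```python
import logging
import re

import matplotlib.pyplot as plt


class LossHistoryHandler(logging.Handler):
    """Collects loss values from log records emitted during LabelModel.fit().

    The regex assumes each relevant log line contains something like
    'loss=0.1234'; adjust it to match the actual message format.
    """

    def __init__(self):
        super().__init__()
        self.losses = []

    def emit(self, record):
        match = re.search(r"loss[=:]\s*([0-9.eE+-]+)", record.getMessage())
        if match:
            self.losses.append(float(match.group(1)))


handler = LossHistoryHandler()
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)

# ... call label_model.fit(...) here ...

# Plot the captured loss curve to eyeball convergence and the learning rate.
plt.plot(handler.losses)
plt.xlabel("logged step")
plt.ylabel("train loss")
plt.show()
```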
Also, I noticed here that only four types of learning rate (LR) schedulers are supported: ["constant", "linear", "exponential", "step"]. I would suggest exposing the other schedulers in torch.optim.lr_scheduler. In particular, a scheduler like ReduceLROnPlateau, which requires less tuning, could be very useful here. I am curious whether there have been any experiments comparing the performance of different LR schedulers and optimizers for fitting the LabelModel, and if so, what emerged as a recommended approach that worked in most cases.
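For context, this is roughly what ReduceLROnPlateau looks like in plain PyTorch (a generic toy training loop, not Snorkel's internals): it drops the learning rate whenever the monitored loss stops improving, so there is no decay schedule to hand-tune.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Toy model and data standing in for the label model's training objective.
model = nn.Linear(10, 1)
X, y = torch.randn(256, 10), torch.randn(256, 1)
loss_fn = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Cut the LR by 10x whenever the loss stops improving for 10 epochs.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=10)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss)  # ReduceLROnPlateau steps on the monitored metric
```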