snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Enable other LR schedulers and expose loss history in LabelModel.fit() #1485

Closed cdeepakroy closed 4 years ago

cdeepakroy commented 5 years ago

Is your feature request related to a problem? Please describe.

I am working on a text classification problem and have been calling LabelModel.fit() with the default parameters. Since fit() does not print the loss by default, I did not realize that the training loss was not converging. While debugging the labels Snorkel assigned to some samples, I wanted to look at the loss curve. I dug into the fit() method and noticed that the loss is logged with logging.info() (see here and here).

I had to import logging and set the level to INFO to get the losses printed. This showed me that the loss had not yet converged, and I had to increase the number of epochs and adjust the learning rate a bit to get it to converge.
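For reference, a minimal sketch of the workaround I used (the import path may differ by Snorkel version, and the synthetic label matrix is only there to make the snippet self-contained; n_epochs, lr, log_freq, and seed are existing fit() keyword arguments with illustrative values):

```python
import logging

import numpy as np
from snorkel.labeling.model import LabelModel  # may be snorkel.labeling in some versions

# LabelModel.fit() emits its periodic training loss via logging.info(), so
# raising the logging level to INFO makes the losses show up in the console.
logging.basicConfig(level=logging.INFO)

# Tiny synthetic label matrix (n_examples x n_LFs) just so the snippet runs;
# -1 denotes abstain.
L_train = np.array([[0, 1, -1], [1, 1, 0], [0, -1, 0], [1, 0, 1]] * 50)

label_model = LabelModel(cardinality=2, verbose=True)
label_model.fit(L_train, n_epochs=500, lr=0.01, log_freq=25, seed=123)
```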

I suggest printing the losses by default (or adding a verbose parameter to control it) with a reasonable default log frequency. That way the user can see whether the loss has converged and take appropriate action.

I would also suggest returning the loss history to the user as an array to enable plotting of the loss curve. Looking at the loss curve gives a sense of whether the learning rate is too low, too high, or about right.
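Until something like that exists, the best I could come up with is to capture the loss values from the log records that fit() already emits. This is only a sketch, reusing label_model and L_train from the snippet above, and the regex is an assumption about the log message format:

```python
import logging
import re

class LossHistoryHandler(logging.Handler):
    """Collects loss values parsed from LabelModel's training log messages."""

    def __init__(self):
        super().__init__(level=logging.INFO)
        self.losses = []

    def emit(self, record):
        # The message format may differ between Snorkel versions; adjust the
        # pattern to match whatever fit() actually logs.
        match = re.search(r"loss=([0-9.eE+-]+)", record.getMessage())
        if match:
            self.losses.append(float(match.group(1)))

handler = LossHistoryHandler()
logging.getLogger().addHandler(handler)
label_model.fit(L_train, n_epochs=500, lr=0.01, log_freq=1)
logging.getLogger().removeHandler(handler)

# handler.losses now holds one entry per logged epoch and can be plotted,
# e.g. plt.plot(handler.losses) with matplotlib.
```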

Also, I noticed here that only four types of learning rate (LR) scheduler are supported: ["constant", "linear", "exponential", "step"]. I would suggest exposing the other schedulers in torch.optim.lr_scheduler. In particular, a scheduler like ReduceLROnPlateau, which requires less tuning, could be very useful here.
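To illustrate what that would buy us, here is a plain PyTorch sketch of ReduceLROnPlateau on a toy model (not wired into LabelModel, purely illustrative): it lowers the LR whenever the monitored loss plateaus, so there is less manual schedule tuning.

```python
import torch

# Toy model and placeholder objective, just to show the scheduler API.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10
)

for epoch in range(200):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    # Unlike the epoch-count schedulers currently supported, ReduceLROnPlateau
    # must be fed the metric it monitors at each step.
    scheduler.step(loss.item())
```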

I am curious whether there have been any experiments comparing the performance of different LR schedulers and optimizers for fitting the LabelModel and, if so, what emerged as a recommended approach that works in most cases.

henryre commented 5 years ago

Hi @cdeepakroy, thanks for the suggestions! We're looking into a way to provide a better default logging level when training the label model. As far as optimization goes: we've found SGD with the provided LR schedulers to be a reliable approach in most cases. We've also seen some gains from using Adam in certain cases. Unfortunately, there's no one-size-fits-all here, so we recommend trying out different settings (as it sounds like you're doing already!).
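For concreteness, a minimal sketch of trying a few settings through fit()'s existing optimizer and lr_scheduler keyword arguments; L_train, L_dev, and Y_dev are placeholders for your own label matrices and dev-set gold labels, and the specific combinations shown are just examples of the supported options:

```python
from snorkel.labeling.model import LabelModel

# Sketch only: optimizer and lr_scheduler are existing fit() keyword arguments.
# L_train, L_dev, and Y_dev are placeholders for the user's data.
for optimizer, lr_scheduler in [("sgd", "constant"), ("sgd", "exponential"), ("adam", "constant")]:
    label_model = LabelModel(cardinality=2, verbose=True)
    label_model.fit(
        L_train,
        n_epochs=500,
        lr=0.01,
        optimizer=optimizer,
        lr_scheduler=lr_scheduler,
        seed=123,
    )
    # Compare settings on a held-out dev set.
    print(optimizer, lr_scheduler, label_model.score(L_dev, Y_dev))
```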

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.