ohmeow / blurr

A library that integrates huggingface transformers with the world of fastai, giving fastai devs everything they need to train, evaluate, and deploy transformer-specific models.
https://ohmeow.github.io/blurr
Apache License 2.0
289 stars · 34 forks

Fine tune language model on labeled data for downstream classification #51

Closed connormeaton closed 3 years ago

connormeaton commented 3 years ago

In previous fastai sentence classification problems, I've taken the following approach with a standard labeled dataset as df (like IMDB):

dblocks = DataBlock(blocks=(TextBlock.from_df('text', tok=tok, is_lm=True)),
                    get_x=ColReader('text'), 
                    splitter=ColSplitter())
dls = dblocks.dataloaders(df, bs=64)

# finetune language model
learn = language_model_from_pretrained(dls, url=url, drop_mult=1).to_fp16()
learn.lr_find()
lr = 3e-2
learn.fit_one_cycle(1, lr, moms=(0.8,0.7,0.8))
path = learn.save_lm('tmp/test_lm')
vocab = learn.dls.vocab

# train classifier
dblocks = DataBlock(blocks=(TextBlock.from_df('text', tok=tok, vocab=vocab), CategoryBlock),
                    get_x=ColReader('text'),
                    get_y=ColReader('label'), 
                    splitter=ColSplitter())
dls = dblocks.dataloaders(df, bs=128)
learn = text_classifier_from_lm(dls, path=path, metrics=[accuracy]).to_fp16()
learn.lr_find()
learn.fine_tune(5, 1e-2, moms=(0.8,0.7,0.8), wd=0.1)

My understanding is that the first finetuning of the language model increases performance for downstream classification.

I don't see a clear implementation of this language model fine-tuning step on a pretrained model in your library. Is that because you tried it and it wasn't as useful with transformers as it is with AWD-LSTM? Are you aware of how one could accomplish this using DistilBERT, or any transformer compatible with your library?

Thanks!

ohmeow commented 3 years ago

If you want to go with training (or fine-tuning) a language model and then using it for downstream tasks, I have an example of building an LM with gpt-2 here you can look at. It shows how to fine-tune an LM that you can then use as above.
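Since the question mentions DistilBERT specifically, it may help to note that masked models are fine-tuned with a masked-language-modeling (MLM) objective rather than the causal objective GPT-2 uses. Below is a minimal, self-contained sketch of the standard MLM corruption step (select ~15% of positions; of those, 80% become [MASK], 10% a random token, 10% unchanged). The function name `mask_tokens`, the token ids, and the vocabulary size are made up for illustration; in practice the tokenizer and a data collator supply all of this.

```python
import random

MASK_ID = 103      # id of the [MASK] token (toy value, not DistilBERT's real id)
VOCAB_SIZE = 1000  # toy vocabulary size

def mask_tokens(token_ids, mlm_probability=0.15, rng=None):
    """Return (inputs, labels) for MLM training.

    Roughly mlm_probability of positions are selected for prediction; of
    those, 80% are replaced with [MASK], 10% with a random token, and 10%
    are left unchanged. Labels are -100 (ignored by the loss) at every
    unselected position.
    """
    rng = rng or random.Random()
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_probability:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID                    # 80%: [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # else: 10% keep the original token unchanged
    return inputs, labels

inputs, labels = mask_tokens(list(range(20)), rng=random.Random(0))
```

The corrupted `inputs` are fed to the model and the loss is computed only where `labels != -100`, which is exactly how you'd continue pretraining a masked LM on your own corpus before attaching a classification head.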

With transformers, unlike ULMFiT, neither I nor most folks I know have seen any substantial improvement from fine-tuning the LM beforehand ... and so generally, I don't follow that approach. One of the reasons it's valuable in ULMFiT is that, with the word-level tokenization strategy it uses, you end up with a lot of unknown tokens if you don't fine-tune the LM; this generally isn't a problem with transformer-based LMs, whose subword tokenizers can represent almost any word. In my experience, the pretrained transformer LMs found on the HF hub are good enough.
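To make the tokenization point concrete, here is a toy greedy longest-match-first WordPiece splitter, the algorithm family BERT/DistilBERT tokenizers use. The tiny vocabulary is invented for illustration (real models ship ~30k subwords); the point is that a word absent from the vocabulary still decomposes into known pieces instead of collapsing to an unknown token, which is why the vocabulary-mismatch motivation for LM fine-tuning mostly disappears.

```python
# Toy subword vocabulary; "##" marks a piece that continues a word.
VOCAB = {"trans", "##form", "##er", "##s", "un", "##break", "##able",
         "the", "model", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                cur = sub
                break
            end -= 1  # shrink the candidate until a known piece matches
        if cur is None:
            return ["[UNK]"]  # no piece matches at all: whole word is unknown
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("transformers"))  # → ['trans', '##form', '##er', '##s']
print(wordpiece("unbreakable"))   # → ['un', '##break', '##able']
```

Neither "transformers" nor "unbreakable" is in the toy vocabulary, yet both split cleanly into known subwords; with ULMFiT's word-level vocab, both would have become xxunk unless the LM (and its vocab) were fine-tuned on the corpus first.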

Still, if you've got the time, you can give both a try and see if going the fastai approach gives you an edge ... it might. Each use case is different. Lmk if you do :)