Closed connormeaton closed 3 years ago
If you want to go with training (or fine-tuning) a language model and then using it for downstream tasks, I have an example of building an LM with GPT-2 here that you can look at. It shows how to fine-tune an LM that you can then use as described above.
With Transformers, as opposed to ULMFiT, neither I nor most folks I know have seen any substantial improvement from fine-tuning the LM beforehand ... and so generally, I don't follow that approach. One of the reasons it's valuable in ULMFiT is that, with the tokenization strategy used, you end up with a lot of
Still, if you have the time, you can give both a try and see if going the fastai approach gives you an edge ... it might. Each use case is different. Lmk if you do :)
In previous fastai sentence-classification problems I've taken the following approach, using a standard labeled dataset as `df` (like IMDB): first fine-tune the language model, then train the classifier on top of it. My understanding is that this initial fine-tuning of the language model increases performance for downstream classification.
I don't see a clear implementation of this language-model fine-tuning step on a pretrained model in your library. Is that because you tried it and it wasn't as useful with Transformers as with AWD-LSTM? Are you aware of how one could accomplish this using DistilBERT, or any transformer compatible with your library?
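For context on what that step involves with a transformer like DistilBERT: the usual route is masked-LM fine-tuning on the unlabeled corpus before attaching the classification head. Below is a minimal, dependency-free sketch of just the data-preparation side of that pipeline — the `group_texts` and `mask_tokens` helpers are my own illustrative names, mirroring (in simplified form) what `datasets.map` plus `DataCollatorForLanguageModeling` do in the Hugging Face `run_mlm.py` example; this is a sketch, not the library's actual implementation:

```python
# Sketch of corpus prep for masked-LM fine-tuning (the step DistilBERT
# would need before downstream classification). The helper names here
# are illustrative; the real pipeline would hand these blocks to a
# transformers.Trainer wrapping AutoModelForMaskedLM, omitted so the
# sketch stays dependency-free.
import random

def group_texts(tokenized_docs, block_size):
    """Concatenate tokenized documents and split into fixed-size blocks,
    dropping the ragged tail (standard LM-pretraining prep)."""
    flat = [tok for doc in tokenized_docs for tok in doc]
    total = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, total, block_size)]

def mask_tokens(block, mask_id, mlm_prob=0.15, seed=0):
    """Randomly replace ~15% of token ids with the [MASK] id and return
    (inputs, labels); labels are -100 where no prediction is needed.
    Simplified: the real collator also sometimes substitutes random
    tokens or keeps the original instead of always masking."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in block:
        if rng.random() < mlm_prob:
            inputs.append(mask_id)
            labels.append(tok)      # model must predict the original
        else:
            inputs.append(tok)
            labels.append(-100)     # ignored by the loss
    return inputs, labels

blocks = group_texts([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
# -> [[1, 2, 3, 4], [5, 6, 7, 8]]
```

After fine-tuning the masked-LM head this way, the same saved weights can be reloaded for classification (e.g. with `AutoModelForSequenceClassification.from_pretrained(...)`), which is the transformer analogue of fastai's save-encoder / load-encoder flow.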
Thanks!