utterworks / fast-bert

Super easy library for BERT based NLP models
Apache License 2.0
1.87k stars, 341 forks

How to train on unsupervised data only, to get domain-specific embedding representations #48

Open smalgireddy opened 5 years ago

smalgireddy commented 5 years ago

I saw that we can train on a labeled dataset using your module. But I have a huge corpus of unlabeled text data, stored as sequences of sentences. I just want to train a language-model-style model on my data to learn domain-specific word or sentence representations in terms of embeddings, so that I can use those embeddings for downstream unsupervised tasks. Do you have any idea how I can train a pretrained BERT model on my corpus? Thank you.

kaushaltrivedi commented 5 years ago

Hi. FastBert does not support language model fine-tuning yet, but check out the LM fine-tuning script in the pytorch-transformers package. I believe it’s in the examples folder.
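For context on what LM fine-tuning does with unlabeled text: BERT's masked language modeling objective hides about 15% of the input tokens and trains the model to predict them, so the sentences themselves provide the labels. Below is a minimal sketch of that masking step in plain Python. It is not the pytorch-transformers script itself; the function name `mask_tokens`, the whitespace tokenization, and the toy vocabulary are assumptions made for illustration (real BERT uses WordPiece tokens and leaves special tokens such as `[CLS]` and `[SEP]` unmasked).

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels) for one tokenized sentence.

    labels[i] holds the original token at positions selected for
    prediction and None everywhere else. Hypothetical helper, written
    only to illustrate the masking rule.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            # This position becomes a prediction target.
            labels.append(token)
            roll = rng.random()
            if roll < 0.8:
                # 80% of targets: replace with the mask token.
                masked.append(MASK_TOKEN)
            elif roll < 0.9:
                # 10% of targets: replace with a random vocabulary token.
                masked.append(rng.choice(vocab))
            else:
                # 10% of targets: keep the original token.
                masked.append(token)
        else:
            # Not selected: keep the token and contribute no loss.
            masked.append(token)
            labels.append(None)
    return masked, labels


# Toy example: whitespace tokens stand in for WordPiece tokens.
sentence = "the patient was given 5 mg of the drug".split()
vocab = sorted(set(sentence))
masked, labels = mask_tokens(sentence, vocab, rng=random.Random(0))
print(masked)
print(labels)
```

After fine-tuning with an objective like this on a domain corpus, the hidden states of the resulting model can serve as domain-adapted embeddings for downstream tasks.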