I saw that we could train on a labeled dataset using your module. But I have a huge corpus of unlabeled text, represented as sequences of sentences. I just want to train a language-model-style model on my data to learn domain-specific word or sentence representations (embeddings), so that I can use those embeddings for downstream unsupervised tasks. Do you have any idea how I can train a pretrained BERT model on my corpus? Thank you.
Hi. FastBert does not support language model fine-tuning yet. But check out the LM fine-tuning example in the pytorch-transformers package; I believe it's in the examples folder.
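In the meantime, here is a minimal masked-LM fine-tuning sketch using pytorch-transformers, which is the core of what that example does. The corpus file name, masking probability, and hyperparameters below are illustrative assumptions, and the official script additionally handles batching, the 80/10/10 mask/random/keep token strategy, and next-sentence pairs, all of which are omitted here for brevity.

```python
# Minimal masked-LM fine-tuning sketch with pytorch-transformers.
# Assumes a hypothetical corpus file 'domain_corpus.txt' with one sentence per line.
import random
import torch
from pytorch_transformers import BertTokenizer, BertForMaskedLM, AdamW

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.train()

cls_id = tokenizer.convert_tokens_to_ids('[CLS]')
sep_id = tokenizer.convert_tokens_to_ids('[SEP]')
mask_id = tokenizer.convert_tokens_to_ids('[MASK]')

with open('domain_corpus.txt') as f:
    sentences = [line.strip() for line in f if line.strip()]

def mask_tokens(ids, mask_prob=0.15):
    """Replace a random subset of tokens with [MASK].

    Labels are -1 for unmasked positions, which BertForMaskedLM's
    internal CrossEntropyLoss ignores in this version.
    """
    labels = [-1] * len(ids)
    masked = list(ids)
    for i in range(1, len(ids) - 1):  # keep [CLS]/[SEP] intact
        if random.random() < mask_prob:
            labels[i] = ids[i]
            masked[i] = mask_id
    return masked, labels

optimizer = AdamW(model.parameters(), lr=5e-5)

for epoch in range(1):
    for sent in sentences:
        ids = [cls_id] + tokenizer.encode(sent)[:510] + [sep_id]
        masked, labels = mask_tokens(ids)
        if all(l == -1 for l in labels):
            continue  # nothing was masked; skip to avoid a degenerate loss
        input_ids = torch.tensor([masked])
        label_ids = torch.tensor([labels])
        loss = model(input_ids, masked_lm_labels=label_ids)[0]
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Once fine-tuned, you can save the model with `model.save_pretrained(...)` and use its encoder outputs as domain-specific embeddings for your downstream tasks.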