Closed: AsmaBaccouche closed this issue 1 year ago.
Is it possible to apply the same logic to RoBERTa for the MaskedLM model? I need it for pretraining on a custom dataset that has long texts. Thanks!

Hi, the method we used here is applied to an already pre-trained model during the fine-tuning stage. I am not sure whether it can be applied during the pre-training stage. If you want to pre-train from scratch, it may be better to use a model whose architecture is designed for longer texts, such as BigBird or Longformer.
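For anyone who still wants to try it, here is a minimal sketch of what extending the learned position embeddings of a pretrained RoBERTa checkpoint might look like for `RobertaForMaskedLM`. The target length (2048), the `roberta-base` checkpoint, and the initialization strategy (copying the pretrained positions and tiling them into the new slots) are illustrative assumptions, not code from this repository, and buffer handling can differ across `transformers` versions.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

MAX_LEN = 2048  # hypothetical target sequence length

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# RoBERTa offsets position ids by the padding index, so the embedding table
# holds max_position_embeddings = 514 rows for a 512-token model.
old_pos = model.roberta.embeddings.position_embeddings
old_num, dim = old_pos.weight.shape  # (514, 768) for roberta-base
new_num = MAX_LEN + 2

new_pos = torch.nn.Embedding(new_num, dim, padding_idx=old_pos.padding_idx)
with torch.no_grad():
    # Keep the pretrained positions, then tile them into the extra slots
    # instead of leaving those rows randomly initialized.
    new_pos.weight[:old_num] = old_pos.weight
    k = old_num
    while k < new_num:
        chunk = min(old_num - 2, new_num - k)
        new_pos.weight[k:k + chunk] = old_pos.weight[2:2 + chunk]
        k += chunk

model.roberta.embeddings.position_embeddings = new_pos
model.config.max_position_embeddings = new_num
tokenizer.model_max_length = MAX_LEN

# Some transformers versions cache position_ids / token_type_ids as buffers
# on the embeddings module; refresh them if they exist.
emb = model.roberta.embeddings
if hasattr(emb, "position_ids"):
    emb.position_ids = torch.arange(new_num).unsqueeze(0)
if hasattr(emb, "token_type_ids"):
    emb.token_type_ids = torch.zeros((1, new_num), dtype=torch.long)
```

From there the resized model and tokenizer can be saved with `save_pretrained` and plugged into a standard MLM setup (e.g. `DataCollatorForLanguageModeling` plus `Trainer`) on the long-text corpus; whether the extended positions learn well from this initialization during pre-training is something to verify empirically.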