Closed: AsmaBaccouche closed this issue 1 year ago.
Is it possible to apply the same logic to RoBERTa for the MaskedLM model? I need it for pretraining on a custom dataset that has long texts. Thanks!

Hi, the method we used here is applied to an already pre-trained model during the fine-tuning stage. I am not sure whether it can be applied during the pre-training stage. If you want to pre-train from scratch, it may be better to use a model whose architecture is designed for longer texts, such as BigBird or Longformer.
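For anyone who still wants to try it, here is a minimal sketch of what extending the learned position embeddings of a pretrained RoBERTa checkpoint might look like for `RobertaForMaskedLM`. The target length (2048), the `roberta-base` checkpoint, and the initialization strategy (copying the pretrained positions and tiling them into the new slots) are illustrative assumptions, not code from this repository, and buffer handling can differ across `transformers` versions.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

MAX_LEN = 2048  # hypothetical target sequence length

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# RoBERTa offsets position ids by the padding index, so the embedding table
# holds max_position_embeddings = 514 rows for a 512-token model.
old_pos = model.roberta.embeddings.position_embeddings
old_num, dim = old_pos.weight.shape  # (514, 768) for roberta-base
new_num = MAX_LEN + 2

new_pos = torch.nn.Embedding(new_num, dim, padding_idx=old_pos.padding_idx)
with torch.no_grad():
    # Keep the pretrained positions, then tile them into the extra slots
    # instead of leaving those rows randomly initialized.
    new_pos.weight[:old_num] = old_pos.weight
    k = old_num
    while k < new_num:
        chunk = min(old_num - 2, new_num - k)
        new_pos.weight[k:k + chunk] = old_pos.weight[2:2 + chunk]
        k += chunk

model.roberta.embeddings.position_embeddings = new_pos
model.config.max_position_embeddings = new_num
tokenizer.model_max_length = MAX_LEN

# Some transformers versions cache position_ids / token_type_ids as buffers
# on the embeddings module; refresh them if they exist.
emb = model.roberta.embeddings
if hasattr(emb, "position_ids"):
    emb.position_ids = torch.arange(new_num).unsqueeze(0)
if hasattr(emb, "token_type_ids"):
    emb.token_type_ids = torch.zeros((1, new_num), dtype=torch.long)
```

From there the resized model and tokenizer can be saved with `save_pretrained` and plugged into a standard MLM setup (e.g. `DataCollatorForLanguageModeling` plus `Trainer`) on the long-text corpus; whether the extended positions learn well from this initialization during pre-training is something to verify empirically.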