google / flaxformer

Apache License 2.0

BERT Pre-Training #1

Open stefan-it opened 2 years ago

stefan-it commented 2 years ago

Hi,

I would like to test this flaxformer library to pre-train a BERT from scratch.

What is necessary to create the pre-training data (MLM with a duplication factor) from my own corpus, using a custom WordPiece-based vocab?
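For context, here is a minimal sketch of the kind of data pipeline I mean, in the style of BERT's original preprocessing: each sequence is duplicated `dup_factor` times, with an independent random mask drawn for each copy. All names and constants here (`MASK_ID`, `VOCAB_SIZE`, the 15% / 80-10-10 masking scheme) are illustrative assumptions, not flaxformer APIs:

```python
import random

# Hypothetical special-token id and vocab size; the real values would
# come from the custom WordPiece vocab.
MASK_ID = 103
VOCAB_SIZE = 30522

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """BERT-style MLM masking: each position is selected with
    probability mask_prob; a selected token becomes [MASK] 80% of the
    time, a random token 10%, and is left unchanged 10%."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position not predicted
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok  # model should predict the original token
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token
    return inputs, labels

def make_examples(token_ids, dup_factor=10, seed=0):
    """Duplication factor: emit dup_factor copies of the same
    sequence, each with an independently sampled mask."""
    rng = random.Random(seed)
    return [mask_tokens(token_ids, rng=rng) for _ in range(dup_factor)]
```

With `dup_factor=10` the model sees each sentence ten times per epoch of raw text, each time with different masked positions.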

How can the pre-training be started?

I'm really excited to test it, any help is highly appreciated!