neuralmind-ai / portuguese-bert

Portuguese pre-trained BERT models

What's max_len? #21

Closed moniquebm closed 3 years ago

moniquebm commented 4 years ago

In general, BERT-based models have a maximum sequence length of 512, but when I check:

tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased')

tokenizer.max_len is equal to 1000000000000000019884624838656.

It seems there is no maximum sequence limit. So, what's the right value to use for max_len?

fabiocapsouza commented 4 years ago

Hi @moniquebm, you're right, it should be 512.
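
For context, the huge number reported above is just a "no limit configured" sentinel that the tokenizer falls back to, not a real limit. Below is a minimal sketch (not from this thread) of how to enforce the 512-token limit explicitly; it assumes a recent version of the Hugging Face transformers library, where the attribute is called model_max_length (max_len is the older name), and the example sentence is arbitrary.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased')

# Override the "unlimited" sentinel with BERT's actual positional-embedding limit.
tokenizer.model_max_length = 512  # newer name for what older versions exposed as max_len

# Alternatively, enforce the limit per call and truncate longer inputs to 512 tokens.
encoding = tokenizer(
    "Tinha uma pedra no meio do caminho.",
    max_length=512,
    truncation=True,
)
print(len(encoding["input_ids"]))  # never exceeds 512
```

Either approach keeps inputs within the 512 tokens that BERT's position embeddings support; passing longer sequences without truncation would raise an error at the model, not the tokenizer.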