tarrade / proj_multilingual_text_classification

Explore multilingual text classification using embeddings, BERT, and deep learning architectures
Apache License 2.0

Max length with BERT: is 512 a hard limit, or was it only in the original paper from Google? #32

Closed tarrade closed 4 years ago

tarrade commented 4 years ago

best reference: https://github.com/google-research/bert/blob/master/README.md

https://github.com/google-research/bert/issues/190

There is no explicit check up front that forbids exceeding the max length, but it fails at the position-embedding lookup:


```
InvalidArgumentError:  indices[0,992] = 992 is not in [0, 512)
     [[node tf_bert_classification/bert/embeddings/position_embeddings/embedding_lookup (defined at /site-packages/transformers/modeling_tf_bert.py:171) ]] [Op:__inference_distributed_function_1456028]

Errors may have originated from an input operation.
Input Source operations connected to node tf_bert_classification/bert/embeddings/position_embeddings/embedding_lookup:
 tf_bert_classification/bert/embeddings/strided_slice_2 (defined at /site-packages/transformers/modeling_tf_bert.py:165)

Function call stack:
distributed_function
```
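The failure is a bounds check on the learned position-embedding table: pretrained `bert-base`/`bert-large` checkpoints ship with `max_position_embeddings = 512` rows, so any position index at or beyond 512 (here, token position 992) has no embedding row. A minimal pure-Python sketch of that check and of the usual fix, truncating the tokenized input (the function names and the "keep the last token" policy are illustrative assumptions, not code from this repo or from `transformers`):

```python
# bert-base/bert-large default size of the position-embedding table (assumption
# based on the published checkpoints; a model could be configured differently).
MAX_POSITION_EMBEDDINGS = 512

def check_positions(seq_len, max_positions=MAX_POSITION_EMBEDDINGS):
    """Mimic the embedding_lookup bounds check that raised the error above."""
    for i in range(seq_len):
        if not (0 <= i < max_positions):
            raise IndexError(
                f"indices[0,{i}] = {i} is not in [0, {max_positions})"
            )
    return True

def truncate_ids(token_ids, max_positions=MAX_POSITION_EMBEDDINGS):
    """Truncate a token-id sequence so every position index stays in range.

    Keeps the final token (usually [SEP]) when cutting, a common policy.
    """
    if len(token_ids) <= max_positions:
        return list(token_ids)
    return list(token_ids[: max_positions - 1]) + [token_ids[-1]]

# A 993-token input, like the one that failed (index 992 was out of range):
ids = list(range(993))
short = truncate_ids(ids)
check_positions(len(short))  # passes once the sequence fits in 512 positions
```

For longer documents the alternatives are the usual ones: truncation as above, a sliding window over 512-token chunks, or a model pretrained with a larger position table.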