Closed by attardi 4 years ago
Sorry, the requirements.txt should read:
transformers == 2.10.0
Sorry, the code does not support transformers version 2.2 or higher right now. I use word-wise tokenization for BERT input, and since version 2.2 BertTokenizer.encode adds special tokens such as [CLS] and [SEP] to each tokenized unit by default, which can lead to unexpected behavior.
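A minimal self-contained sketch of the pitfall, using a toy tokenizer with hypothetical token IDs (not the real transformers API), to show why per-word encode calls with default special tokens corrupt the input:

```python
# Toy illustration: encoding word by word while the tokenizer wraps
# every call in special tokens (as transformers >= 2.2 does by default).
# All names and IDs here are hypothetical stand-ins.

CLS, SEP = 101, 102              # hypothetical special-token ids
VOCAB = {"the": 1, "cat": 2, "sat": 3}

def encode(text, add_special_tokens=True):
    ids = [VOCAB[w] for w in text.split()]
    return [CLS] + ids + [SEP] if add_special_tokens else ids

# Sentence-level encoding: one [CLS]/[SEP] pair, as intended.
sentence_ids = encode("the cat sat")
# -> [101, 1, 2, 3, 102]

# Word-wise encoding with defaults: a [CLS]/[SEP] pair around EVERY word,
# so spurious tokens are interleaved with the real ones.
wordwise_ids = [t for w in "the cat sat".split() for t in encode(w)]
# -> [101, 1, 102, 101, 2, 102, 101, 3, 102]

# Workaround: disable special tokens for the per-word calls.
fixed_ids = [t for w in "the cat sat".split()
             for t in encode(w, add_special_tokens=False)]
# -> [1, 2, 3]
```

With the real library the equivalent knob is the `add_special_tokens` argument to `encode`; the sketch only mimics its default behavior.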
These are formatting errors. Shall I close this and resubmit?
It does not work as expected.
kmeans() sometimes produces large clusters, which can cause CUDA out-of-memory errors when computing the embeddings.
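One common mitigation, sketched here with a hypothetical helper (not code from this repository), is to cap the batch size when embedding a large cluster instead of feeding the whole cluster to the model at once:

```python
# Hypothetical sketch: split one oversized kmeans cluster into
# fixed-size chunks so each forward pass fits in GPU memory.

def batched(seq, max_size):
    """Yield successive chunks of at most max_size items."""
    for i in range(0, len(seq), max_size):
        yield seq[i:i + max_size]

cluster = list(range(10))        # stand-in for sentences in one large cluster
chunks = list(batched(cluster, 4))
# -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk would then be embedded separately and the results concatenated, trading a few extra forward passes for a bounded memory footprint.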
The change is just in file parser/util/alg.py.
The other files contain unrelated changes that allow using ELECTRA or other models from Hugging Face.