Open valedica opened 2 years ago
The average embeddings can be wrongly calculated during inference due to a small bug in neuralcoref.pyx:
https://github.com/huggingface/neuralcoref/blob/60338df6f9b0a44a6728b442193b7c66653b0731/neuralcoref/neuralcoref.pyx#L896
`PUNCTS` is a list of strings, while `token.lower` is an integer hash, so a comparison between the two never matches and no token is ever recognized as punctuation. As a result, punctuation embeddings are added to the average embeddings of spans during inference, causing a potential mismatch between training and inference features.
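To illustrate the mismatch, here is a minimal, self-contained sketch that does not import spaCy or neuralcoref. `FakeToken`, `fake_hash`, and the contents of `PUNCTS` are hypothetical stand-ins; the only behavior borrowed from spaCy is that `token.lower` is an integer ID while `token.lower_` is the corresponding string. The suggested fix (comparing `lower_` instead) is one possible approach, not necessarily what the maintainers would choose.

```python
import zlib
from dataclasses import dataclass

# Hypothetical subset of punctuation strings; the real list is defined in neuralcoref.
PUNCTS = [".", ",", "!", "?"]


def fake_hash(text: str) -> int:
    """Stand-in for spaCy's string-to-ID hashing (NOT the real hash function)."""
    return zlib.crc32(text.encode("utf-8"))


@dataclass
class FakeToken:
    """Mimics the two spaCy Token attributes relevant to this issue."""
    text: str

    @property
    def lower(self) -> int:
        # Integer ID of the lowercased text, like spaCy's Token.lower
        return fake_hash(self.text.lower())

    @property
    def lower_(self) -> str:
        # Lowercased text as a string, like spaCy's Token.lower_
        return self.text.lower()


def kept_buggy(tokens):
    # An int is never equal to a str, so this filter removes nothing.
    return [t.text for t in tokens if t.lower not in PUNCTS]


def kept_fixed(tokens):
    # Comparing strings with strings filters punctuation as intended.
    return [t.text for t in tokens if t.lower_ not in PUNCTS]


span = [FakeToken(w) for w in ["Hello", ",", "world", "!"]]
print(kept_buggy(span))  # punctuation is still present
print(kept_fixed(span))  # punctuation is filtered out
```

An alternative with the same effect would be to convert `PUNCTS` to integer hashes once and keep comparing against `token.lower`, which avoids string comparisons inside the loop.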