luyug / COIL

NAACL2021 - COIL Contextualized Lexical Retriever
Apache License 2.0

Did you remove punctuation before computing the document score? #19

Closed: namespace-Pt closed this issue 1 year ago

namespace-Pt commented 2 years ago

ColBERT removes punctuation from documents because its authors consider it uninformative. Did you also remove punctuation when computing the overlapping tokens between query and document?

namespace-Pt commented 2 years ago

BTW, I think keeping punctuation in both query and document would result in overly long posting lists.

luyug commented 2 years ago

The current code does not apply any special treatment to punctuation.

In the current evaluation query sets, queries typically do not include punctuation, so keeping punctuation in documents has little empirical effect on scores or processing speed: the inverted lists for punctuation tokens are rarely traversed.
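
To illustrate why unqueried tokens cost little at search time, here is a minimal sketch of exact-match scoring over an inverted index. It is not the code from this repository: the function names (`build_index`, `score`), the toy 2-dimensional vectors, and the document contents are all assumptions made for illustration, and the scoring is a simplified max-then-sum over matching tokens.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to a posting list of (doc_id, vector) entries."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for token, vec in tokens:
            index[token].append((doc_id, vec))
    return index


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score(query, index):
    """For each query token, take the best-matching occurrence per document,
    then sum over query tokens. Also report which posting lists were read."""
    scores = defaultdict(float)
    traversed = []
    for q_token, q_vec in query:
        if q_token not in index:
            continue
        traversed.append(q_token)
        best = {}
        for doc_id, d_vec in index[q_token]:
            s = dot(q_vec, d_vec)
            if doc_id not in best or s > best[doc_id]:
                best[doc_id] = s
        for doc_id, s in best.items():
            scores[doc_id] += s
    return dict(scores), traversed


# Hypothetical toy data: documents keep their punctuation tokens.
docs = {
    "d1": [("coil", [1.0, 0.0]), (",", [0.5, 0.5]), ("retriever", [0.0, 1.0])],
    "d2": [("coil", [0.5, 0.5]), (".", [0.2, 0.1])],
}
index = build_index(docs)

# The query contains no punctuation, so only the "coil" list is read.
query = [("coil", [1.0, 0.0])]
scores, traversed = score(query, index)
print(scores, traversed)
```

In this sketch the posting lists for `,` and `.` exist in the index and take up storage, but a punctuation-free query never touches them, so they affect neither the scores nor the traversal time.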