namespace-Pt closed this 1 year ago
BTW, I think keeping punctuation in both queries and documents would result in overly long posting lists.
The current code does not apply any special treatment to punctuation.
In the current evaluation query sets, queries typically do not include punctuation, so keeping punctuation in documents has little empirical effect on scores or processing speed: the corresponding inverted lists are rarely traversed.
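To illustrate the point about traversal cost, here is a minimal sketch of a toy inverted index. It is not this repository's code; `build_index`, `postings_traversed`, and the sample documents are hypothetical names and data made up for the example. It only shows that a long posting list for `.` costs nothing at query time unless the query itself contains `.`.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to the list of doc ids containing it (its posting list)."""
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for tok in set(tokens):  # one posting per (token, document) pair
            index[tok].append(doc_id)
    return index


def postings_traversed(index, query_tokens):
    """Count postings read when only the query's own tokens are looked up."""
    return sum(len(index.get(tok, [])) for tok in set(query_tokens))


# Hypothetical pre-tokenized documents; punctuation is kept as ordinary tokens.
docs = [
    ["the", "cat", "sat", "."],
    ["a", "dog", "barked", ",", "then", "ran", "."],
    ["cat", "and", "dog", "."],
]
index = build_index(docs)

print(len(index["."]))                                # "." appears in every document
print(postings_traversed(index, ["cat", "dog"]))      # the list for "." is never read
print(postings_traversed(index, ["cat", "dog", "."])) # now the list for "." is read too
```

In this toy setup the punctuation lists still take up index space, so the concern about their length is about storage rather than query latency, as long as queries stay punctuation-free.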
ColBERT removes punctuation from documents because its authors consider it useless. Did you remove punctuation when computing the overlapping tokens between query and document?
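For concreteness, the difference being asked about can be sketched as follows. This is only an illustration, not ColBERT's or this repository's implementation; the `overlap` helper and the example tokens are hypothetical, and punctuation is assumed to mean single-character tokens from Python's `string.punctuation`.

```python
import string

# Assumption: punctuation tokens are single characters from string.punctuation.
PUNCT = set(string.punctuation)


def overlap(query_tokens, doc_tokens, drop_punct=True):
    """Return the set of tokens shared by query and document."""
    q, d = set(query_tokens), set(doc_tokens)
    if drop_punct:
        # Remove punctuation from both sides before intersecting.
        q -= PUNCT
        d -= PUNCT
    return q & d


query = ["what", "is", "colbert", "?"]
doc = ["colbert", "is", "a", "retrieval", "model", "?"]

print(overlap(query, doc))                    # punctuation ignored
print(overlap(query, doc, drop_punct=False))  # "?" counts as an overlapping token
```

With `drop_punct=False`, a shared `?` is counted as a match even though it carries no topical signal, which is the case the question is about.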