saadsharif / ttds-group

TTDS Group Project

Fix for stop word in phrase + faster stemming #13

Closed by gingerwizard 2 years ago

gingerwizard commented 2 years ago

@enzo-inc this fixes phrase queries that contain stop words.

It also adds print statements for index loading, and the new stemming library makes the stemming step of indexing about 3x faster.

gingerwizard commented 2 years ago

This isn't ideal. I think we should index stop words and just give them a score of 0 at query time. This would ensure we match phrases accurately, although with any stemming we only ever get rough phrase matching anyway.
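A minimal sketch of that idea, not the project's actual code: stop words stay in a positional index so phrase positions line up, and they are simply skipped when scoring. The stop list, the function names (`build_index`, `phrase_match`, `score`) and the toy documents are all hypothetical, and stemming is left out.

```python
from collections import defaultdict

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "of", "to", "a", "and", "in"}


def tokenize(text):
    # Lowercase and split on whitespace; no stemming in this sketch.
    return text.lower().split()


def build_index(docs):
    """Positional index: term -> {doc_id: [positions]}. Stop words are kept."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Sorted doc ids that contain the phrase terms at consecutive positions."""
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in candidates:
        starts = index[terms[0]][doc_id]
        if any(all(s + i in index[t][doc_id] for i, t in enumerate(terms))
               for s in starts):
            hits.append(doc_id)
    return sorted(hits)


def score(index, query, doc_id):
    """Raw term-frequency score in which stop words contribute 0."""
    total = 0
    for t in tokenize(query):
        if t in STOP_WORDS:
            continue  # indexed, but worth nothing at query time
        if t in index and doc_id in index[t]:
            total += len(index[t][doc_id])
    return total


docs = {1: "the lord of the rings", 2: "lord rings of the"}
index = build_index(docs)
print(phrase_match(index, "lord of the rings"))  # [1]
print(score(index, "lord of the rings", 1))      # 2
```

If stop words were dropped at index time instead, both toy documents would reduce to "lord rings" and the phrase query would match document 2 as well.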

enzo-inc commented 2 years ago

How much memory overhead would we have by indexing stopwords?

gingerwizard commented 2 years ago

Not a huge amount of memory, since it's really only an extra few hundred terms in the dict. The postings on disk could be large though, and they would consume memory when loaded at query time.
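One rough way to see that split on a given corpus, as a sketch only: count how many distinct stop words would be added to the dictionary versus how many positional postings entries they would add. The stop list, the `stop_word_overhead` helper and the toy documents are hypothetical, and the counts say nothing about the real collection.

```python
from collections import Counter

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "of", "to", "a", "and", "in"}


def stop_word_overhead(docs):
    """Return (extra dictionary terms, extra postings entries, total entries).

    A postings entry here means one token occurrence, i.e. one position
    stored in a positional index.
    """
    counts = Counter(tok for text in docs for tok in text.lower().split())
    extra_terms = sum(1 for t in counts if t in STOP_WORDS)
    extra_entries = sum(c for t, c in counts.items() if t in STOP_WORDS)
    return extra_terms, extra_entries, sum(counts.values())


toy_docs = ["the cat sat in the hat", "a tale of the sea and the sky"]
print(stop_word_overhead(toy_docs))  # (5, 8, 14)
```

The dictionary gains at most one entry per stop word, while the number of postings entries grows with every occurrence, which is why the on-disk postings are the part to watch.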