medialab / halexp

medialab's expert search engine poc
GNU General Public License v3.0

Filter out phrases that are too short? #27

Closed: boogheta closed this issue 6 months ago

boogheta commented 7 months ago

When testing with no query, we get matches such as ".", "PI.", "Rev.", "Prob.", "(cons.", etc.

I'm wondering whether we should discard all phrases that are too short (fewer than 8 characters, for instance) before embedding them.

cc @jimenaRL

jimenaRL commented 6 months ago

Done, with a new parameter in the corpus config: "min_num_characters".
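For illustration, here is a minimal Python sketch of the filtering step discussed in this thread. It assumes that phrases arrive as a plain list of strings before embedding and that the corpus config can be read like a dict. The helper `filter_short_phrases` and the dict layout are hypothetical: only the "min_num_characters" name comes from the thread, and halexp's actual implementation may differ.

```python
from typing import Iterable


def filter_short_phrases(phrases: Iterable[str], min_num_characters: int = 8) -> list[str]:
    """Keep only phrases whose stripped length is at least min_num_characters.

    Hypothetical helper for illustration; not halexp's actual code.
    """
    # Phrases shorter than the threshold (e.g. "PI.", "Rev.") are dropped
    # before they ever reach the embedding step.
    return [p for p in phrases if len(p.strip()) >= min_num_characters]


# Hypothetical corpus config; only the "min_num_characters" key comes from the issue.
corpus_config = {"min_num_characters": 8}

phrases = [".", "PI.", "Rev.", "Prob.", "(cons.", "This sentence is long enough to keep."]
kept = filter_short_phrases(phrases, corpus_config["min_num_characters"])
print(kept)  # ['This sentence is long enough to keep.']
```

Measuring length after stripping whitespace is a design choice in this sketch, so that padded punctuation such as " . " is still discarded. The thread does not say whether halexp does the same.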