Open erlefloch opened 7 months ago
Yeah, this is an ongoing issue we're hoping to look at soon. We used to use a tokenizer that stripped out punctuation and hyphens, but then we had complaints that search queries containing hyphens or other symbols weren't found, so we switched to a different tokenizer. So we need to look at using multiple tokenizers, if that's possible. It's a symptom of the Solr configuration rather than anything in SEEK.
If you want to build your own Solr container, I think you can just switch this and the line just below from WhitespaceTokenizer to StandardTokenizer.
https://github.com/FAIRdom/solr-seek-docker/blob/master/conf/schema.xml#L64
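To illustrate why the tokenizer choice matters here, this is a rough Python sketch, not Solr's actual implementation. The function names `whitespace_tokens`, `word_tokens` and `matches` are made up for this example, and the regex is only an approximation of how a word-boundary tokenizer such as StandardTokenizer splits text.

```python
import re


def whitespace_tokens(text):
    # Split on whitespace only, so a hyphenated name stays a single token.
    return text.lower().split()


def word_tokens(text):
    # Approximation of a word-boundary tokenizer: keep runs of letters
    # and digits, dropping hyphens and other punctuation.
    return re.findall(r"[^\W_]+", text.lower())


def matches(query, title, tokenize):
    # A title matches when every query token is among the title's tokens.
    title_tokens = set(tokenize(title))
    return all(token in title_tokens for token in tokenize(query))


title = "trials-wheat-2022"
print(whitespace_tokens(title))                    # ['trials-wheat-2022']
print(word_tokens(title))                          # ['trials', 'wheat', '2022']
print(matches("wheat", title, whitespace_tokens))  # False
print(matches("wheat", title, word_tokens))        # True
```

The trade-off mentioned above shows up in the sketch too: the word-boundary version finds "wheat", but it throws away hyphens and other symbols, so a query that depends on those symbols can no longer be told apart from one without them.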
Thanks for the information! :)
If I have a data file that is named "trials-wheat-2022", for example, it won't be found if I run the query "wheat" in the FAIRDOM search box. It would be nice to have hyphen-proof search, so that this kind of hyphenated title appears in the query results.