ycba-cia / blacklight-collections2


Search fuzziness #204

Open edgartdata opened 4 years ago

edgartdata commented 4 years ago

@yulgit1 When I search for 'falcon' http://10.5.96.187:3000/catalog/tms:1193 does not appear although it has the subject tag of 'falcons (birds)'. I would think we would want the search mechanism to be inclusive? Or are we concerned about providing too many results if we err on the side of fuzziness? @flapka? others?

yulgit1 commented 4 years ago

@edgartdata - to accommodate this, I think the *_txt fields need to be indexed with the PorterStemFilterFactory, as explained here:

https://stackoverflow.com/questions/38511261/solr-how-to-match-singular-and-plural-words
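
To illustrate why stemming helps here, below is a minimal Python sketch of step 1a of the Porter stemming algorithm, the step that handles plural endings. It is only an illustration: the function name `porter_step_1a` is made up for this example, it is not Solr's code, and the real filter applies several further steps after this one.

```python
def porter_step_1a(word: str) -> str:
    """Apply step 1a of the Porter stemming algorithm (plural endings only)."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # IES -> I, e.g. ponies -> poni
    if word.endswith("ss"):
        return word       # SS -> SS, e.g. caress is left unchanged
    if word.endswith("s"):
        return word[:-1]  # S -> (nothing), e.g. falcons -> falcon
    return word


# The indexed subject term and the search term reduce to the same token.
for term in ("falcons", "falcon"):
    print(term, "->", porter_step_1a(term))
```

Because the same analysis runs at index time and at query time, the subject tag 'falcons (birds)' and the query 'falcon' would end up with a matching token.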

yulgit1 commented 4 years ago

Reconfigured Solr as follows: https://github.com/yulgit1/ycba-ansible1/commit/6b8cac9647b74f13a49944c879b4fa917baae51b

Indexed one record, tms:1193; a search on 'falcon' should now find that record.

Full indexing will run this weekend.

edgartdata commented 4 years ago

Yes it does! tms:1193 is the Study of Birds by an unknown 18th century artist below (second image from the left), which only has "falcons (birds)" as a subject tag in its record.

[image: fuzzy search results]

edgartdata commented 4 years ago

All: let's individually test the search fuzziness next week (once the BL data is refreshed on Saturday) and discuss results next Friday, June 5.

edgartdata commented 4 years ago

boat --> boating?

flapka commented 4 years ago

(For post launch:)

Would it be possible to implement an exact search feature, i.e. to turn off automatic stemming with use of a syntax such as quotation marks?

If, for example, a researcher is interested in the concept billing, a search on that term will yield lots of false hits -- mostly records containing the name Bill. This will frustrate some users.
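
One possible approach, sketched in Python under assumptions: keep a second, unstemmed copy of the searchable text and send quoted queries to that field instead. The field names `all_fields_txt` and `all_fields_unstem` and the helper `build_search_params` are hypothetical and do not come from the current configuration; `defType`, `q` and `qf` are standard Solr edismax parameters.

```python
def build_search_params(user_query: str) -> dict:
    """Choose which field to search based on whether the query is quoted."""
    q = user_query.strip()
    # Treat a query wrapped in double quotes as a request for exact matching.
    is_exact = len(q) >= 2 and q.startswith('"') and q.endswith('"')
    # Hypothetical field names: all_fields_txt is stemmed at index time,
    # all_fields_unstem is a copy analyzed without a stemming filter.
    qf = "all_fields_unstem" if is_exact else "all_fields_txt"
    return {"defType": "edismax", "q": q, "qf": qf}


print(build_search_params('"billing"'))
print(build_search_params("billing"))
```

With something like this, a search on "billing" in quotation marks would only hit the unstemmed field, while an unquoted search would keep the current stemmed behavior.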