georgiana-b closed this issue 6 years ago.
Unfortunately, ODB's Lucene index doesn't let us customize tokenizers and token filters, so the resulting search lacks essential features such as matching words with diacritics against their plain-letter versions.
Can we add a "transliterated name" property to the objects and index both names, this one and the original one?
Relevant link: https://www.npmjs.com/package/transliteration
@nightsh Following your suggestion, I added an extra field that contains the name without diacritics and made a composite index from both versions. It turns out this solves the diacritics issue and makes for a pretty good full-text search, so I'll go with it. Thank you for bringing this up!
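A minimal TypeScript sketch of the extra field, assuming the name is normalized in application code before the record is saved. Instead of the transliteration package linked above, it uses Unicode canonical decomposition (`String.prototype.normalize`) and drops the combining marks, so letters that have no decomposition (ł, ø, ß) survive unchanged; a transliteration library would typically map those to ASCII as well. The `NamedEntity` interface and the `nameNoDiacritics` field name are hypothetical.

```typescript
// Hypothetical record shape: "nameNoDiacritics" is an illustrative field name.
interface NamedEntity {
  name: string;
  nameNoDiacritics: string;
}

// Decompose characters (NFD), drop the combining marks in U+0300..U+036F,
// then recompose (NFC). E.g. "ș" -> "s" + U+0326 -> "s".
// Letters with no canonical decomposition (ł, ø, ß) are left unchanged.
function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").normalize("NFC");
}

// Build the record with both versions of the name, ready to be indexed together.
function withPlainName(name: string): NamedEntity {
  return { name, nameNoDiacritics: stripDiacritics(name) };
}

console.log(withPlainName("Ștefănescu Construcții S.A.").nameNoDiacritics);
// Stefanescu Constructii S.A.
```

Both `name` and `nameNoDiacritics` would then go into the composite full-text index, so a query typed with or without diacritics matches at least one of the two fields.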
I need some assistance in deciding how to implement this feature @nightsh @zufanka
Unclear aspects:
The selected suggestions are used as filters, together with the other filters (cpvs, years, countries), and ultimately to create the network. If we want to use buyer ids and bidder ids as filters, we will have to return Bidder and Buyer records as autocomplete results.
Pros:
Subsequent queries (filtering by cpvs, years, etc.) will run faster, since we don't have to do a full-text search every time.
Cons:
Another option is to change the frontend functionality. Currently the workflow is: the user types in a query, we bring up suggestions, the user chooses from those suggestions, and we use those choices as filters further on.
Instead, we could use the query itself as a filter. The workflow would then be: the user types in a query, we bring up suggested entities, the user reviews the suggestions and tweaks the query accordingly, and once they are happy with the results they move on to the next filter while we keep using the query as a filter.
Pros:
Users can write more expressive queries directly, such as a prefix search (IBM*) or a fuzzy match (S.A.~1).
Cons:
Selecting several actors requires writing boolean queries such as first_company OR second_company.
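To make the cost of the second option concrete, here is a small sketch of the boolean query a user would otherwise have to type by hand to select several actors. It assumes Lucene's classic query syntax, where OR combines clauses and double quotes form a phrase (inside a phrase only `"` and `\` need escaping); the helper names are hypothetical.

```typescript
// Wrap a name in a Lucene phrase, escaping the two characters that are
// special inside quotes: the backslash and the double quote.
function quotePhrase(name: string): string {
  return '"' + name.replace(/[\\"]/g, (ch) => "\\" + ch) + '"';
}

// Hypothetical helper: the query a user would type to select several actors.
function anyOfNames(names: string[]): string {
  return names.map(quotePhrase).join(" OR ");
}

console.log(anyOfNames(["first_company", "second_company"]));
// "first_company" OR "second_company"
```

Quoting each name as a phrase keeps multi-word names such as "Acme Corp" together; without quotes the parser would treat each word as a separate clause.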
Since one of our main selling points is making it easy to dig into the data, the second option raises the data-literacy bar too much, so I went with the first one.
@nightsh Do you use this feature somewhere else too? For example when people search for actors to cluster?
Provide suggestions for actors (bidders/buyers) from a bucket of tenders once at least 2 letters have been typed. The bucket of tenders should also be filtered by cpvs, countries, years and other actor ids.
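The requirement above can be sketched in memory as follows. Everything here is an assumption for illustration: the `Tender`/`Actor`/`Filters` shapes, substring matching on a diacritics-folded name, and reading "other actor ids" as "keep tenders that involve at least one of the selected actors". In the real backend this would be a database query rather than a loop.

```typescript
// Hypothetical data model; field names are illustrative only.
interface Actor { id: string; name: string; }
interface Tender {
  cpvs: string[];
  country: string;
  year: number;
  buyers: Actor[];
  bidders: Actor[];
}
interface Filters {
  cpvs?: string[];
  countries?: string[];
  years?: number[];
  actorIds?: string[]; // assumed: tender must involve at least one of these actors
}

const MIN_QUERY_LENGTH = 2;

// Lowercase and drop combining diacritical marks so "ș" matches "s".
function fold(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// A missing or empty filter list means "no restriction".
function matches<T>(allowed: T[] | undefined, values: T[]): boolean {
  if (!allowed || allowed.length === 0) return true;
  const set = allowed;
  return values.some((v) => set.indexOf(v) !== -1);
}

function suggestActors(tenders: Tender[], query: string, filters: Filters): Actor[] {
  const q = fold(query.trim());
  if (q.length < MIN_QUERY_LENGTH) return [];
  const found = new Map<string, Actor>(); // deduplicate actors by id
  for (const t of tenders) {
    const actors = t.buyers.concat(t.bidders);
    const inBucket =
      matches(filters.cpvs, t.cpvs) &&
      matches(filters.countries, [t.country]) &&
      matches(filters.years, [t.year]) &&
      matches(filters.actorIds, actors.map((a) => a.id));
    if (!inBucket) continue;
    for (const a of actors) {
      if (fold(a.name).indexOf(q) !== -1) found.set(a.id, a);
    }
  }
  return Array.from(found.values());
}
```

Returning actor records (with ids) rather than plain strings is what lets the chosen suggestions be reused as actor-id filters in the next step.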
The tokenizer we had for the Elasticsearch index in the previous backend worked very well.
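We don't know the exact tokenizer the previous Elasticsearch index used; as an assumption, the sketch below imitates a common autocomplete setup (edge n-grams plus lowercasing and diacritics folding) in plain TypeScript, to show why that kind of analysis gives prefix matching that ignores diacritics. The separator set, function names and gram sizes are hypothetical.

```typescript
// Separators between tokens: whitespace and common punctuation (an assumption;
// a real tokenizer would use proper Unicode word boundaries).
const TOKEN_SEPARATORS = /[\s.,;:!?()"'\/&-]+/;

// Lowercase and drop combining marks, in the spirit of Elasticsearch's
// lowercase + asciifolding filters (letters such as "ł" are kept as-is here,
// whereas asciifolding would map them too).
function foldText(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Emit the leading n-grams (prefixes) of every token, from minGram to maxGram
// characters; tokens shorter than minGram produce nothing.
function edgeNGrams(text: string, minGram = 2, maxGram = 10): string[] {
  const grams: string[] = [];
  for (const token of foldText(text).split(TOKEN_SEPARATORS)) {
    const upper = Math.min(maxGram, token.length);
    for (let n = minGram; n <= upper; n++) {
      grams.push(token.slice(0, n));
    }
  }
  return grams;
}

// At query time the input is folded the same way and looked up among the
// indexed grams, so "ște" finds "Ștefan".
function prefixMatches(name: string, query: string, maxGram = 10): boolean {
  return edgeNGrams(name, 2, maxGram).indexOf(foldText(query)) !== -1;
}
```

In this sketch a query longer than `maxGram` never matches, which is one reason real setups tune the gram sizes or use a separate search analyzer.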
Since an actor can act both as bidder and as buyer in a network, and people usually want to see both roles when they research a given actor, it might be better to refactor the query builder endpoints to use actors directly instead of bidders and buyers.
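A sketch of the actor-centric shape suggested above, assuming the same organization carries the same id whether it appears as bidder or as buyer. The `PartyRef`/`ActorNode` types and the `toActors` helper are hypothetical; the point is that one record per actor keeps both roles visible.

```typescript
type Role = "bidder" | "buyer";

// Hypothetical input/output shapes; the real records will differ.
interface PartyRef { id: string; name: string; }
interface ActorNode { id: string; name: string; roles: Role[]; }

// Merge bidder and buyer references into one node per actor id,
// recording every role the actor plays.
function toActors(bidders: PartyRef[], buyers: PartyRef[]): ActorNode[] {
  const byId = new Map<string, ActorNode>();
  const add = (party: PartyRef, role: Role): void => {
    let actor = byId.get(party.id);
    if (!actor) {
      actor = { id: party.id, name: party.name, roles: [] };
      byId.set(party.id, actor);
    }
    if (actor.roles.indexOf(role) === -1) actor.roles.push(role);
  };
  bidders.forEach((b) => add(b, "bidder"));
  buyers.forEach((b) => add(b, "buyer"));
  return Array.from(byId.values()); // insertion order: bidders first
}
```

With this shape, an endpoint filtering by actor ids no longer needs to know in advance whether an id belongs to a bidder or a buyer.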