Autocomplete for actors

georgiana-b commented 6 years ago

Provide suggestions for actors (bidders/buyers) from a bucket of tenders once the user has typed at least 2 letters. The bucket of tenders should also be filtered by cpvs, countries, years and other actor ids.
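
To make the requirement concrete, here is a minimal sketch of what a suggestion request could look like. The `ActorSuggestRequest` shape, its field names and `shouldSuggest` are assumptions for illustration only, not the actual endpoint contract.

```typescript
// Hypothetical request shape: field names are assumptions, not the real API.
interface ActorSuggestRequest {
  query: string;        // text typed by the user so far
  cpvs?: string[];      // optional filters that narrow the bucket of tenders
  countries?: string[];
  years?: number[];
  actorIds?: string[];  // other actors already selected
}

// Suggestions are only fetched once at least 2 letters have been typed.
const MIN_QUERY_LENGTH = 2;

function shouldSuggest(request: ActorSuggestRequest): boolean {
  // Surrounding whitespace does not count towards the minimum length.
  return request.query.trim().length >= MIN_QUERY_LENGTH;
}
```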

The tokenizer we had for the ElasticSearch index in the previous backend works very well.

[x] Investigate if we can replicate that text analysis using ODB's Lucene Index
[ ] If we can't get a similar effect, set up an ElasticSearch index
[ ] Considering that an actor can act both as bidder and buyer in a network and people usually want to see both roles when they research a given actor, it might be better to refactor the query builder endpoints to use actors directly instead of bidders and buyers

georgiana-b commented 6 years ago

Unfortunately, ODB's Lucene Index doesn't let us tweak the tokenizers and token filters, so the resulting search lacks essential features such as matching words with diacritics when the user types the plain-letter version.

nightsh commented 6 years ago

Can we add a "transliterated name" property to the objects and index both that one and the original name?

Relevant link: https://www.npmjs.com/package/transliteration

georgiana-b commented 6 years ago

@nightsh Following your suggestion, I added an extra field that contains the name without diacritics and built a composite index over both versions. It turns out this solves the diacritics issue and makes for a pretty good full text search, so I'll go with it. Thank you for bringing this forward!
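
A minimal sketch of how the extra field could be derived. This uses plain Unicode normalization rather than the `transliteration` package linked above, and `stripDiacritics`, `ActorRecord` and `normalizedName` are hypothetical names, not the ones used in the backend. Note that this approach only removes combining marks, so letters without a decomposition (for example `ł` or `ø`) are left untouched.

```typescript
// Decompose each character (NFD) and drop the combining marks, so that
// an accented letter becomes its plain base letter.
function stripDiacritics(name: string): string {
  return name.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical record shape: both fields would go into the same index.
interface ActorRecord {
  name: string;           // original name, as published
  normalizedName: string; // same name without diacritics
}

function withNormalizedName(name: string): ActorRecord {
  return { name, normalizedName: stripDiacritics(name) };
}
```

With both fields indexed, a query typed without diacritics can still match the stored name through `normalizedName`.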

georgiana-b commented 6 years ago

I need some assistance in deciding how to implement this feature @nightsh @zufanka

Unclear aspects:

what suggestions should be returned when the user tries to filter by entities
how to record the user's choices to filter the other query builder options (cpvs, years, countries) and ultimately to create the network

georgiana-b commented 6 years ago

If we want to use buyer ids and bidder ids as filters we will have to return Bidder and Buyer records as results to autocomplete.

Pros:

filters applied on top of this (cpvs, years, etc.) will run faster since we don't have to do a full text search every time
selecting multiple entities that don't have a similar name is trivial

Cons:

An entity can act both as bidder and as buyer. We assume that when making actor-centric networks our users want to see an entity in both roles. However, because of tenders-exposed/elvis-backend-node#49 we have to keep 2 records for an entity, one for each role. Since we show all the bidders and all the buyers that match the name together, an actor that acts as both will appear twice, and the user will have to select it twice in order to see it in both roles, which is awkward.
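
To illustrate the duplication, here is a small sketch that finds the names showing up in both roles in a list of suggestions. The `Suggestion` shape and `namesInBothRoles` are assumptions made for this example, and matching on the exact name is a simplification.

```typescript
type Role = "bidder" | "buyer";

// Hypothetical suggestion shape: one record per entity and role.
interface Suggestion {
  id: string;
  name: string;
  role: Role;
}

// Returns the names that appear with both roles, i.e. the entries
// a user would currently have to select twice.
function namesInBothRoles(suggestions: Suggestion[]): string[] {
  const rolesByName = new Map<string, Set<Role>>();
  for (const suggestion of suggestions) {
    let roles = rolesByName.get(suggestion.name);
    if (!roles) {
      roles = new Set<Role>();
      rolesByName.set(suggestion.name, roles);
    }
    roles.add(suggestion.role);
  }
  const duplicated: string[] = [];
  rolesByName.forEach((roles, name) => {
    if (roles.size === 2) duplicated.push(name);
  });
  return duplicated;
}
```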

georgiana-b commented 6 years ago

Another option is to change the frontend functionality. Currently the workflow is: the user types in a query, we bring suggestions, the user chooses from those suggestions, we use those choices as filters further on.

Instead, we could use the query as a filter directly. So the workflow would be: the user types in a query, we bring suggestions of entities, the user reviews the suggestions and tweaks the query accordingly, and when they are happy with the results they move to the next filter and we use the query as a filter further on.

Pros:

it is much easier to do bulk selects as requested in tenders-exposed/elvis-ember#326 especially for users who are familiar with "power" queries like IBM* or S.A.~1
we don't have to worry about merging and separating bidders and buyers. Since we don't need the user to specifically select each record we can show a name only once

Cons:

the filters on top will run a bit slower
it is no longer trivial to choose multiple companies that don't have similar names, because users have to write first_company OR second_company
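
For reference, a sketch of what combining unrelated names would involve in this option. It follows the classic Lucene query syntax (quoted phrases joined with `OR`); `buildOrQuery` is a hypothetical helper, and whether the index accepts exactly this string is an assumption.

```typescript
// Wrap a name in double quotes so it is matched as a phrase,
// escaping any quotes or backslashes it contains.
function quotePhrase(name: string): string {
  const escaped = name.replace(/(["\\])/g, "\\$1");
  return `"${escaped}"`;
}

// Join several names into a single query using OR.
function buildOrQuery(names: string[]): string {
  return names.map(quotePhrase).join(" OR ");
}

// Example: the query a user would otherwise have to type by hand.
const query = buildOrQuery(["first company", "second company"]);
```

This is the part that users would have to write themselves, which is where the extra effort comes from.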

georgiana-b commented 6 years ago

Considering that one of our main selling points is to make digging into the data easy, the second option raises the data literacy bar too much, so I went with the first.

georgiana-b commented 6 years ago

@nightsh Do you use this feature somewhere else too? For example when people search for actors to cluster?

tenders-exposed / elvis-backend-node

Autocomplete for actors #48