Closed blackmad closed 3 years ago
I'd prefer to avoid adding another field if possible since it adds complexity to the schema & query logic.
The disadvantage of adding more tokens to the existing country_a
field is that it will affect the scoring, for example USA/US
or NZD/NZ
will match twice on the alpha2 tokens since they share a common prefix.
[edit] the above is not correct for keyword
type fields.
Since that field doesn't really need, or greatly benefit from, the TF/IDF type scoring we can probably just disable scoring completely for that one field using either a custom similarity
or more simply, using index-options?
Yeah, I didn't realize when I threw out the "two fields" idea that we were already using a keyword
field for country_a
. It should have the behavior we want even with multiple ISO code variations indexed into a single field. 🤞
On the topic of using multiple fields to mitigate 'double scoring' (for future discussions), I'm not convinced that adding a second field can actually mitigate the issue.
We can use a multi_match query to search against both fields for the same term but I think cross_fields
and also potentially best_fields
will still incur a term frequency scoring penalty.
We can wrap a subquery in a constant_score but that would work equally well for both a single field as it would for two fields.
Okay I can yank all these PRs down and go back to trying to add synonyms to the field in the index
On Mon, Nov 2, 2020 at 9:26 PM Peter Johnson notifications@github.com wrote:
@missinglink commented on this pull request.
In synonyms/custom_admin.txt https://github.com/pelias/schema/pull/469#discussion_r516396744:
@@ -23,3 +23,4 @@
foo => foo bar, baz
#
=============================================================================
+GB => UK
This should be "GB,UK" (with a comma) as per synonyms/country_a2/country_a2.txt.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pelias/schema/pull/469#pullrequestreview-522156274, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADMZMG4TJJU5OPWUY7P4LTSN5S4NANCNFSM4S7PAR7A .
-- David Blackman creative technologist & wandering help me find my purpose http://purpose.blackmad.com
@missinglink why do you say it's indexed as type keyword?
In our index at least, it looks like type=text
country_a: { type: "text", similarity: "peliasDefaultSimilarity", fields: { ngram: { type: "text", similarity: "peliasDefaultSimilarity", analyzer: "peliasIndexOneEdgeGram", search_analyzer: "peliasAdmin" } }, analyzer: "peliasAdmin" },
@blackmad looks like you're right:
@missinglink maybe we should look at changing that? I don't think we've changed the mappings for the *_a
fields in quite some time. Maybe we can do better now?
That is so weird, I remember looking through the code and now looking through it again you're right, maybe I'm losing my marbles!
we are all. covidbrain.
On Wed, Nov 18, 2020 at 8:07 PM Peter Johnson notifications@github.com wrote:
That is so weird, I remember looking through the code and now looking through it again you're right, maybe I'm losing my marbles!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pelias/schema/pull/469#issuecomment-730056635, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADMZMAQMOZV7N2SY6XGBQTSQRVWRANCNFSM4S7PAR7A .
-- David Blackman creative technologist & wandering help me find my purpose http://purpose.blackmad.com
currently we only index iso3 codes in country_a, which is a problem in autocomplete for countries where the iso2 code isn't a prefix of the iso3, like China (CHN/CN) or Mexico (MEX/MX) meaning the query "beijing cn" fails in autocomplete.
It's not a super common real world case for users (since they should hit the result by earlier) but does come up a fair bit in testing and looks embarasssing.
iso2 queries seem to work fine in geocoding, I think because slop lets us ignore them?
This will be paired with a change to pelias-model to automatically add iso2s.
Open to other approaches.