Closed mcdurdin closed 4 years ago
Also: review feedback from @ermshiperete at https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/8
TODO:
[x] Remove "Go to keyman.com" link (Fix keymanapp/keyman.com#153, keymanapp/keyman#3463)
[x] Change "Home" button to "New Search" (Fix keymanapp/keyman.com#153, keymanapp/keyman#3463)
[x] Loading /keyboards does a redirect to q=&page=1, which is kinda unnecessary! (Will be fixed in #107)
p:*
)
The results are across all platforms. Is it possible to filter for the platform? (Fix: keymanapp/keyman.com#152)
[x] The search currently only finds languages that start with the search term. Previously it also listed languages that contained that term. Searching for "German" now shows all keyboards for the German language, but not "German, Pennsylvania" that it showed previously. Searching for "Pennsylvania Dutch" shows the expected results, but searching just for "Dutch" shows only keyboards for Dutch, but not Pennsylvania Dutch.
[x] if search finds a match I can't extend the search to include a space - the space at the end gets removed. I can paste from the clipboard, then it searches with space (e.g. "german pe") and finds "EuroLatin (SIL) (German, Pennsylvania language)", but I can't type it. (FIX: keymanapp/keyman.com#149)
[x] Searching for "Amish" (that previously showed language "German, Pennsylvania (Amish Pennsylvania German)") has 0 results.
[x] It's not possible to search for language code (unless you use l:id:code)
[x] Searching for "usa" shows keyboards for Usakade language, but not keyboards used in the USA.
[x] Searching for "c:id:usa" has 0 results (don't they use keyboards anymore? :-) ). Ah, I see. It expects the two-letter codes from ISO 3166-1, not the three letter codes: "c:id:us"
[x] Searching for "c:usa" has 0 results. Has to use "c:united states" (and if I'm slower to type than the search to find results then I can't type that because it strips the space)
[x] Searching for "swa" shows keyboards for several languages that start with "swa". However, it doesn't show "EuroLatin (SIL)" for Swabian; Swabian keyboards appear under obsolete keyboards. Searching for "swab" shows the EuroLatin one as well.
[x] The display of results when searching for localized language name is awkward: "EuroLatin (SIL)(Deutsch language)". Putting the language first would be better: "EuroLatin (SIL)(language: Deutsch)"
[x] Searching for "l:id:ydd" shows results for BCP 47 tag 'yi' - which might be correct but is a bit surprising. "l:id:yi" shows same results (old search didn't find anything for l:id:yi). Searching for "l:id:yih" shows results for BCP tag 'yih'.
[x] Searching for "yiddis" shows "Yiddish Pasekh". Searching for "yiddish" shows "Yiddish Pasekh (Yiddish language)". Searching for "yiddish p" shows "Yiddish Pasekh" again.
[x] The old search listed languages and countries related to the search term which I find helpful.
Just found a bug:
[x] When searching for ipa, why does IPATotal show up first (with 82 monthly downloads) and IPA (SIL) only second? Especially since I did the search on Linux where IPATotal is not supported. I would have expected IPA (SIL) to show up first. (Will be fixed in keymanapp/keyman.com#37)
[x] Also, why does “Tepehuan (obsolete non-Unicode)” show up third instead of under “Obsolete keyboards”? (Fix: keymanapp/keyman.com#151)
I was trying to query for a recently added sil_nko keyboard for the N’Ko language
My query l:n’ko gives a list of 7 Keyboards for languages matching ‘n’ko’ but sil_nko isn’t one of them.
The keyboard page https://keyman-staging.com/keyboards/sil_nko does list N’Ko (l:id:nqo) as one of the supported languages.
- The search currently only finds languages that start with the search term. Previously it also listed languages that contained that term. Searching for "German" now shows all keyboards for the German language, but not "German, Pennsylvania" that it showed previously. Searching for "Pennsylvania Dutch" shows the expected results, but searching just for "Dutch" shows only keyboards for Dutch, but not Pennsylvania Dutch.
This is by design. There is only one keyboard currently listed that supports those languages: sil_euro_latin. However, because it also supports Dutch and German, searching for those terms finds the shorter matching language names first. Because we now do only a keyboard search, not a nested search, these types of changes in results are to be expected.
- Searching for "Amish" (that previously showed language "German, Pennsylvania (Amish Pennsylvania German)") has 0 results.
langtags.json does not list Amish Pennsylvania German as an alternate language name for Pennsylvania German. If this is a problem, it should be fixed in langtags.json.
- It's not possible to search for language code (unless you use l:id:_code_)
This is an advanced feature and is by design. There is now a hint to help you search by language code on the search page.
- Searching for "usa" shows keyboards for Usakade language, but not keyboards used in the USA.
- Searching for "c:id:usa" has 0 results (don't they use keyboards anymore? :-) ). Ah, I see. It expects the two-letter codes from ISO 3166-1, not the three letter codes: "c:id:us"
- Searching for "c:usa" has 0 results. Has to use "c:united states" (and if I'm slower to type than the search to find results then I can't type that because it strips the space)
Correct. We don't currently support synonyms or abbreviations for country names. This would be a low priority feature I think; I don't want to maintain a database of synonyms for countries and the ISO 3166-1 list does not include them. We use the ISO 3166-1 alpha-2 list, which is the most common format.
- Searching for "swa" shows keyboards for several languages that start with "swa". However, it doesn't show "EuroLatin (SIL)" for Swabian; Swabian keyboards appear under obsolete keyboards. Searching for "swab" shows the EuroLatin one as well.
This is by design. "EuroLatin (SIL)" matches on "Swati" language rather than "Swabian", and the keyboard won't be shown twice. There are 13 different language names starting with "swa" in langtags.json and we don't want to show duplicates. Just keep typing if it hasn't found the name you are looking for 😉.
- The display of results when searching for localized language name is awkward: "EuroLatin (SIL)(Deutsch language)". Putting the language first would be better: "EuroLatin (SIL)(language: Deutsch)"
I think this is mostly personal preference 😁.
- Searching for l:id:ydd shows results for BCP 47 tag yi - which might be correct but is a bit surprising. l:id:yi shows same results (old search didn't find anything for l:id:yi). Searching for l:id:yih shows results for BCP tag yih.
This is correct. We normalise the BCP 47 language subtag from ISO 639-3 to ISO 639-1 (which gives us ydd -> yi). yih does not have an ISO 639-1 code.
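As a rough illustration of the normalisation described above, here is a minimal Python sketch. The lookup table and the function name `normalise_language_subtag` are hypothetical; only the ydd -> yi mapping comes from this thread, and the real search presumably derives its mappings from langtags.json rather than a hand-written table.

```python
# Hypothetical lookup table: ISO 639-3 code -> ISO 639-1 code.
# Only "ydd" -> "yi" is taken from the discussion; "deu" -> "de" is a
# well-known pair added purely for illustration.
ISO639_3_TO_1 = {
    "ydd": "yi",
    "deu": "de",
}


def normalise_language_subtag(tag: str) -> str:
    """Swap a three-letter primary language subtag for its two-letter
    equivalent when one is known; leave the rest of the tag untouched."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # Codes without an ISO 639-1 equivalent (e.g. "yih") fall through unchanged.
    subtags[0] = ISO639_3_TO_1.get(language, language)
    return "-".join(subtags)
```

Under these assumptions, l:id:ydd and l:id:yi both resolve to yi, while l:id:yih stays as yih.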
- Searching for yiddis shows "Yiddish Pasekh". Searching for yiddish shows "Yiddish Pasekh (Yiddish language)". Searching for yiddish p shows "Yiddish Pasekh" again.
This is a side-effect of the precise match signal, which pushes the exact string match of the Yiddish language name into a higher weight. I don't think I'll try and improve it 😄.
- The old search listed languages and countries related to the search term which I find helpful.
I also, in some ways, prefer the nested search results... But this was the trade-off I made at the start of the design. The old search had too much complexity due to the multiple search result lists and I think that this simpler flat search result matches what most users are going to expect (as they will be familiar with the flat Google-style searches).
- Searching for “German” shows “EuroLatin (SIL) (German language)” as expected. However, the bcp47 tag in the link is wrong: pdt instead of de (https://staging-keyman-com.azurewebsites.net/keyboards/sil_euro_latin?bcp47=pdt)
This has been resolved in an earlier PR.
- When searching for ipa, why does IPATotal show up first (with 82 monthly downloads) and IPA (SIL) only second? Especially since I did the search on Linux where IPATotal is not supported. I would have expected IPA (SIL) to show up first.
Okay, so this is actually a bit of a tricky one.
For the embedded search, IPATotal would not show up. For the basic web search, we don't use the current user's platform as a signal, currently. The unexpected ordering here comes about because we are multiplying the match weight against the ln() of the download count (+2 for reasons).
IPATotal currently wins out because its name starts with IPA as well as having IPA in the description, giving it a basic weight of 60 vs SIL IPA of 35.
The final weights are 286.24 and 225.48 respectively. We just need to download sil_ipa another 3000 times a month and it'll sort itself out 🙈. Perhaps that indicates that ln() is a little too strong. Maybe sqrt() is a better curve, making popularity a stronger signal?
And with sqrt(), we end up with final weights of 652 and 877 approx, respectively, so SIL IPA would win. But does this hurt other searches? What are our other options?
Changing this formula will break all my tests because all the weights change so I am really not very keen 🤣... but will do if this is a good solution. Thoughts appreciated.
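To make the ln() versus sqrt() trade-off concrete, here is a small Python sketch of the formula as described above: match weight multiplied by a curve applied to the download count plus 2. The function names are hypothetical, and the download counts are assumed values chosen so the results land near the weights quoted in this comment; they are not real statistics.

```python
import math

# (match weight, assumed monthly downloads). The match weights 60 and 35 come
# from the comment above; the download counts are illustrative assumptions.
KEYBOARDS = {
    "IPATotal": (60, 116),
    "sil_ipa": (35, 626),
}


def final_weight(match_weight, downloads, curve=math.log):
    """Match weight scaled by a popularity curve over (downloads + 2)."""
    return match_weight * curve(downloads + 2)


def ranked(curve):
    """Keyboard ids ordered from highest to lowest final weight."""
    return sorted(
        KEYBOARDS,
        key=lambda kb: final_weight(*KEYBOARDS[kb], curve=curve),
        reverse=True,
    )


print(ranked(math.log))   # ln() curve
print(ranked(math.sqrt))  # sqrt() curve
```

With these assumed counts, the ln() curve ranks IPATotal first and the sqrt() curve ranks sil_ipa first, matching the ordering described above. Swapping the curve changes every final weight, which is why existing tests that assert on exact weights would need updating.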
Finish keyboard install page (aka universal link infrastructure) for:
[x] macOS (Will be fixed in #109)
[x] Android (Will be fixed in #109)
[x] iOS (Will be fixed in #109)
[x] Query: can Sentry events be sent from Keyman Configuration web UX now?
I think the staging site is using the BCP 47 tag und-fonipa for the sil_ipa keyboard, but the keyboard package metadata is using und-latn.
On Keyman for Android alpha, I do a keyboard search for "sil_ipa" and install the keyboard. The sil_ipa keyboard shows up with the tag und-Latn. From the app, I then do a keyboard search for l:id:und-latn and I get 0 results. (Shouldn't it have found sil_ipa?)
Re und-fonipa and und-latn: this arises from a disconnect between the sil_ipa.keyboard_info and sil_ipa.kps language data:
"languages": ["und-fonipa"],
<Languages>
<Language ID="und-Latn">und-Latn</Language>
</Languages>
This was deliberate at the time, because we had trouble installing und-fonipa on some platforms. This will be resolved when we go to 14.0 release, so we should plan to update the SIL IPA keyboard to use und-fonipa in sil_ipa.kps as well.
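A mismatch like this could be caught with a small consistency check. The Python sketch below is only an illustration: it parses the two fragments quoted above as inline strings, and the helper names are hypothetical. Real .keyboard_info and .kps files contain much more than these fragments, so a real check would need to locate the relevant elements within them.

```python
import json
import xml.etree.ElementTree as ET

# The two fragments quoted above, embedded as strings for illustration.
KEYBOARD_INFO = '{"languages": ["und-fonipa"]}'
KPS_FRAGMENT = '<Languages><Language ID="und-Latn">und-Latn</Language></Languages>'


def info_tags(info_text):
    """BCP 47 tags from the keyboard_info fragment, lower-cased
    (BCP 47 tags are case-insensitive)."""
    return {tag.lower() for tag in json.loads(info_text)["languages"]}


def kps_tags(kps_text):
    """BCP 47 tags from the Language elements of the kps fragment."""
    root = ET.fromstring(kps_text)
    return {el.get("ID").lower() for el in root.iter("Language")}


def tag_mismatch(info_text, kps_text):
    """Return (tags only in keyboard_info, tags only in kps)."""
    info, kps = info_tags(info_text), kps_tags(kps_text)
    return info - kps, kps - info


print(tag_mismatch(KEYBOARD_INFO, KPS_FRAGMENT))
```

For the fragments above, und-fonipa appears only on the keyboard_info side and und-latn only on the kps side, which matches the failed l:id:und-latn search reported from the Android app.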
All remaining items extracted into separate issues, so closing this mega checklist
FUTURE:
TODO:
[x] Include hints on how to use search (advanced search link?) -- extrapolated from Doug's feedback: (FIX: keymanapp/keyman.com#149)
The new site gives 27 keyboards in response to fulfulde. The old gives only 1, and it is not a good one.
However, the new site returns no results for the code ffm whereas the old site gives one language, under which there are 4 keyboards.
[x] Update database layer to use schemas for switching instead of separate databases. (FIX: #96, #97)
[x] Separate deprecated keyboards out of default search: https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/3?u=marc_keyman (FIX: #99, keymanapp/keyman.com#147)
[x] strip out generic words such as “language” and “keyboard”. https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/5 (FIX: keymanapp/keyman.com#148)
[x] skip the “page 1 of 0” when there are no results! https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/5 (FIX: keymanapp/keyman.com#146)
[x] http://api.keyman.com.local/search?q=l:id:str-latn is matching nothing. Searching for str-latn returns str as expected, but losing the str-latn which is needed for the keyboard itself. So there is something mismatching here. (via slack https://keymanapp.slack.com/archives/C6Q9WS09G/p1593670613153900) (FIX: #98)
[x] Use https://keyman.com/go/package/download/ pattern for package download APIs on api.keyman.com. (https://docs.google.com/document/d/1rhgMeJlCdXCi6ohPb_CuyZd0PZMoSzMqGpv1A8cMFHY/edit#heading=h.mio0p3wdzdye)
[x] Embedded search is a bit wasteful on space. Let's restructure it (e.g. Home link, keyman.com link could be together on right, with Keyboard Search and search box on left, all on one line) (Fix keymanapp/keyman.com#153, keymanapp/keyman#3463)
[x] Uncaught TypeError: Cannot read property 'q' of undefined at init (search.js:282) at load_search (search.js:311) when using in embedded search; I searched for "khmer" and then clicked the first link. URL: http://keyman.com.local/keyboards/khmer_angkor?bcp47=km&embed=windows (Fix: #150)
[x] Setup internal links for back-end API calls to use internal.api.keyman.com (non-proxied form of api.keyman.com); these avoid trips outside the datacenter for PHP calls to the backend. Requires also DNS setup. (Will be fixed in #107)
[x] Deleting all text from search box doesn't reset search (FIX: keymanapp/keyman.com#149)
[x] Verify that all appropriate indexes are in place (especially check short searches such as 'r', 'ra'). (Will be fixed in #107)
[x] Non-Unicode keyboards are not listed as 'obsolete' yet (FIX: keymanapp/keyman.com#151)
===========================================================================================================================
DONE:
Additional Notes: Notes on api.keyman.com changes for langtag consumption
example keyboard: burushaski_girminas, khw-latn "Khowar (Latin)"
ខ្មែរ finds zero results but ខ្មែ finds 7...
Show a 'popular keyboards' list for the empty search -- this can also be the search engine jumping-off point.
"Show obsolete keyboards" needs an indication of the change of status ("Hide obsolete keyboards") and needs to be outdented. Also needs some thought with paginated results.
Too many pages leads to overwhelming number of page links at bottom (e.g. s:latin)
http://api.keyman.com.local/search?q=l gives 500
el_dinka appears to have non-canonical bcp47 codes -- search finds it no trouble.
Show list of associated languages+scripts+countries in keyboard details (and related keyboards?)
For in-app download links, include information on searched language code (if available), for default language install (#1456)
Match fields in json should be integer or float where possible, not string! (and update schema accordingly)
schema for match type should be restrictive to actual types used
Search "spa" vs "spanish" -- the weighting could be better. Similar "ger" vs "german". (probably need length-based match weight override)
REFACTOR: region vs country
REFACTOR: code vs id vs tag
Pagination
Need to give more detail on failed links (and make it easier to find in logs, so tweak the broken link search a node wrapper)
Searches for keyboard ids should work
Phrases are not working yet (need to split into either a phrase search or separate words)
Searches for bcp47 tags, scripts, regions should work
FAIL: http://api.keyman.com.local/search/2.0?f=1&q=l:%, c:%, etc.
Default search should return a FLAT LIST of KEYBOARDS ONLY with highlights. e.g. 'Thai' should return keyboards with 'Thai' in the name, in a language name, or in the country associated with the language.
Search results must be weighted (summed?):
a) match of primary language name 1.0
b) match of alternate language name 0.3
c) match of keyboard name or id 1.0
d) match of script name 1.0
e) match of country name 0.5
f) match on term in description 0.5
g) match quality (whole word match = 1.0, down to 0.1 for further distance? as a multiplicand)
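The proposed weighting can be sketched in a few lines of Python. This follows the "summed" reading of the note above, with match quality applied as a multiplier per match. The field names and the `score` function are hypothetical, and the notes say the final weighting ended up different, so treat this as an illustration of the proposal rather than the shipped algorithm.

```python
# Field weights as listed in the note above; the key names are illustrative.
FIELD_WEIGHTS = {
    "language_name": 1.0,
    "alternate_language_name": 0.3,
    "keyboard_name_or_id": 1.0,
    "script_name": 1.0,
    "country_name": 0.5,
    "description": 0.5,
}


def score(matches):
    """Sum field weight * match quality over all matches for one keyboard.

    `matches` is an iterable of (field, quality) pairs, where quality runs
    from 1.0 (whole word match) down to 0.1 (distant partial match).
    """
    total = 0.0
    for field, quality in matches:
        quality = min(1.0, max(0.1, quality))  # clamp to the stated range
        total += FIELD_WEIGHTS[field] * quality
    return total


# A keyboard whose name matches exactly and whose description mentions the term:
print(score([("keyboard_name_or_id", 1.0), ("description", 1.0)]))
```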
select * from t_langtag_name inner join containstable(t_langtag_name, name, 'isabout (thai weight (1.0), "thai*" weight (0.5))') as KEY_TBL ON t_langtag_name._id = KEY_TBL.[KEY] order by [RANK] desc
5 / 5 = 1.0, 4 / 5 = 0.8, 1 / 5 = 0.2
NOTE: final weighting is different but ... let's see how it goes
Can also specify a search:
?q=l:<term> search for keyboards that support a language, by name (does not check id)
?q=l:id:<id> search for keyboards that support a language, by bcp 47 id
?q=c:<term> search for keyboards that support languages within a country
?q=c:id:<id> search for keyboards that support languages within a country, by iso 3166 id
?q=s:<term> search for keyboards that support a script
?q=s:id:<id> search for keyboards that support a script by script id
?q=id:<id> search for keyboards that match the id
?q=legacy:<id> search for keyboards that match the legacy id, only one returned!
Should be able to specify alternate names? Searches should match on NFKD with diacritics stripped.
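The prefixed query forms and the NFKD note above can be sketched together in Python. The function names and the returned dictionary shape are hypothetical and do not describe the actual api.keyman.com implementation. The diacritic stripping is deliberately crude: it drops every character with a non-zero combining class, which suits Latin-script names but would likely need script-specific care elsewhere (Khmer, for example).

```python
import unicodedata

# Single-letter prefixes from the list above, mapped to illustrative field names.
PREFIXES = {"l": "language", "c": "country", "s": "script"}


def fold(text):
    """NFKD-normalise, drop combining marks, and case-fold for matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def parse_query(q):
    """Split a query into (field, by, term) following the prefix list above."""
    head, sep, rest = q.partition(":")
    if sep and head == "id":
        return {"field": "keyboard", "by": "id", "term": rest}
    if sep and head == "legacy":
        return {"field": "keyboard", "by": "legacy_id", "term": rest}
    if sep and head in PREFIXES:
        if rest.startswith("id:"):
            # e.g. l:id:ydd, c:id:us, s:id:latn
            return {"field": PREFIXES[head], "by": "id", "term": rest[3:]}
        return {"field": PREFIXES[head], "by": "name", "term": fold(rest)}
    # No recognised prefix: plain search across all fields.
    return {"field": "any", "by": "name", "term": fold(q)}


print(parse_query("l:id:ydd"))
print(parse_query("c:United States"))
```

Name terms are folded while ids are passed through unchanged, so that tag handling stays a separate concern.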
when searching for http://api.keyman.com.local/search?q=k:thai
Process: