Open wwoast opened 5 years ago
The first post described the backing data needed for a better search UX. For the front-end side of this, I want something that looks like the "assisted search" menu you get when you look for music or reviews on https://pitchfork.com.
This feature needs to wait on a better search parser before it can be useful. https://github.com/wwoast/redpanda-lineage/issues/194
So the initial hack towards this idea is in https://github.com/wwoast/redpanda-lineage/commit/8801cf3cf824a2e3c64875df3c3e84ecc7639477, which created something called polyglots. Currently the only polyglot is the word baby, which is a `keyword` when followed by a numeric year, but a `tag` when followed by a panda name. It took a lot of hacky code to make this polyglot work, including a result set just for baby photos.
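A minimal sketch of the polyglot rule described above, written in Python for brevity rather than in RPF's own code. The term lists, the panda names, the year pattern, and the `classify_polyglot` function are all hypothetical placeholders, not what the commit actually does.

```python
import re

# Hypothetical term lists; the real keyword, tag, and panda-name data live in RPF.
KEYWORDS = {"baby", "born", "died"}
TAGS = {"baby", "bamboo", "snow"}
PANDA_NAMES = {"lychee", "harumaki"}

# Assumption: a "numeric year" is a four-digit number starting with 19 or 20.
YEAR = re.compile(r"^(19|20)\d{2}$")


def classify_polyglot(term, next_term):
    """Pick a reading for a term that sits in both lists, based on the next term."""
    term = term.lower()
    if term not in KEYWORDS or term not in TAGS:
        raise ValueError(f"{term!r} is not a polyglot")
    if next_term is not None:
        if YEAR.match(next_term):
            return "keyword"  # e.g. "baby 2019"
        if next_term.lower() in PANDA_NAMES:
            return "tag"      # e.g. "baby Lychee"
    # Nothing after the term settles the question.
    return "ambiguous"
```

The "ambiguous" return value is where either a scoring step or a user prompt would have to take over.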
To clean up this code, I think what I actually want is for the polyglot system to track entries in both the `keyword` and `tag` lists. But I don't see how to implement the scoring that decides which interpretation of a polyglot should be chosen. I could track polyglots as a `Parse` value, and give preferential scores to what a polyglot might be, given how the other terms in the parse tree were classified. The goal is to make search decisions when the input string has potentially conflicting meanings for one or more terms. Examples:
Instead of a scoring system whose behavior is highly non-obvious, I could also alert the user when it's unclear how a term should be processed. A good UX for this would be to eventually show a report of what each search parameter was classified as. When a term has more than one classification, RPF can provide an unordered-list prompt where the user disambiguates what the terms represent.
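The report-and-prompt idea could be sketched roughly like this, in Python for brevity. The `CLASSIFIERS` table, the `TermReport` type, and both function names are hypothetical, and splitting the query on whitespace is an assumption standing in for a real parser.

```python
from dataclasses import dataclass

# Hypothetical classification tables; a term may appear in more than one.
CLASSIFIERS = {
    "keyword": {"baby", "born"},
    "tag": {"baby", "snow"},
    "panda": {"lychee"},
}


@dataclass
class TermReport:
    term: str
    classifications: list

    @property
    def ambiguous(self):
        # More than one classification means the user should be asked.
        return len(self.classifications) > 1


def classify_query(query):
    """Report every classification that matches each whitespace-separated term."""
    reports = []
    for term in query.lower().split():
        matches = sorted(kind for kind, words in CLASSIFIERS.items() if term in words)
        reports.append(TermReport(term, matches or ["unknown"]))
    return reports


def disambiguation_prompts(reports):
    """Map each ambiguous term to the choices an unordered-list prompt would offer."""
    return {r.term: r.classifications for r in reports if r.ambiguous}
```

For the query "baby Lychee snow", only "baby" would end up in the prompt, with "keyword" and "tag" as the choices.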
When you type free text into the RPF search box, it returns whatever version of what you typed has the most hits. There are a ton of problems with this method (more as I think of them):
One way around this problem is to create an entity indexing system. Placenames, panda names, zoo names, and other searchable entity text would get classified with three pieces of data: an `entity type` (what it is), an `entity priority` per type, and a hit count in the graph for that entity. All of these values can be built into an index at publish-time, and can be used to suggest either subsets of content, or alternate content to display.
To illustrate how this might work, I would bump the panda `entity priority` such that exact matches for that type would take precedence over the partial string searches for locations or other things. There would be similar precedence bumps for numbers in a "year" date range, or for country matches outside of location strings.
I suspect this entity index system may be required for good performance and UX feedback as to what the search is doing, once RPF has a proper parsing strategy for search queries.
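A rough sketch of what such an entity index might look like, in Python for brevity. The priority numbers, the `Entity` type, the sample entities, and the ordering rule (exact match, then type priority, then hit count) are all assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass

# Hypothetical per-type priorities; higher wins. Pandas are bumped above the rest.
TYPE_PRIORITY = {"panda": 30, "year": 20, "country": 20, "zoo": 10, "location": 10}


@dataclass(frozen=True)
class Entity:
    text: str
    entity_type: str
    hit_count: int  # how often the entity appears in the graph


def build_index(entities):
    """Group entities by lowercased text; this would run at publish-time."""
    index = {}
    for entity in entities:
        index.setdefault(entity.text.lower(), []).append(entity)
    return index


def lookup(index, term):
    """Return entities whose text contains the term, best candidate first."""
    term = term.lower()
    candidates = []
    for text, entities in index.items():
        if term not in text:
            continue
        exact = text == term
        for entity in entities:
            priority = TYPE_PRIORITY.get(entity.entity_type, 0)
            candidates.append((exact, priority, entity.hit_count, entity))
    # Sort on exactness, then type priority, then hit count, all descending.
    candidates.sort(key=lambda c: c[:3], reverse=True)
    return [c[3] for c in candidates]
```

With placeholder entities "Lin" (panda, 12 hits), "Lincoln" (location, 90 hits), and "Berlin Zoo" (zoo, 40 hits), a search for "lin" would rank the exact panda match first even though both partial matches have more hits.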