projectEndings / staticSearch

A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
https://endings.uvic.ca/staticSearch/docs/index.html
Mozilla Public License 2.0
51 stars 22 forks source link

Large desc filters should be rendered in a more user-friendly way #84

Closed martindholmes closed 3 years ago

martindholmes commented 3 years ago

Following email discussion with @joeytakeda, we've arrived at this outline for a possible better way to handle desc filters with large numbers of items:

Instead of generating a filter control with thousands of checkboxes, which is impractical, there should be a config-specified threshold with a reasonable default (100?) where, if the number of filter values exceeds this value, the control should be constructed differently on the search page. Instead of a checkbox list, there should initially be just a text box into which you type a partial or complete string representing what you're looking for. (Perhaps this should allow wildcards.)

When the number of matching values for your typed string gets below a certain level -- say less than six items in the full set of desc filter values matches -- then checkboxes for those items should automatically be created and added to the control. Then you may type a different string and get another set of matching items to add to the list. When you're happy with your constructed subset of checkboxes, you can proceed to use them as in a standard desc checkbox set control.

As soon as you get some checkboxes, you also get a button which will delete them if you press it, clearing the control so you can start again.

martindholmes commented 3 years ago

I'm pushing this to 1.2 because I don't believe we have any current projects which need it, and because it's partly a user-implementation issue; should there be desc filters with thousands of distinct values, really?

martindholmes commented 3 years ago

Pushing this forward again to 1.3. We will need it for some projects down the line, so it won't get lost in the shuffle.

martindholmes commented 3 years ago

I have a working demo of this control which we're discussing now. It looks feasible to implement this for 1.3.

martindholmes commented 3 years ago

@joeytakeda We made a vague decision that a type-ahead control would simply be a sort of allotrope of the desc filter, which would be triggered to take that form when the number of items options exceeds a certain threshold. However, I'm now revisiting that decision; I'm beginning to think it should be a specific control that you choose rather than having it thrust upon you. There are a number of reasons for this:

So I think it would make more sense just to treat this as a new sort of control. We can certainly warn people during the build process if a desc control is getting so big that it might be better as a type-ahead, but I think we should leave the choice with the user, and avoid yet another configuration item for the threshold. What do you think?

joeytakeda commented 3 years ago

Agreed--I think you're right that this should be configured as a specific control and not something in the configuration. But since these are still desc filters (and the typeahead would only ever be on a desc filter), I would be inclined to say the typeahead feature is a modifier of the desc filter, rather than a separate thing. So what I'm thinking is that the typeahead feature is triggered by adding a staticSearch.descTypeahead class to your existing staticSearch.desc—you must have both in order for it to work. So, something like:

<meta name="Canadian cities" content="Vancouver" class="staticSearch.desc staticSearch.descTypeahead"/>

This seems a bit cleaner to me since the typeahead isn't a new thing (it's still an ssDesc) and thus shouldn't get its own standalone class. The only instance where that could be a problem, I suppose, is if you did want a typeahead version and a non-typeahead version of the same filter, but I don't think that's really something that should happen—in fact, I would say that it is an error if you have a typeahead desc and a non-typeahead desc with the same name.

martindholmes commented 3 years ago

It seems simpler just to have:

<meta name="Canadian cities" content="Vancouver" class="staticSearch.descTypeahead"/>

don't you think?

joeytakeda commented 3 years ago

But I think that would mean promoting it to an equivalent status as a desc, bool, etc, when it’s not—it’s a desc that has a typeahead feature, not a different class of filter entirely.

martindholmes commented 3 years ago

Hmm. In that case how about:

<meta name="Canadian cities" content="Vancouver" class="staticSearch.desc.typeAhead"/>

One thing nagging in the back of my mind is that this control might in the future become something a bit more distinct from a desc, but I'm not sure how yet.

joeytakeda commented 3 years ago

I don't really see the harm of having two separate class tokens (but maybe that's because I'm becoming used to these sorts of conventions for CSS, like BEM). Keeping them separate might mean slightly easier implementation (i.e. the .ssDesc selector (or an xpath contains-token(@class,'staticSearch.desc')) would still retrieve all of the ssDescs, regardless of it being a typeahead), but that's only helpful if we're fairly sure that the typeahead is just a ssDesc re-skinned and not something else entirely.

One thing that still needs to be determined is what happens if there's two identically named filters, one that's a regular desc and one that's a typeahead one:

<meta name="Canadian cities" content="Vancouver" class="staticSearch.desc {{WHATEVER TYPEAHEAD CLASS WE DECIDE}}"/>
<meta name="Canadian cities" content="Vancouver" class="staticSearch.desc"/>

Should that be two separate filter controls (I can't see why that should happen) or an error?

martindholmes commented 3 years ago

My background is all object-oriented, so I prefer something that looks more like classes and sub-classes. That's also in the back of my mind when it comes to the JS part of the story; while this control has to be handled completely differently initially -- it needs a wrapper class that will bind the map of name/value pairs to it -- once it's set up, it can be treated the same as a regular desc (or at least that seems to me to be the case, and is the underlying idea).

From the point of view of CSS styling, there's really no difference in one sense, since we have the ^= selector; and in fact, if you need to distinguish between this and a desc control, you might end up writing staticSearch.desc.typeAhead. But this does bring up for me the issue of using periods in class names; it's not ideal really, because of the class-chaining syntax in selectors. I wonder if we shouldn't be doing that?

joeytakeda commented 3 years ago

I think we're basically thinking the same thing: that the typeahead is a subclass of the ssDesc. But I think the way to flag that sort of relationship is to have space separated tokens; requiring both ssDesc and ssDescTypeahead as space separated tokens in the @class would mean that the ssDescTypeahead would inherit all of the processing for ssDesc (in the XSLT and likely in the JS, as far as I can tell), which could then be overridden if necessary.

But yes, we shouldn't use periods in the class names—that'll be a difficult, breaking change to make (but one that's probably worth doing).

martindholmes commented 3 years ago

I think the XSLT will be different -- instead of creating checkbox controls, it'll just be creating a single text box -- and I think a second JSON file might be needed, or perhaps a tweak to the current one. The ids of items will have to be mapped to their strings somehow.

With 1.2 we did pilot a pretty decent deprecation method, which we could also follow to fix the period issue by supporting both in the next version, but warning when period values are found. I do think it's worth doing. Since there are no bugs (AFAIK) in 1.2, it's fine to leave older projects pinned to that version if it's not practical for them to change their source HTML. Since we're not in a rush for 1.3, we can take our time to think clearly about how we want this stuff to work. My instinct is either that this is a completely new control that you consciously choose to use, in which case it's implemented separately and code-reuse is implemented pragmatically, or it's a special case of the desc filter, in which case it should perhaps be triggered by a configurable threshold of distinct values. I prefer the former, though.

martindholmes commented 3 years ago

I've been thinking a little more about this, and it seems to me that there's one more factor that counts in favour of this being a completely separate control from the desc filter.

A desc filter is so called because its most common use-case is to categorize documents within a collection. This is what it's primarily used for in MoEML, LEMDO, Keats, Mariage, and several other projects. However, the most likely use of this new control is not going to be like that. For one thing, it's not likely that projects will have dozens or hundreds of document types; and even if they do, it's not likely that users will be able to predict their nomenclature sufficiently to be able to use a typeahead interface. Rather, I think these controls will be used for such things as "People mentioned in the documents" or "People with occupation X" (thinking particularly of VIHistory, but also the Despatches, of course, for vessels and placenames, and MoEML). So this is not likely to be a descriptor filter; instead it's a sort of content filter.

If this is the case, then it will be easier to explain and document it as a separate sort of thing, with specific use-cases distinct from the desc filter -- half-way between a filter and a text search, in a sense.

joeytakeda commented 3 years ago

That all seems right to me. I agree that making this completely separate from ssDesc is a good idea—it’s another thing to handle, of course, but I think it’s much clearer conceptually, as you pointed out.

How about calling it an “entity”?

martindholmes commented 3 years ago

Entity works for people, places etc., but I don't know that it works for things such as occupation, nationality, ethnicity and so on, which may end up being handled by this filter too. I wonder if we should just call it typeAhead?

joeytakeda commented 3 years ago

I don't love calling it typeahead just because it seems quite different from the rest of the classes, which all describe the type of thing it is, rather than how it will be rendered/handled. But I can't think of anything else, really...

martindholmes commented 3 years ago

Understood. If we're going by the type of thing it is, then it's a string value -- staticSearch.string?

joeytakeda commented 3 years ago

String is OK, but it doesn't really seem that different from "desc"....some other suggestions (per thesaurus): item, entry, record, listing (or list?)? I kind of like listing or item, since it does give a sense of how that's going to be rendered without it necessarily being defined by the typeahead mechanism.

martindholmes commented 3 years ago

I don't like listing because what we're not doing is giving the searcher a list to choose from. Item suggests only one thing somehow. More thoughts:

detail, specific, particular, feature, factor, constituent, facet, property

joeytakeda commented 3 years ago

I like feature or property, but lean towards feature since property has a lot of baggage already. Plus, “feature” seems to align well with the list of potential use cases you outline above.

martindholmes commented 3 years ago

OK, staticSearch.feature. Do we implement this, or do we go through the process of eliminating the periods first?

joeytakeda commented 3 years ago

Let's eliminate the periods first; it will be good to have that done on the base set of filters etc, then we can move forward with the new filter with the new syntax. I'll open a ticket for the period removal/deprecation.

martindholmes commented 3 years ago

This will be staticSearch_feature, since the period notation was switched.

martindholmes commented 3 years ago

Starting preliminary work on this in branch issue-84-feature. First of all I'll add some sample metatags to the test set.

martindholmes commented 3 years ago

I'm going for staticSearch_feat and ssFeat to keep things shorter.

martindholmes commented 3 years ago

@joeytakeda I've done everything up to the JavaScript in the branch, and I'm about to start in on the JS now. There's one thing I think we need to figure out before I start, though. When the page first loads, the typeahead control can have no useful functionality because we don't yet have the file JSON to handle it. That means a user could start typing and nothing would happen. I think we need to handle that one way or another. These I think are the options:

What do you think? Any other approaches I haven't thought of?

joeytakeda commented 3 years ago

Thanks for all this! I think the second option is best—the features are likely going to be quite big, so I’d prefer if they were fetched rather than potentially slow down the page. I think the simplest thing would be disabling the input (ie by using disabled and styling the input cursor using the :disabled pseudo selector) and then removing the disabled attribute once the feature is GOT.

This might also be a good opportunity to test out the preload headers per #148. Conceivably, by the time static search kicks into gear, the ssFeat might already be in the cache.

martindholmes commented 3 years ago

Good idea. I'm skeptical on the preload headers, but this will actually make it easy to test if we just use that one header. I'm going to implement this in MoEML and Despatches for testing once it's basically working, and the latter especially has lots of filters, so we should see the effect of preloading this one file quite clearly.

martindholmes commented 3 years ago

I've done the bulk of the work on this, with the typeahead stuff working nicely and the filters being used in searches. The remaining thorny issue relates to timing and sequencing of events in the initial setup. A page may load initially with a query stored in the URL, through a link or bookmark. Previously we could simply initiate that query as soon as the search object was set up, because the url params could be used to configure the search form immediately, and the resulting query could be launched even before the filter JSON required for it was actually available. However, now we need to retrieve the feature filter json before we can even configure the feature filter itself -- we need the filter data to construct any checkboxes the query needs -- so there is an additional constraint on when the query can be launched. I think this should probably be the sequence:

When parsing the URL, build a temporary array of the names of filters that are included in the query. Set a queryHasStarted flag to false.

When attempting to run the query, check that the JSON for all those particular filters has been retrieved already. If it hasn't, then bail for the moment. If it has, then set queryHasStarted to true, and run it.

Whenever a JSON file for a feature filter is retrieved and processed, check whether all of the filters in that array are now complete. If they are, and queryHasRun is still false, run the query.

I think that should do it, but there may be things I haven't thought of.

martindholmes commented 3 years ago

What I hadn't thought of was a simpler approach using await. I could make URL querystring parsing method asynchronous, and simply await retrieval of the JSON for any feature filters included in the query. I'm going to start out with that approach, and backtrack if I find it's problematic. The problem as always is the possible collisions between retrieving the JSON as part of the normal page-setting-up code, and retrieving the same JSON more urgently when it's needed for a search-on-load.

martindholmes commented 3 years ago

OK, I think I have this working now. Testing the branch with the Despatches project.

martindholmes commented 3 years ago

The initial implementation is OK with smaller sets, but way too slow for larger ones. We need a debounce or similar constraint to prevent excessive processing; we need the control to inherit the minimum term length from the StaticSearch object; and we need a faster way to search the term list. Should be straightforward.

martindholmes commented 3 years ago

All mitigations now implemented, and I believe we have a functional setup. I'll issue a pull request unless @joeytakeda sees anything amiss with the Despatches example site in the next few hours. I'll do documentation after merging into dev, and then we can do a triage and see what else is outstanding for 1.4.

martindholmes commented 3 years ago

One bug reported by KSS: trying to click on the scrollbar of the dropdown dismisses the dropdown. This is a bug in how my blur code is working; it looks like the scrollbar isn't "part of" the dropdown itself, so clicking on it is clicking "away". This needs some investigation, but shouldn't be too hard. So no pull request yet.

martindholmes commented 3 years ago

I've worked around that bug. It was weird one: a scrollbar doesn't seem to be part of the element that it's scrolling. Instead, it appears to be part of the parent element. So if you try to detect a click outside a specific element (in this case a scrolling menu), in order to close the menu, clicking on the menu's scrollbar will close it. Anyway, after more testing with Despatches, I should be able to do a PR.

martindholmes commented 3 years ago

This ticket can now be closed; the main work has landed in dev, documentation is done, and any further optimizations deserve their own ticket.