projectEndings / staticSearch

A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
https://endings.uvic.ca/staticSearch/docs/index.html
Mozilla Public License 2.0

Add option for sorting results by filter values #86

Open joeytakeda opened 3 years ago

joeytakeda commented 3 years ago

Following discussion from our meeting on Oct 23, this feature request is for providing an option to sort results by a particular filter value.

Here's one implementation I'm thinking about:

(I say <filterOptions> since that makes it a bit more extensible if need be)

<!--Display the name like a human value, but sort by surname, forename -->
<meta name="Name" value="John Keats" data-ssFilterValueSortKey="keatsjohn"/>

Or we could make it an object, with a key for each doc id:

docs: {
"JapaneseNightingale5.html": 9,
"LoveOfAzalea3.html": 15,
// Etc
}

Or we could just do all of the sorting during the creation of the array itself, so that we always know the proper (ascending?) order of the ids for that filter.
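
A minimal sketch of that last option, assuming each document's sort key has already been read from its `data-ssFilterValueSortKey` attribute. The `docsWithKeys` input shape, the file names, and the `buildSortedDocIds` helper are all hypothetical and not part of staticSearch; the real build might do this step elsewhere.

```javascript
// Hypothetical input: one entry per document that carries this filter value.
const docsWithKeys = [
  { id: "chapter20.html", sortKey: "chapter20" },
  { id: "chapter02.html", sortKey: "chapter02" },
  { id: "chapter10.html", sortKey: "chapter10" },
];

// Return the document ids in ascending sort-key order, so the stored
// array itself carries the ordering and no per-document key is needed.
function buildSortedDocIds(docs) {
  return [...docs]
    .sort((a, b) => (a.sortKey < b.sortKey ? -1 : a.sortKey > b.sortKey ? 1 : 0))
    .map((doc) => doc.id);
}

const sortedIds = buildSortedDocIds(docsWithKeys);
console.log(sortedIds);
```

The trade-off is that only one direction is stored; a descending view would have to reverse the array on the client.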

<option value="ssDesc2">Genre</option>
<meta name="docTitle"
 class="staticSearch.docTitle" content="Chapter 2" data-ssFilterValueSortKey="chapter02"/>

<!--Another document-->

<meta name="docTitle"
 class="staticSearch.docTitle" content="Chapter 20" data-ssFilterValueSortKey="chapter20"/>

martindholmes commented 3 years ago

We already have working sort keys for desc filters -- see e.g. the Despatches on Jenkins, where there are meta tags like this:

<meta name="Archive record" class="staticSearch.desc" content="CO 6:25" data-ssFilterSortKey="COAA_0006_0025"/>

which are used to sort checkboxes when making the search page, and which already wind up in the JSON:

"ssDesc4_100":{"name":"CO 6:25","sortKey":"COAA_0006_0025","docs":["V585AD11.html","V585AD12.html","V585AD10.html","V585AD14.html","V585AD09.html","V585AD04_A.html","V585AD01_A.html","V585AD08.html","V585FO02_A.html","V585AD13.html"]}

Can we simply repurpose this mechanism? In other words, if you add a sort key to your meta tags, we just use it in this way? The only configuration required then would be to specify a preferred order for the use of these sort keys; so you might specify (for Despatches, for instance) that you want to use only the date filter for sorting, or that you want to use document type first, and then date. If you specify a series of filters but don't provide any sort keys, we would just do our best to generate them based on the filter type.
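
A rough sketch of how that could work, assuming filter values shaped like the JSON above (`name`, `sortKey`, `docs`). Grouping them by filter id in a `filters` object, the invented sample values, and the helper names `buildKeyLookup` and `sortByFilters` are assumptions made for illustration, not existing staticSearch code.

```javascript
// Filter values in the shape shown above, grouped by a hypothetical filter id.
const filters = {
  ssDesc4: {
    ssDesc4_1: { name: "CO 6:25", sortKey: "COAA_0006_0025", docs: ["a.html", "b.html"] },
    ssDesc4_2: { name: "CO 6:3", sortKey: "COAA_0006_0003", docs: ["c.html"] },
  },
  ssDesc7: {
    ssDesc7_1: { name: "Despatch", sortKey: "despatch", docs: ["a.html", "c.html"] },
    ssDesc7_2: { name: "Minute", sortKey: "minute", docs: ["b.html"] },
  },
};

// Map each document to its sort key for one filter. If a document has
// several values for the filter, this sketch keeps the smallest key.
function buildKeyLookup(filterValues) {
  const lookup = new Map();
  for (const value of Object.values(filterValues)) {
    for (const doc of value.docs) {
      const existing = lookup.get(doc);
      if (existing === undefined || value.sortKey < existing) {
        lookup.set(doc, value.sortKey);
      }
    }
  }
  return lookup;
}

// Sort document ids by each filter in the preferred order; later filters
// only break ties left by earlier ones. Documents with no value get an
// empty key and therefore sort first in this sketch.
function sortByFilters(docIds, filterOrder, allFilters) {
  const lookups = filterOrder.map((id) => buildKeyLookup(allFilters[id]));
  return [...docIds].sort((a, b) => {
    for (const lookup of lookups) {
      const ka = lookup.get(a) ?? "";
      const kb = lookup.get(b) ?? "";
      if (ka !== kb) return ka < kb ? -1 : 1;
    }
    return 0;
  });
}

const byTypeThenRecord = sortByFilters(
  ["a.html", "b.html", "c.html"],
  ["ssDesc7", "ssDesc4"],
  filters
);
console.log(byTypeThenRecord);
```

Here the configured order is "document type first, then archive record", so the two despatches come before the minute and are ordered between themselves by record key.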

joeytakeda commented 3 years ago

I wasn't thinking clearly: of course we should just re-use the same mechanism (I had it in my mind that there was some reason that we couldn't).

That all sounds good to me. So the idea here is still to create the new config option of <filterOptions> which would specify the order of sort options, right?

Just to take stock:

Should we also make these reversible? I.e.

Title (Ascending) Title (Descending)

martindholmes commented 3 years ago

I think we should output a pseudo-table-header at the top of the search results, with a column for each filter that's configured for sorting; that control should first sort ascending, then on second click descending. The default sort initially would be ascending based on the configured sequence of filters in combination, but after that, the user would be in control.

The big problem is cases where (say) date is the sortable filter, but some documents have no date; or a boolean filter is chosen, but many documents have no value for it. What would we do then?
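
One possible convention, offered only as a sketch: documents with no value always go to the end, whichever direction is chosen. That rule is an assumption for illustration, not something decided in this thread, and `makeComparator` and the sample results are hypothetical.

```javascript
// Build a comparator for a given direction ("asc" or "desc") that always
// places documents lacking a sort key after those that have one.
function makeComparator(direction) {
  const sign = direction === "desc" ? -1 : 1;
  return (a, b) => {
    const aMissing = a.sortKey === undefined || a.sortKey === null;
    const bMissing = b.sortKey === undefined || b.sortKey === null;
    if (aMissing && bMissing) return 0;
    if (aMissing) return 1; // a has no value: it goes after b
    if (bMissing) return -1; // b has no value: it goes after a
    if (a.sortKey === b.sortKey) return 0;
    return a.sortKey < b.sortKey ? -sign : sign;
  };
}

// Hypothetical results: two dated documents and one with no date.
const results = [
  { id: "undated.html" },
  { id: "later.html", sortKey: "1858-06-01" },
  { id: "earlier.html", sortKey: "1848-02-07" },
];

const ascending = [...results].sort(makeComparator("asc")).map((r) => r.id);
const descending = [...results].sort(makeComparator("desc")).map((r) => r.id);
console.log(ascending, descending);
```

A header click would then just switch between the two comparators; the undated document stays last either way.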

martindholmes commented 3 years ago

@joeytakeda First, I think we should merge this ticket and issue #51, since they address the same thing. Second, in the real-life project that's actually in urgent need of this (The Colonial Despatches), I'm discovering that a single filter or even a combination of filters is not what the people want; they want a single, predictable sort order that covers all documents, whatever filters they may or may not have. And in any case, you may have a search engine with no filters at all, but you still have a need to sort all the documents which have the same relevance score.

Therefore I've come back to the idea that we should just implement a single sort key for each document, configured in a meta tag. That gives the site dev complete control over sorting, and obviates the need for extra controls (= clutter, for most people) in the search interface itself. Since we give you all the results without paging, any user can Control + F to find things among the results, so I think a single pre-configured sort key makes the most sense.

<meta name="sortKey"
 class="staticSearch.sortKey" content="bio_douglas_james"/>
<meta name="sortKey"
 class="staticSearch.sortKey" content="despatch_1848-02-07"/>

What do you think?

joeytakeda commented 3 years ago

I think your solution above is a great one for #51, though I’m not sure that this issue should be merged with #51, since this is a feature request for new functionality while #51, I think, describes what the default functionality should be.

The scenario that I have for wanting to be able to sort by filter is when you’re looking for a term and you want to find the earliest instance of that phrase; we have this on the Winnifred Eaton Archive already, but that, of course, was added on top of static search since we needed it for the project to help answer questions about the shape of her career.

That said, that’s not necessarily something we need to have for the majority of projects. So, if that’s the case (which I think is reasonable), then I’m happy to close this ticket in preference for having the simpler and more elegant sort key solution for sorting documents/breaking ties. Any additional sorting would be something one has to implement themselves (including figuring out the best controls for it, etc, since people may want a select menu or a table header or something else entirely).

martindholmes commented 3 years ago

Point taken about the two different scenarios. I'll go ahead with implementing a sort key, then, but we can leave this one open so that the more sophisticated option is also on the table.

martindholmes commented 3 years ago

The docSortKey is now working, and it does give me a notion of how we might implement this one without much pain. When the SSResultSet object compiles its results, it builds in a sort key (if there is one) derived from the ssTitles JSON file.

But that sort key might be passed to the SSResultSet object some other way; one option would be that when a search is run, a drop-down list might be created at the top of the search results with one item for each filter (active filters, or all filters? Not sure.) If you select one of those items, the whole search is re-run (it'll be very quick, since the required JSONs are already retrieved and in the in-memory index), but this time, any filter values from the selected filter are passed to the SSResultSet in lieu of the default docSortKey.

One unknown, though, is how you actually sort on the basis of a filter value. For a boolean filter, do all the Trues come first, followed by the Falses? Where do documents without a value for a particular boolean sort -- beginning or end? When a single file has multiple instances of the same desc filter (it's Born Digital and it's Documentation, or it's Published and Peer-reviewed), which one should be used? Is alphabetical order for desc filters meaningful in any way?

Since we don't have any good answers to these questions, I'm going to punt this to the Blue sky milestone, and also suggest that until we have a genuine requirement from a real project for something like this, we don't bother attempting to implement it. The only project with a result-sorting requirement (other than by relevance score) that I have is the Despatches, and that was well-served by the docSortKey; I've also added that to the Graves project. It's possible that a single universal sort key is all that any project will ever need, and anything more complicated may just confuse users.

martindholmes commented 2 years ago

Revisiting this after the 1.4 release, and issue #217, I think one very simple thing we could do would be to add a configuration option that says whether you want your results to be sorted by score THEN by the document sort key, or ONLY by the document sort key. Then SSResultSet could be passed that config, and SSResultSet~sortByScoreDesc would perhaps be renamed sortResults and follow the config rule.

joeytakeda commented 2 years ago

I'm not totally keen on fixing the sort order in the config—in my mind, document sorting is really something that ought to happen client-side; it may be the case that you want to look for the earliest use of a term, then go back and try to find the document with the most uses of that term.

That said, I agree that the problems you noted on Dec. 23—i.e. how can we determine the logic for the sorts depending on various filters—are really tricky and make it difficult to come up with a good and robust solution that could work across many projects with many different kinds of filters. Plus, if we do make client-side sorting possible, then we also have to deal with query strings, interface additions, captions, et cetera—all of which aren't necessarily reasons not to do this, but just considerations to add here to outline the amount of work it would take.

So perhaps one option is that we do what you propose, but keep in mind that we want to make this as extensible/flexible as possible, since we may want to add user-controllable sorting to the mix in the future.

martindholmes commented 2 years ago

We already have user-configured document sorting with the sort key; the question is only whether that sort key should take precedence over the order provided by scoring or not. So it's just a boolean config: do I want my sort key to take precedence or do I want score to take precedence. I think if both exist, they'll always both be used, it's just that they'll be prioritized in different orders. For anything more sophisticated still, we would probably just point users towards overriding the sort function itself.
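
To make the boolean concrete, here is a minimal sketch. The `sortKeyFirst` flag, the `makeResultComparator` helper, and the `{ id, score, sortKey }` result shape are hypothetical names chosen for the example, not the actual config or SSResultSet API.

```javascript
// Compare two strings without locale rules, so keys sort predictably.
function compareKeys(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Both criteria are always applied; the flag only decides which one is
// primary and which one breaks ties.
function makeResultComparator(sortKeyFirst) {
  const byScore = (a, b) => b.score - a.score; // highest score first
  const byKey = (a, b) => compareKeys(a.sortKey, b.sortKey); // ascending key
  const [primary, secondary] = sortKeyFirst ? [byKey, byScore] : [byScore, byKey];
  return (a, b) => primary(a, b) || secondary(a, b);
}

// Hypothetical hits with a relevance score and a document sort key.
const hits = [
  { id: "d1.html", score: 3, sortKey: "despatch_1850-01-01" },
  { id: "d2.html", score: 5, sortKey: "despatch_1858-06-01" },
  { id: "d3.html", score: 3, sortKey: "despatch_1848-02-07" },
];

const scoreFirst = [...hits].sort(makeResultComparator(false)).map((h) => h.id);
const keyFirst = [...hits].sort(makeResultComparator(true)).map((h) => h.id);
console.log(scoreFirst, keyFirst);
```

With score first, the sort key only orders the two hits that tie on score; with the key first, the results come out in date order and the score would only matter for documents sharing a key.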

joeytakeda commented 2 years ago

I meant more the end-user using the search in the project, not necessarily the person who is configuring it — but I agree that the simplest for now is just adding a config value.

I wonder if we should resolve #195 first to get the structure of the new config down, and then add the new config option so that we can figure the best place for it?

martindholmes commented 2 years ago

Ah, I see; sorry, I was just really thinking about the defaults configurable by the project. We've had two specific projects (the one from issue #217 and ColDesp) which wanted to prioritize dates because of the nature of the archive, but I do also see the value in our earlier blue-sky notions about having results as a sortable table. The problem with that is that we allow huge numbers of results, but there's a limit to table rows in HTML pages as we know; also, if you display only a subset of results, but the user then chooses to sort in such a way that the new order would include results not seen in the currently-visible subset, then there's an algorithmic dilemma to solve. I agree, though: let's get the config format nailed down first, then think about the config-controlled sort option, and only after that the end-user searcher case.

martindholmes commented 1 year ago

Action plan:

Config allows the sortable attribute on the filter.

ResultSet should decorate the results it generates with the in-scope sort keys, one attribute for each filter. This needs to be in the mapDocs structure; a re-sort should start from that structure and rebuild the HTML. The URI needs to reflect the active sort filter.

On the page: A select element offering each filter, ascending and descending. If you use this control, any doc sort key is then ignored, as is the score.

The sort parameter needs to be added into the URL, which means it needs to be harvested from the URL, and applied to the page and to the initial search if there is one. So this control is actually a kind of search filter, and must be persistent across searches unless explicitly cleared.
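
The URL part of this plan could be sketched as below, using the standard `URL` and `URLSearchParams` APIs. The parameter name `sort`, the example value, the example address, and the helpers `readSort` and `writeSort` are assumptions for illustration only.

```javascript
// Read the active sort from a URL; returns null when none is set.
// The parameter name "sort" is an assumption made for this sketch.
function readSort(url) {
  return new URL(url).searchParams.get("sort");
}

// Return a copy of the URL with the sort set, or removed when sort is null.
// All other query parameters are left untouched.
function writeSort(url, sort) {
  const next = new URL(url);
  if (sort === null) {
    next.searchParams.delete("sort");
  } else {
    next.searchParams.set("sort", sort);
  }
  return next.toString();
}

// Hypothetical search page address and sort value.
const start = "https://example.org/search.html?q=nightingale";
const sorted = writeSort(start, "ssDesc2_asc");
const cleared = writeSort(sorted, null);
console.log(sorted);
console.log(readSort(sorted));
console.log(cleared);
```

Because the sort lives in the query string alongside the search terms, it survives reloads and shared links until it is explicitly cleared.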

martindholmes commented 11 months ago

Discussion today: We have decided that:

  1. The sort control will be a drop-down list whose default selection is Score Descending.
  2. In the case of filter labels which have rich HTML content, we simply plain-text it. This is not ideal, but a select element is clearly the best control for this specific purpose.
  3. The control will be just like a regular form control; changing it will not cause anything to change until you press Search.

This means that form handling, URL parsing, and search triggering are much simpler, and there is no ambiguity about what will happen if you change this control.

joeytakeda commented 11 months ago

JT will implement the preliminary <select> menu in the form (but not the JS at this point), probably with a @value for each option of the form filterId_(asc|desc).
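
A small sketch of that value scheme, assuming a list of sortable filters with an `id` and a plain-text `label`. The helper names, the `score_desc` value for the default entry, and the label wording are invented for the example.

```javascript
// Build [value, label] pairs for the sort menu. The first entry stands for
// the default ordering; its value "score_desc" is invented for this sketch.
function buildSortOptions(sortableFilters) {
  const options = [["score_desc", "Score (descending)"]];
  for (const filter of sortableFilters) {
    options.push([`${filter.id}_asc`, `${filter.label} (ascending)`]);
    options.push([`${filter.id}_desc`, `${filter.label} (descending)`]);
  }
  return options;
}

// Split an option value back into filter id and direction.
// Returns null for anything that does not match filterId_(asc|desc).
function parseSortValue(value) {
  const match = /^(.+)_(asc|desc)$/.exec(value);
  return match ? { filterId: match[1], direction: match[2] } : null;
}

const sortOptions = buildSortOptions([{ id: "ssDesc2", label: "Genre" }]);
const parsed = parseSortValue("ssDesc2_desc");
console.log(sortOptions, parsed);
```

Anchoring the pattern at the end means a filter id that itself contains an underscore would still parse, since only the final `_asc` or `_desc` is treated as the direction.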

joeytakeda commented 1 month ago

Notes from June 7 2024 discussion. @martindholmes has started a branch for this (https://github.com/projectEndings/staticSearch/compare/iss-86-sorting)