sul-dlss / dlme

Digital Library of the Middle East web application, based on Spotlight
https://dlmenetwork.org/

Items in browse categories & search results should be sortable by date #1348

Closed jacobthill closed 2 years ago

jacobthill commented 3 years ago

Add date as a sort option for search results and the browse category page.

ggeisler commented 3 years ago

@jacobthill I know the desire to sort by date came up in the context of the interviewee looking at a browse category, but I'd suggest the application update we might want here is to add Date as a search results sort field. I believe we use the same sort field configuration for all search results, whether displayed on a browse category results page or a normal search results page.

In other words, I think a user who wants to sort items by date on a browse category results page would be just as likely to want to sort items by date when doing a normal search.

jacobthill commented 3 years ago

Thanks @ggeisler, that makes sense to me. I'll update this ticket.

jacobthill commented 3 years ago

The main challenge in implementing this feature is the limitations of the metadata. Some records will not have normalized dates, though the vast majority do. Others will have wide ranges (e.g. Cambridge records span the 6th to 19th centuries). We will need to choose either the earliest or latest year to sort on; the earliest year probably makes sense. We will also need to provide some context to users about some of the features and the data quality. We should probably have an about page that gives enough context about how the metadata gets into DLME, some limitations, potentially confusing results from some features, and what to do when encountering errors. Overall, it seems like it will still be useful and could lead to better metadata in the long run. We will need to figure out how to manage user expectations given the poor date data.

corylown commented 2 years ago

Analysis and comments

Date information in DLME in Solr is stored in two different fields:

These are both solr.TrieIntField type multi-valued fields. I think for the purposes of sorting we can pick one since I would expect the sort order to be the same with either field. (If I am wrong then we may need to consider/configure both fields.)

Solr does provide the ability to supply a multi-valued field to the sort parameter, but the behavior is a bit quirky and there's an unfortunate bug. The DLME Solr schema does not have the sortMissingLast=true attribute applied to the int field type, so I will describe the behavior for the current configuration first, and then the behavior if we were to add this attribute to the int type definition in the schema.

The default sort behavior on multi-valued int fields is equivalent to explicitly sorting using the two-argument field() function: sort=field(name,min) asc and sort=field(name,max) desc. So, when sorting in ascending order the minimum value is used, and when sorting in descending order the maximum value is used. This is somewhat confusing behavior.
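To make the min/max behavior concrete, here is a small pure-Python simulation of the ordering described above. It does not talk to Solr, and the record ids and year values are made up for illustration.

```python
# Simulated records, each with a multi-valued list of integer years.
# The ids and values are hypothetical, not real DLME data.
records = [
    {"id": "a", "years": [1200, 1300]},
    {"id": "b", "years": [600, 1900]},
    {"id": "c", "years": [1500]},
]


def sort_ascending(docs):
    # Ascending order compares documents by their minimum value.
    return sorted(docs, key=lambda d: min(d["years"]))


def sort_descending(docs):
    # Descending order compares documents by their maximum value.
    return sorted(docs, key=lambda d: max(d["years"]), reverse=True)


asc_ids = [d["id"] for d in sort_ascending(records)]
desc_ids = [d["id"] for d in sort_descending(records)]
print(asc_ids)
print(desc_ids)
```

In this sample, record "b" has a wide range (600 to 1900), so it comes first in both directions: its minimum is the smallest and its maximum is the largest. That is the kind of result that could confuse users.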

With the TrieIntField, when sortMissingLast is not set and we rely on the default behavior, records with missing values sort FIRST when ASC and LAST when DESC. However, if the sortMissingLast attribute is set to true, records with missing values always sort last.
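The placement of records with missing values can be sketched the same way. This is a plain Python model of the behavior as described in this comment, not Solr code; the function name, its parameters, and the sample data are all hypothetical.

```python
def sort_by_year(docs, descending=False, sort_missing_last=False):
    """Model the ordering described above for a multi-valued int field."""
    present = [d for d in docs if d["years"]]
    missing = [d for d in docs if not d["years"]]

    if descending:
        # Descending uses the maximum value of each record.
        present.sort(key=lambda d: max(d["years"]), reverse=True)
    else:
        # Ascending uses the minimum value of each record.
        present.sort(key=lambda d: min(d["years"]))

    # Without sortMissingLast, missing values go first on ASC and last on DESC.
    # With sortMissingLast, missing values always go last.
    if sort_missing_last or descending:
        return present + missing
    return missing + present


# Hypothetical sample: record "x" has no normalized date.
sample = [
    {"id": "a", "years": [1200, 1300]},
    {"id": "x", "years": []},
    {"id": "c", "years": [1500]},
]

default_asc = [d["id"] for d in sort_by_year(sample)]
default_desc = [d["id"] for d in sort_by_year(sample, descending=True)]
last_asc = [d["id"] for d in sort_by_year(sample, sort_missing_last=True)]
print(default_asc, default_desc, last_asc)
```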

You are supposed to be able to explicitly pass the field() function to the sort parameter to control whether the min or max value of the field is used for sorting, but there is a bug in Solr where supplying the field() function to sort with the TrieIntField causes an error, as described here: https://issues.apache.org/jira/browse/SOLR-12457.
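For reference, this is roughly what an explicit request of that form would look like when built from Python. It is only a sketch of the query string: the field name `date_field` is a placeholder, not a real DLME field, and per the bug above this sort is expected to fail against a TrieIntField.

```python
from urllib.parse import parse_qs, urlencode


def build_sort_params(field_name, direction):
    """Build Solr query parameters that sort explicitly by min or max."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    # Pair ascending with min and descending with max, matching the defaults.
    selector = "min" if direction == "asc" else "max"
    return {
        "q": "*:*",
        "sort": f"field({field_name},{selector}) {direction}",
    }


# "date_field" is a hypothetical field name used only for illustration.
params = build_sort_params("date_field", "asc")
query_string = urlencode(params)
print(query_string)

# Round-trip the encoded string to confirm the sort clause survives encoding.
decoded = parse_qs(query_string)
```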

All that said, it may end up making more sense to create a single valued field specifically for sorting by date that contains either the min or max value from one of the date range fields.
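A minimal sketch of that idea, assuming the values are derived before indexing: take the multi-valued list of years and write its minimum and maximum into single-valued fields. The field names `years`, `sort_year_min`, and `sort_year_max` are hypothetical, as is the sample record.

```python
def add_sort_fields(record, source_field="years"):
    """Return a copy of the record with single-valued sort fields added.

    Records with no years get no sort fields, so the index can decide
    where missing values are placed.
    """
    enriched = dict(record)
    years = record.get(source_field) or []
    if years:
        enriched["sort_year_min"] = min(years)  # for old-to-new sorting
        enriched["sort_year_max"] = max(years)  # for new-to-old sorting
    return enriched


# Hypothetical records: one with a wide range, one with no normalized date.
wide = add_sort_fields({"id": "wide", "years": [501, 1200, 1900]})
undated = add_sort_fields({"id": "undated", "years": []})
print(wide)
print(undated)
```

Storing both values would leave room for sorting in either direction, at the cost of two extra fields per record.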

Decisions needed

The following concerns (from above) should probably be handled as separate issues once the date sort feature is implemented:

We will also need to provide some context to users about some of the features and the data quality. We should probably have an about page that gives enough context about how the metadata gets into DLME, some limitations, potentially confusing results from some features, and what to do when encountering errors.

jacobthill commented 2 years ago

@corylown thank you for this very detailed analysis. It's really helpful.

Definitely agree that we need to think about the contextual information we provide to users on the about page. I also wonder if there are things that we can do in the UI on this page to indicate that dates are sorted based on the earliest year and that records missing years will be at the end. @ggeisler I opened a new ticket and added the design_needed tag https://github.com/sul-dlss/dlme/issues/1470. In addition to that I will include this in our search tips page revisions.

ggeisler commented 2 years ago

@jacobthill I understand the ideas of displaying missing values at the end and using the min date value, but am unclear what the second part of this means:

I think we do want to use the min function for sorting; ascending order definitely makes more sense than descending order.

Are we only providing the user with a single sort option (Sort by Date (old to new)) or are we offering to sort dates in either direction (also providing Sort by Date (new to old))?

jacobthill commented 2 years ago

@ggeisler I assumed we were choosing one sort direction: Sort by Date (old to new). I am open to having both depending on the technical complexity and design considerations. @corylown please clarify if it will be easy to do both. I think we would have to pull two separate field values: the earliest date for Sort by Date (old to new) and the latest date for Sort by Date (new to old). I also can't think of a use at the moment for the latter, though maybe people would want to sort some categories by more recent content.

corylown commented 2 years ago

@jacobthill I think it would be best to start from how we'd prefer date sort to work and the design we'd like and work back to technical implementation concerns and complexity. My assessment at the moment is that any solution is likely to involve some changes to the Solr schema, possibly an additional date field for sort (either defined in the transform step, or possibly, as a copy field directive in the Solr schema), and some configuration/design on the front end. I'd be happy to have a brief meeting about this to talk it through. The options available and trade-offs in complexity are enough to warrant some discussion.

ggeisler commented 2 years ago

To be clear, the design in #1470 will work either way (only sort by old to new, or both options). If we only want to offer old to new, that's just the only date sort option we offer to the user.

I don't think we can predict that no user will have a use case for wanting to see results sorted by new to old, even if it is the less common case, so that's why I asked the question. It's a little odd to only offer sorting a field in one direction. But if there are reasons not to offer both, that's fine with me since we'd at least be covering the primary use case.

cbeer commented 2 years ago

Decision from planning: offer both Sort by Date (old to new) and Sort by Date (new to old), unless it becomes hairy to implement.

corylown commented 2 years ago

@jacobthill here's a summary of my current understanding of how we'd like date sort to behave. I'd like confirmation from you that this matches your expectations before proceeding with implementation. If this all seems right, I'll spin out some additional tickets because there are multiple moving parts and the order in which they are completed is important.

Acceptance Criteria

Tasks (will spin off and link to new issues to capture each)

jacobthill commented 2 years ago

@corylown yes that sounds right to me.