Blacklight facet view sorting options

mcritchlow commented 8 years ago

This issue was raised in the Review meeting from Sprint 21. Questions:

Can lexical sorting be used in a way similar to the description in this article?
Can we ignore case? For example, values in ALLCAPS.
Can we ignore punctuation?

hweng commented 7 years ago

@mcritchlow I've updated the Solr schema (https://github.com/ucsdlib/dams5-cc-pilot/blob/master/solr/config/schema.xml), but after restarting Solr, the schema shown in development is still the old file: http://localhost:8983/solr/#/hydra-development/files?file=schema.xml

Do you know whether the above is the correct file path for changing schema.xml?

hweng commented 7 years ago

@mcritchlow Never mind, I figured it out.

hweng commented 7 years ago

@mcritchlow I’ve tried adding the following filters to the analyzer in the Solr schema.xml:

For lexical sorting, <filter class="solr.PatternReplaceFilterFactory" pattern="(\d+)" replacement="00000$1" replace="all"/>

To ignore case, <filter class="solr.LowerCaseFilterFactory"/>

To ignore punctuation, <filter class="solr.PatternReplaceFilterFactory" pattern="([^A-Za-z0-9])" replacement="" replace="all"/> or <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" generateNumberParts="0" splitOnCaseChange="0"/>
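To see what these filters would do to a single facet value, here is a rough Python simulation. It assumes the filters run in the order lowercase, strip punctuation, pad numbers (the real order depends on how the analyzer in schema.xml is written), and it uses Python's `re` in place of the Java regexes Solr uses. For these simple patterns the behavior should match, but treat it as a sketch, not Solr's actual code.

```python
import re

# Rough Python stand-ins for the filters quoted above, applied in an assumed order.


def lowercase(term: str) -> str:
    # LowerCaseFilterFactory
    return term.lower()


def strip_punctuation(term: str) -> str:
    # PatternReplaceFilterFactory pattern="([^A-Za-z0-9])" replacement=""
    # Note: this removes spaces and accented letters too, not just punctuation.
    return re.sub(r"[^A-Za-z0-9]", "", term)


def prefix_zeros(term: str) -> str:
    # PatternReplaceFilterFactory pattern="(\d+)" replacement="00000$1"
    return re.sub(r"(\d+)", r"00000\1", term)


def analyze(value: str) -> str:
    """Run one facet value through the (assumed) filter chain."""
    for step in (lowercase, strip_punctuation, prefix_zeros):
        value = step(value)
    return value


for original in ["OCEAN", "ocean", "Mom's", "Ocean currents", "2", "10"]:
    print(f"{original!r:18} -> {analyze(original)!r}")
```

Because Solr faceting returns indexed terms, the analyzed string (e.g. "moms" for "Mom's") is what a facet list built on such a field would show. The `[^A-Za-z0-9]` class also turns "Ocean currents" into "oceancurrents".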

Each time, I restarted Solr, checked the Solr admin to make sure schema.xml had been updated, and then reindexed the data. But it doesn't seem to be working: the Blacklight facet sort still shows the default lexical order, case-sensitive and with punctuation.

Any suggestion?

mcritchlow commented 7 years ago

Have you been able to test directly in the Solr instance to confirm that it is Blacklight that is somehow overriding the configuration changes you've made to schema.xml?

If we know for sure that it's Blacklight, maybe asking in the Hydra dev Slack channel would get a quick answer? I'm not sure how active the Blacklight IRC channel is at this point. @lsitu or @VivianChu, any ideas?

lsitu commented 7 years ago

@mcritchlow @hweng I think it's a good idea to test it directly in Solr to see whether it works first.
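One way to test directly in Solr, bypassing Blacklight, is to request only the facet counts from the select handler and compare them with what Blacklight shows. The sketch below builds such a URL using standard Solr facet parameters (`facet.field`, `facet.sort`, `facet.limit`, `facet.mincount`). The core name is taken from the admin URL earlier in this thread and the field name `subject_topic_sim` from a DAMS4 URL later on; both are assumptions to adjust.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values: core from the admin URL above, field name borrowed from DAMS4.
SOLR_BASE = "http://localhost:8983"
CORE = "hydra-development"
FIELD = "subject_topic_sim"


def build_facet_url(base: str, core: str, field: str, sort: str = "index") -> str:
    """Build a Solr select URL that returns only facet counts for one field."""
    params = {
        "q": "*:*",          # match every document
        "rows": 0,           # no documents, just facets
        "facet": "true",
        "facet.field": field,
        "facet.sort": sort,  # "index" (term order) or "count"
        "facet.limit": -1,   # return all facet values
        "facet.mincount": 1,
        "wt": "json",
    }
    return f"{base}/solr/{core}/select?{urlencode(params)}"


def pair_up(flat: list) -> list:
    """Solr's default JSON facet output is a flat [term, count, term, count, ...] list."""
    return list(zip(flat[0::2], flat[1::2]))


def fetch_facets(url: str, field: str) -> list:
    """Fetch the URL and return (term, count) pairs in Solr's order (needs a running Solr)."""
    with urlopen(url) as resp:
        data = json.load(resp)
    return pair_up(data["facet_counts"]["facet_fields"][field])


if __name__ == "__main__":
    print(build_facet_url(SOLR_BASE, CORE, FIELD))
```

With Solr running, `fetch_facets(build_facet_url(SOLR_BASE, CORE, FIELD), FIELD)` returns the terms in Solr's own index order; if that already shows the unfiltered values, the problem is in the schema or reindexing rather than in Blacklight.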

hweng commented 7 years ago

@mcritchlow It doesn't seem that Blacklight overrides the facet sorting, but I will check again.

hweng commented 7 years ago

@mcritchlow I've double-checked the Blacklight facets module: it doesn't override facet sorting and uses Solr's default index order, which matches the result of the same query executed from the Solr admin. But I have some other ideas and am trying a new approach. Thanks!

hweng commented 7 years ago

@mcritchlow I got the Solr schema updates working: case-insensitive sorting, punctuation removal, and numbers forced to sort numerically. Here is the result:

[Screenshot: 2016-11-15 at 11:59:01 AM]

A question about overriding Solr's default lexical sorting: how many zeros should we left-pad numbers with?
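The answer depends less on the number of zeros than on padding to a fixed width. The single `00000$1` filter quoted earlier prepends the same five zeros to every number, so "10" still sorts before "2", exactly as without padding. A common recipe is to prepend zeros and then trim each digit run back to a fixed width, where the width must be at least the number of digits in the longest number you expect. The `fixed_width` helper below is a hypothetical sketch of that recipe in Python; in schema terms it would presumably be a second filter such as `<filter class="solr.PatternReplaceFilterFactory" pattern="0*([0-9]{6,})" replacement="$1" replace="all"/>` after the first one, which is an untested suggestion.

```python
import re


def constant_prefix(term: str) -> str:
    """The single filter quoted earlier: prepend five zeros to every digit run."""
    return re.sub(r"(\d+)", r"00000\1", term)


def fixed_width(term: str, width: int = 6) -> str:
    """Pad every digit run to at least `width` digits (hypothetical two-step recipe).

    Step 1 prepends `width` zeros to each digit run; step 2 strips the surplus
    leading zeros, so each run ends up exactly `width` digits long (or longer,
    if the number itself already had more than `width` digits).
    """
    padded = re.sub(r"(\d+)", "0" * width + r"\1", term)
    return re.sub(r"0*(\d{%d,})" % width, r"\1", padded)


values = ["10", "2", "1998", "100"]
print(sorted(values, key=constant_prefix))  # same order as plain lexical sorting
print(sorted(values, key=fixed_width))      # numeric order
```

Numbers longer than the chosen width are left intact by the trim step but then compare lexically again, which is why the width has to cover the longest expected number.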

hweng commented 7 years ago

@arwenhutt Any suggestions on the question above?

mcritchlow commented 7 years ago

@hweng A quick question about the case-sensitivity change, and potentially the numerical padding/sorting: are the values displayed to the end user changing, or just the schema/sorting configuration?

For example, if an original facet value was "OCEAN", is it now going to be displayed to an end user as "ocean"? Or will it still be shown as "OCEAN" but sorted as if it were "ocean"? I believe the latter is what is going to be desirable, as I recall @arwenhutt noting that the capitalized values are that way for a reason.

hweng commented 7 years ago

@mcritchlow Yes, the facet value that Solr sorts on is what gets displayed, so it would show "ocean". As for still showing "OCEAN" but sorting it as if it were "ocean": no, Solr just won't do that.

However, since I've applied the filters only to the facet values and not to the records, when a user clicks a facet link through to a record, the record still preserves the original case in those fields.
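A small simulation of this distinction, assuming (as described here) that the facet field is analyzed while the record keeps its original stored value. The records and the `facet_term` helper are hypothetical; `facet_term` stands in for a lowercase-and-strip-punctuation analyzer.

```python
import re
from collections import Counter


def facet_term(value: str) -> str:
    """Hypothetical stand-in for the facet field's analyzer."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


# Hypothetical records; "topic" plays the role of the stored (displayed) field.
records = [
    {"id": "r1", "topic": "OCEAN"},
    {"id": "r2", "topic": "Ocean"},
    {"id": "r3", "topic": "Mom's"},
]

# The facet list is built from analyzed terms, so case variants merge
# and the lowercased, punctuation-free term is what the user sees.
facet_counts = Counter(facet_term(r["topic"]) for r in records)
for term in sorted(facet_counts):
    print(term, facet_counts[term])

# The record itself still displays its original stored value.
print(records[0]["topic"])
```

Here the facet list shows "moms" and "ocean" (with a count of 2), while record r1 still shows "OCEAN".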

mcritchlow commented 7 years ago

@hweng Thanks for clarifying 👍 I think that distinction is very important information for everyone to know when considering whether this solution will work for us.

@arwenhutt @gamontoya - will that be acceptable?

hweng commented 7 years ago

@mcritchlow @arwenhutt @gamontoya Based on my research into Solr's sorting options, the workaround I applied keeps the records' original fields and applies the filters only to the facets that Solr sorts on. Here you can see that the records still preserve the original values "2", "10", and "Mom's", while the facets have had punctuation removed and padding added for sorting purposes:

gamontoya commented 7 years ago

@mcritchlow I'm not sure I understood this correctly. Are you saying that a topic in all caps, like OCEAN, would sort as "ocean" and would also display as ocean and not OCEAN?

@hweng In your topic sort example above, are you intentionally having numerical values appear first?

mcritchlow commented 7 years ago

@gamontoya - I think @hweng's example above does a good job illustrating the solution she has come up with. Basically, the facet sort isn't ignoring case. It's creating lowercased facet values and leaving the record (show page) values the same: "moms" vs. "Mom's".

I wanted to clarify this for everyone, since I'm not sure this is a desirable outcome.

hweng commented 7 years ago

@gamontoya Yes, in Solr's sorting, numerical values appear before any letters a-z.

gamontoya commented 7 years ago

@hweng Now that part, I'm not sure I like. I prefer the numerical values after A-Z. @mcritchlow @arwenhutt Thoughts?

hweng commented 7 years ago

@arwenhutt @gamontoya In the DAMS4 data, the numerical values are mostly years. Please see the following screenshot. You may not see it from the browse page, but you can view it by typing the URL directly: http://library.ucsd.edu/dc/search/facet/subject_topic_sim?facet.sort=index

[Screenshot: 2016-11-17 at 9:51:31 AM]

hweng commented 7 years ago

@arwenhutt @gamontoya If the facet sorting updates look good to you, I will create the pull request for it. If you have any other thoughts, please comment here. Thank you!

gamontoya commented 7 years ago

@hweng Can you sort alphabetic followed by numeric?

hweng commented 7 years ago

@gamontoya I thought Matt had already explained that Solr's sorting doesn't have an option for that, so you cannot sort alphabetic followed by numeric. It has two options: sort by index and sort by count. Sorting by index sorts alphabetically, with numerals first. If you want very customized facet sorting that doesn't use Solr's sorting and the Blacklight modules, that would be another project.
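For context on why numbers come first: Solr's index sort orders terms by their indexed form, which for text is essentially Unicode code-point order, where the digits 0-9 precede all letters (and uppercase precedes lowercase). Putting letters first, or sorting "OCEAN" as "ocean" while still displaying "OCEAN", would have to happen in application code after fetching every facet value (for example with `facet.limit=-1`), which is the separate project mentioned here. The `sort_key` below is a hypothetical sketch of such a key, not something Solr or Blacklight provides.

```python
import re


def sort_key(display: str):
    """Hypothetical sort key: letters before numbers, case and punctuation ignored.

    The display value itself is never modified; only the key is normalized.
    """
    # Fold case and drop punctuation (keeps letters, digits, underscore, spaces).
    folded = re.sub(r"[^\w\s]", "", display.casefold())
    starts_with_digit = bool(re.match(r"\d", folded))
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are digit runs, compared as integers.
    chunks = re.split(r"(\d+)", folded)
    parts = tuple(int(c) if i % 2 else c for i, c in enumerate(chunks))
    return (starts_with_digit, parts)


values = ["OCEAN", "1998", "Mom's", "2", "10", "apple"]
print(sorted(values, key=sort_key))
```

Values starting with a letter sort first, numbers follow in numeric order, and each value keeps its original capitalization and punctuation for display.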

gamontoya commented 7 years ago

@hweng No new project here. Go ahead and make the pull request and we'll see how things look/behave.

hweng commented 7 years ago

@gamontoya Thanks! A pull request has been submitted to https://github.com/ucsdlib/horton/pull/6

hweng commented 7 years ago

The group decided not to apply the above Solr filters to facet ordering for now. We will revisit this in the future.

dolsysmith commented 3 years ago

Discussing this with Schol Comm today: is it possible to make the filter case-insensitive, and to apply case folding to the facets at the time the keyword list is generated? I'm not sure if this is what you were referring to above as rules for automatically merging keywords with different cases. It seems like the cases in which this would produce undesirable results (e.g., the acronym OCEAN getting collapsed with the keyword ocean) would occur less frequently than the cases where records are not being correctly aggregated due to unintentional differences in capitalization.
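A minimal sketch of case folding at keyword-list generation time, assuming the list is built in application code before indexing. The `merge_case_variants` helper is hypothetical: it groups keywords by their case-folded form, keeps the most frequent spelling for display (ties go to the spelling seen first), and sums the counts.

```python
from collections import Counter, defaultdict


def merge_case_variants(keywords: list) -> dict:
    """Merge keywords that differ only in case (hypothetical helper).

    Returns {display_spelling: total_count}, where the display spelling is the
    most frequent variant in each group.
    """
    groups = defaultdict(Counter)
    for kw in keywords:
        groups[kw.casefold()][kw] += 1
    merged = {}
    for variants in groups.values():
        # most_common orders equal counts by first occurrence.
        display, _ = variants.most_common(1)[0]
        merged[display] = sum(variants.values())
    return merged


print(merge_case_variants(["ocean", "Ocean", "ocean", "OCEAN", "Climate"]))
```

This also shows the trade-off raised above: the acronym "OCEAN" is folded into the "ocean" group along with the unintentional capitalization variants.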

ucsdlib / dams5-cc-pilot, issue #32