ukwa / shine

Prototype SOLR-powered web archive exploration UI.
https://github.com/ukwa/shine/wiki
Apache License 2.0

Faceted Trend Searching #72

Closed ianmilligan1 closed 9 years ago

ianmilligan1 commented 9 years ago

We have been tinkering with our own Shine instance (available here), and a commonly requested feature has been faceted trend searching. That demand is probably an artefact of having a smaller collection rather than a national websphere.

For example, could trend searching allow a user to search for austerity within two domains, e.g. https://www.conservatives.com/ and http://www.labour.org.uk/, and compare the relative frequency of the term in each? We could then have a comparison of austerity (conservatives.com) and austerity (labour.org.uk).

To see a possible example, here is the HathiTrust Bookworm trend:

[Image: Example screenshot]

I think this might expand the number of potential use cases.

tokee commented 9 years ago

Under the hood, this would be a standard trend search for austerity domain:conservatives.com and austerity domain:labour.org.uk, right?

anjackson commented 9 years ago

Yes, this is already possible if you know the Secret Squirrel Syntax:

http://www.webarchive.org.uk/shine/graph?query=austerity+domain%3Aconservatives.com%2C+austerity+domain%3Alabour.org.uk&year_start=1996&year_end=2010&action=update
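For anyone who wants to generate such links rather than type them by hand, here is a minimal sketch in Python that rebuilds the URL above. The parameter names (query, year_start, year_end, action) and the comma-separated sub-queries are taken from that link; the helper name build_trend_url is hypothetical, and nothing here is checked against Shine's source, so treat it as an illustration of the syntax rather than a documented API.

```python
from urllib.parse import urlencode

# Base URL copied from the link above; swap in your own Shine instance.
GRAPH_URL = "http://www.webarchive.org.uk/shine/graph"


def build_trend_url(term, domains, year_start, year_end):
    """Build a trend-graph URL comparing one term across several domains.

    Hypothetical helper: it only mirrors the query-string layout seen in
    the example link, with one "term domain:host" sub-query per domain.
    """
    # Sub-queries are separated by ", " exactly as in the example link.
    query = ", ".join(f"{term} domain:{domain}" for domain in domains)
    params = {
        "query": query,
        "year_start": year_start,
        "year_end": year_end,
        "action": "update",
    }
    # urlencode percent-encodes ":" and "," and turns spaces into "+".
    return f"{GRAPH_URL}?{urlencode(params)}"


url = build_trend_url(
    "austerity", ["conservatives.com", "labour.org.uk"], 1996, 2010
)
print(url)
```

Passing a different list of domains should give one trend line per domain, assuming the instance accepts the same syntax.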

So it's a question of building a decent UI for it. My original idea was to use the facet-search interface to build queries, save them, and then combine them in the trends interface. However, the HathiTrust mechanism looks much better.

Sadly, we're not going to be able to invest much development time in Shine right now.

EDIT: Took me far too long to realise that the reason there is not enough conservatives.com content in our index is that it's not a .uk! That's a shame. They are in our by-permission archive, but not currently in the historical one.

ianmilligan1 commented 9 years ago

Thanks, @anjackson! The easiest kind of issue: one where the functionality already exists...!

Just playing around with this and can use any of the other facets too, eh? I will document this on our Shine installation.

And hopefully you can get more conservatives.com content into your database - or convince them to switch over to .uk. ;)