psu-libraries / psulib_blacklight

Penn State University Libraries' Blacklight Catalog
Apache License 2.0

Change HathiTrust links to API Call #1291

Closed ruthtillman closed 5 months ago

ruthtillman commented 6 months ago

Background

The earliest version of the Catalog used the HathiTrust API. It's been 5 years, so we can't just roll it back, but that earlier approach might be worth looking at.

During lockdown, HathiTrust provided us with a CSV datafile matching our catalog holdings to their scans. We indexed this along with our MARC data to provide keys, which saved us the trouble of API lookups. At the time, we were using ETAS (Emergency Temporary Access Service) to access and digitally check out scans of books that were in copyright but locked in our closed library. When HathiTrust stopped providing ETAS checkouts, we switched to showing only items that are in the public domain.

HathiTrust stopped sending us matching CSVs in July 2021. Since we were mostly concerned with pre-copyright books, this data hasn't needed to change much, but more books have entered the public domain since then, more have been identified as out of copyright by teams working for HT, and some records have changed over time.

Current Behavior

We currently use data from our index to determine whether or not we can link to a HathiTrust scan of the book.

New Behavior

We'll want to send an API request similar to what we do with Google Books. We want to display links only where we get full-text copies. HathiTrust's search-inside-only functionality is far inferior to the Google Books search inside, and we don't ever want to include it.

To be sure that this update doesn't accidentally break the interacting logic with Google Books, display logic should be:

HathiTrust API

The API uses only one identifier parameter. Choose the first one which exists in the following order:

Syntax is: https://catalog.hathitrust.org/api/volumes/brief/oclc/424023.json with appropriate param in place of oclc if needed
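
For illustration, here is a minimal Python sketch of picking one identifier and building the request URL. It is not the app's actual code. The function names are hypothetical, and the priority order is passed in by the caller because this sketch does not assume any particular order.

```python
from urllib.parse import quote

# Base path taken from the sample URLs in this issue.
HATHITRUST_BRIEF_API = "https://catalog.hathitrust.org/api/volumes/brief"


def first_available_identifier(identifiers, priority):
    """Return (id_type, value) for the first type in `priority` that has a value.

    `identifiers` maps identifier type to value, e.g. {"oclc": "424023"}.
    `priority` is the caller-supplied order to try (hypothetical here).
    Returns None when no identifier in `priority` is present.
    """
    for id_type in priority:
        value = identifiers.get(id_type)
        if value:
            return id_type, str(value)
    return None


def hathitrust_brief_url(id_type, value):
    """Build the brief-record URL for a single identifier."""
    return f"{HATHITRUST_BRIEF_API}/{quote(id_type, safe='')}/{quote(value, safe='')}.json"


# Usage: the order below is a placeholder, not the order defined in this issue.
chosen = first_available_identifier({"oclc": "424023"}, ["oclc", "isbn"])
if chosen is not None:
    url = hathitrust_brief_url(*chosen)
```

Only one identifier goes into the URL, so the selection step is kept separate from URL construction.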

We are looking for at least ONE item which has: "usRightsString":"Full view"

Sample with several items in Limited View only: https://catalog.hathitrust.org/api/volumes/brief/oclc/424023.json

Sample with only one item, Full View: https://catalog.hathitrust.org/api/volumes/brief/oclc/433.json
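
A rough Python sketch of the "at least ONE Full view item" check, for illustration only. It assumes the parsed JSON has a top-level "items" list and that each item carries "usRightsString" and "itemURL"; verify both against the samples above. The payload below is hand-written for the example and is not real API output.

```python
def full_view_items(response):
    """Return the items whose usRightsString is exactly "Full view"."""
    items = response.get("items") or []
    return [item for item in items if item.get("usRightsString") == "Full view"]


def has_full_view(response):
    """True when at least one item is available in Full view."""
    return len(full_view_items(response)) > 0


def first_full_view_url(response):
    """Return the URL of the first Full view item, or None if there is none."""
    matches = full_view_items(response)
    return matches[0].get("itemURL") if matches else None


# Hand-written illustrative payload (assumed shape, made-up values).
sample = {
    "records": {},
    "items": [
        {"usRightsString": "Limited (search-only)", "itemURL": "https://example.org/limited"},
        {"usRightsString": "Full view", "itemURL": "https://example.org/full"},
    ],
}

show_link = has_full_view(sample)
link_url = first_full_view_url(sample)
```

Anything other than an exact "Full view" match is treated as not displayable, which keeps search-inside-only items out.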

These two pages have more documentation:

Not Changing

We are not changing our indexing/indexed data at this point.

ajkiessl commented 6 months ago

@ruthtillman The hathi links show on the search page. Do we still want this to happen? Also, if there's a hathi link, we don't show the availability on the search page. If we do want to show the hathi links, do we want the availability to show if there's a hathi link?

ruthtillman commented 6 months ago

So it looks like we're not currently showing them in search results, though we may have earlier on?

e.g. https://catalog.libraries.psu.edu/?all_fields=&author=&f%5Baccess_facet%5D%5B%5D=In+the+Library&identifiers=&op=AND&publisher=&range%5Bpub_date_itsi%5D%5Bbegin%5D=1890&range%5Bpub_date_itsi%5D%5Bend%5D=1920&search_field=advanced&series=&sort=score+desc%2C+pub_date_itsi+desc%2C+title_sort+asc&subject=&title=

results 2 & 3 at least have HT links.

Let's maintain it as is, not showing in search results view but only on the page itself.

ruthtillman commented 5 months ago

One thing I was thinking of while revisiting documentation -- we could choose not to make these calls on a subset of our formats: Audio, Equipment, Games/Toys, Image, Kit, and Video almost certainly shouldn't have it.

But I don't know if the calls are expensive enough for those exceptions to be worth adding. In those cases neither HathiTrust nor Google Books should return data, so we would be silently querying and getting nothing back, and maybe that doesn't matter.
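
If the exceptions do turn out to be worth it, the check itself is small. A Python sketch for illustration, with a hypothetical function name. It assumes a record can carry several formats and skips the lookup only when every format is in the excluded set; that is one possible reading, not a decision made in this issue.

```python
# Formats listed above as almost certainly not needing a lookup.
EXCLUDED_FORMATS = frozenset(
    {"Audio", "Equipment", "Games/Toys", "Image", "Kit", "Video"}
)


def should_query_full_text_apis(formats):
    """Return False only when the record has formats and all are excluded.

    Records with no format information are still queried, to be safe.
    """
    formats = list(formats)
    if not formats:
        return True
    return not all(fmt in EXCLUDED_FORMATS for fmt in formats)
```

With this reading, a record that is both "Book" and "Image" would still be queried, while one that is only "Video" would not.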

ruthtillman commented 5 months ago

And I had a second thought while documenting the Google Books API call -- I think it's the same query we're using to generate thumbnails. Is it possible to avoid making a second call and check the thumbnail query for data but only display it if the HT link fails? Or is that too messy?
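
One way to picture that, as a Python sketch only: keep the data already returned by the thumbnail query and consult it when there is no HathiTrust link. The function name and the "preview_url" key are hypothetical; the real field depends on which Google Books response the thumbnail code already receives.

```python
def choose_full_text_link(hathitrust_url, google_books_data):
    """Prefer the HathiTrust link; fall back to already-fetched Google Books data.

    `hathitrust_url` is the Full view URL, or None if the HT lookup found nothing.
    `google_books_data` is the dict already fetched for the thumbnail, or None.
    Returns a dict with "source" and "url", or None when neither is available.
    """
    if hathitrust_url:
        return {"source": "HathiTrust", "url": hathitrust_url}
    preview_url = (google_books_data or {}).get("preview_url")  # hypothetical key
    if preview_url:
        return {"source": "Google Books", "url": preview_url}
    return None
```

No second Google Books request is made here; the function only reads what the thumbnail query already returned.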

ruthtillman commented 5 months ago

Reviewed on its own branch but just checked again in QA. Looks good.