team-umlaut / umlaut

Umlaut, a specific item service provider for libraries

hathitrust error #56

Open jwang40 opened 8 years ago

jwang40 commented 8 years ago

We noticed recently that 98% of failed_fatal service responses are from HathiTrust, with exceptions like the one listed below:

:class_name: RuntimeError :message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973 -> https://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440%3blccn:2007042973' :backtrace:

jrochkind commented 8 years ago

Can you say what you mean by 98%? Do you mean 98% of all HathiTrust requests are failing? Or just that of the ones that are failing, 98% have that message? Or that of all the failed responses you get, 98% of them are HathiTrust?

Perhaps HathiTrust changed their API in some way, intentionally or unintentionally. @billdueber any thoughts?

I am no longer employed in a position where I work on Umlaut, so have little time to spend on it. (Oh, hi Jing!) Not sure if @kevinreiss has much time?

But we'll definitely review and merge pull requests if you want to submit one!

jwang40 commented 8 years ago

Hi Jonathan, both. 98% of all HathiTrust requests are failing, and 98% of the failures across all services come from HathiTrust with this runtime error. What is confusing is that 2% are successful. For example: https://catalog.hathitrust.org/api/volumes/brief/json/lccn:75647497 does not return bibliographic data for "British Library Journal". However, https://catalog.hathitrust.org/api/volumes/brief/json/issn:03055167 does return bibliographic data. I wonder whether this has anything to do with the xID service. Does Umlaut use the xID service?

billdueber commented 8 years ago

None of that seems right. I'll look into it.

Bill Dueber Library Systems Programmer University of Michigan Library

billdueber commented 8 years ago

If you have more examples, could you send a solid handful to me? Is it a particular type of identifier?

jrochkind commented 8 years ago

As far as I can remember, Umlaut does not use the xID service.

Oddly, if I click on the URL you pasted in the error message, I don't get any error. @billdueber , is it possible it's rate limiting us or something?

Umlaut may indeed have a bug, although obviously one that wasn't triggered until recently. Or HathiTrust may have one that is only triggered by a few clients like Umlaut.

The HathiTrust plugin code is here: https://github.com/team-umlaut/umlaut/blob/master/app/service_adaptors/hathi_trust.rb

No xid involved.

The line in the stack trace you posted is here. @billdueber , it looks like it's making a /brief/json request, with certain search params.

Probably this one, from the error @jwang40 pasted: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973

But like I said, I don't get an error myself (or a redirect, I don't think?) on that URL, so that's odd.

billdueber commented 8 years ago

There are at least two things going on here, I think. The first is that HT recently finished moving everything to https -- http urls automatically redirect. My guess is that's what the "redirection forbidden" is and accounts for most of your errors.
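The "redirection forbidden" message is consistent with an HTTP client that declines to follow a redirect when the target uses a different scheme than the original request. A minimal sketch of that kind of check, assuming a hypothetical helper named `scheme_preserving_redirect?`; it only illustrates the idea and is not the code of any particular client library:

```ruby
require "uri"

# Hypothetical helper: returns true only when a redirect keeps the same
# URL scheme. A strict client applying a rule like this would refuse the
# http -> https redirect that HathiTrust now issues.
def scheme_preserving_redirect?(from, to)
  URI.parse(from).scheme.downcase == URI.parse(to).scheme.downcase
end

from = "http://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552"
to   = "https://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552"

puts scheme_preserving_redirect?(from, to) # http -> https
puts scheme_preserving_redirect?(to, to)   # https -> https
```

Under this rule, a request that already starts with https never hits the scheme change, which would explain why the same URL works when opened by hand in a browser that follows redirects freely.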

But that LCCN link you posted should totally find the right record -- it's right there in the 010:

https://catalog.hathitrust.org/Record/000544346.marc

I'll try to track it down.

jrochkind commented 8 years ago

Aha, wait.

@billdueber , does the ; need to be URI-escaped, when it didn't previously (or even couldn't be previously)?

$ curl 'http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440%3blccn:2007042973">here</a>.</p>
<hr>
<address>Apache/2.4.10 (Debian) Server at catalog.hathitrust.org Port 80</address>
</body></html>

Looks like it's redirecting to an escaped version, but my client code won't follow the redirect.

If that's what it is, it's an easy fix. Just change this line to join('%3B').
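To make the proposed change concrete, here is a small sketch of joining several identifiers into one lookup URL with either a raw or a pre-escaped separator. The helper name `brief_json_url` and its parameters are hypothetical and are not taken from the Umlaut source:

```ruby
# Hypothetical helper: builds a /brief/json lookup URL from several
# identifiers, joining them with the given separator.
def brief_json_url(base, identifiers, separator: ";")
  "#{base}/brief/json/#{identifiers.join(separator)}"
end

base = "http://catalog.hathitrust.org/api/volumes"
ids  = ["oclc:177062440", "lccn:2007042973"]

puts brief_json_url(base, ids)                   # raw ";" between identifiers
puts brief_json_url(base, ids, separator: "%3B") # pre-escaped separator
```

The redirect target above uses lowercase `%3b`; the hex digits in a percent-encoding are case-insensitive, so `%3B` and `%3b` denote the same character.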

@jwang40 , interested in submitting a pull request?

We perhaps also should change it to https instead of http, yes @billdueber?

jwang40 commented 8 years ago

Here are more examples, some with oclc#:

:class_name: RuntimeError :message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552 -> https://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552' :backtrace:

jrochkind commented 8 years ago

Ah, then there's the fact it's finding 0 hits when it should be finding one. That's not even the bug being reported here, but that's bad too. I wonder if multiple-field searching is broken?

jwang40 commented 8 years ago

another example with isbn

:class_name: RuntimeError :message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/isbn:1852789093 -> https://catalog.hathitrust.org/api/volumes/brief/json/isbn:1852789093' :backtrace:

billdueber commented 8 years ago

Clearly something on this end -- getting " Symbolic link not allowed or link target not accessible" in the error logs.

jwang40 commented 8 years ago

let me know if you need more examples

billdueber commented 8 years ago

Hell, that was only on the dev server. So now I gotta get that fixed so I can see what's going on.

Working on it...

billdueber commented 8 years ago

OK. So here's what I think is happening:

billdueber commented 8 years ago

OK, can anyone provide any more examples where a search fails with zero hits but you're sure it should find something (e.g., not a redirect problem with the client, but an actual API problem on my server)?

billdueber commented 8 years ago

OK, I've pushed out a workaround where I just space-expand every lccn:val to include lccn:"_val", lccn:" __val", lccn:"___val", etc. out to five spaces. That should find everything until I get things reindexed.
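A rough sketch of what such an expansion could look like, assuming the underscores in the comment above stand for leading spaces. The helper name `padded_variants` is hypothetical, and this is not the actual server-side code:

```ruby
# Hypothetical helper: returns the value as given, plus copies prefixed
# with one up to max_pad leading spaces.
def padded_variants(value, max_pad: 5)
  (0..max_pad).map { |n| (" " * n) + value }
end

# Print each variant as a quoted lccn search term.
padded_variants("75647497").each { |v| puts %(lccn:"#{v}") }
```

Searching for every variant trades a slightly larger query for not having to wait on a reindex.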

jwang40 commented 8 years ago

Bill, Thanks.

jwang40 commented 8 years ago

I will find more examples after we fix the redirect problem, which will exclude lots of examples that legitimately don't have records in the HT catalog.

jrochkind commented 8 years ago

The problem on Umlaut's side is mainly just that it's using http when it should be using https. If it starts with https, then the request is served without a redirect, even with un-escaped ;. (I believe by standards, you aren't actually supposed to escape a ; in a query string, meant as a separator).

@jwang40 , could you try in your local app, in the umlaut_services.yml, set a key for the hathi trust adapter:

api_url: 'https://catalog.hathitrust.org/api/volumes'

If that works -- to get rid of the errors -- we can change the default in umlaut source and release a patch version. A Pull Request would be welcome, it's a very simple one-line (one-letter!) change, so if you've never done a Pull Request before, it would be a good way to get familiar with git and github PR's.
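The relationship between the local override and the shipped default can be sketched as below. The constant `DEFAULT_API_URL` and the helper `effective_api_url` are hypothetical names for illustration, and the way Umlaut actually reads its service config may differ:

```ruby
# Hypothetical default; per this thread, the shipped default used http.
DEFAULT_API_URL = "http://catalog.hathitrust.org/api/volumes"

# Hypothetical helper: prefer an api_url from the service's config hash,
# falling back to the default when the key is absent.
def effective_api_url(config)
  config.fetch("api_url", DEFAULT_API_URL)
end

overridden = { "api_url" => "https://catalog.hathitrust.org/api/volumes" }

puts effective_api_url({})         # no override: the default applies
puts effective_api_url(overridden) # the locally configured value wins
```

Changing the default itself to https would make the local override unnecessary, which is the one-letter change described above.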

Cases where there is no error (which after this change there shouldn't ever be), but HT reports no results when it should report some, are a different problem, that can only be fixed on HT's side, and it sounds like @billdueber is on it, thanks bill!

jwang40 commented 8 years ago

@jrochkind we did try what you suggested. However, the change didn't make any difference. We also changed the caching configuration in ./config/environments/demo.rb, but still no use: config.consider_all_requests_local = true and config.action_controller.perform_caching = false

jrochkind commented 8 years ago

@jwang40 you made the change to umlaut_services.yml in production, but still got the exact same error message? Can you post an example of an error message you're getting after the change? Are you sure you restarted the app in production?

You should never set consider_all_requests_local = true or perform_caching = false in production, those can both cause problems.

jwang40 commented 8 years ago

@jrochkind No. all the changes were made in umlaut_demo.

jrochkind commented 8 years ago

I'm sorry, no what? You did make the change, but still saw errors? If so, can you post an example of an error message you're getting after you made the change?

Are you sure you restarted the app after making the change to the config file? Normally it would be best to make the change on a dev machine, commit to git, and redeploy the app. If you are making changes to config files directly on the deployed machine instead, you will need to restart the app after making changes.

jwang40 commented 8 years ago

@jrochkind Sorry. I meant that all the config changes, including the caching parameters, were made in umlaut_demo, not in production. We will do more testing today.

jrochkind commented 8 years ago

Okay, as I posted, I believe the only change you should need to make is to config/umlaut_services.yml: find the block for the hathi trust adapter, and set an api_url value (was probably not set before), as:

 api_url: 'https://catalog.hathitrust.org/api/volumes'

If you are editing the file directly on your deployment machine (not recommended), then you'll need to restart the app after the change.

Based on my current understanding, that is the only change you should need to get rid of the "redirection forbidden" errors.

jrochkind commented 8 years ago

If that does work, also like I said, if you wanted to send a pull request for changing the default in Umlaut, that would be welcome, and a very very simple thing to use to get familiar with git and pull requests.

farooqsadiq commented 8 years ago

Adding the api_url value to the config worked: api_url: 'https://catalog.hathitrust.org/api/volumes'. I will submit a pull request.

kevinreiss commented 8 years ago

Sorry to come late to this discussion, but I'm also confirming that @jrochkind's suggested fix, updating the api_url value to https, solves the issue.