Open jwang40 opened 8 years ago
Can you say what you mean by 98%? Do you mean 98% of all HathiTrust requests are failing? Or just that of the ones that are failing, 98% have that message? Or that of all the failed responses you get, 98% of them are HathiTrust?
Perhaps HathiTrust changed their API in some way, intentionally or unintentionally. @billdueber any thoughts?
I am no longer employed in a position where I work on Umlaut, so have little time to spend on it. (Oh, hi Jing!) Not sure if @kevinreiss has much time?
But we'll definitely review and merge pull requests if you want to submit one!
Hi Jonathan, both. 98% of all HathiTrust requests are failing, and 98% of the failures across all services come from HathiTrust, with a runtime error. What is confusing is that 2% succeed. For example: https://catalog.hathitrust.org/api/volumes/brief/json/lccn:75647497 does not return bibliographic data for "British Library Journal". However, https://catalog.hathitrust.org/api/volumes/brief/json/issn:03055167 does return bibliographic data. I wonder whether this has anything to do with the xID service. Does Umlaut use the xID service?
None of that seems right. I'll look into it.
Bill Dueber Library Systems Programmer University of Michigan Library
If you have more examples, could you send a solid handful to me? Is it a particular type of identifier?
As far as I can remember, Umlaut does not use the xID service.
Oddly, if I click on the URL you pasted in the error message, I don't get any error. @billdueber, is it possible it's rate limiting us or something?
Umlaut may indeed have a bug, although obviously one that wasn't triggered until recently. Or HathiTrust may have one that is only triggered by a few clients like Umlaut.
The HathiTrust plugin code is here: https://github.com/team-umlaut/umlaut/blob/master/app/service_adaptors/hathi_trust.rb
No xid involved.
The line in the stack trace you posted is here. @billdueber, it looks like it's making a /brief/json request, with certain search params.
Probably this one, from the error @jwang40 pasted: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973
But like I said, I don't get an error myself (or a redirect, I don't think?) on that URL, so that's odd.
There are at least two things going on here, I think. The first is that HT recently finished moving everything to https -- http urls automatically redirect. My guess is that's what the "redirection forbidden" is and accounts for most of your errors.
But that LCCN link you posted should totally find the right record -- it's right there in the 010:
https://catalog.hathitrust.org/Record/000544346.marc
I'll try to track it down.
Aha, wait.
@billdueber, does the ; need to be URI-escaped, when it didn't previously (or even couldn't be previously)?
$ curl 'http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440%3blccn:2007042973">here</a>.</p>
<hr>
<address>Apache/2.4.10 (Debian) Server at catalog.hathitrust.org Port 80</address>
</body></html>
Looks like it's redirecting to an escaped version, but my client code won't follow the redirect.
If that's what it is, it's an easy fix. Just change this line to join('%3B').
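To make the proposed change concrete, here is a minimal sketch of how a lookup URL of this shape might be assembled, and what switching the separator to %3B would produce. The helper name brief_json_url and its arguments are hypothetical and are not taken from the Umlaut adapter; only the URL shape comes from the examples in this thread.

```ruby
# Hypothetical helper, NOT the actual Umlaut adapter code.
# Builds a HathiTrust "brief" lookup URL from identifier type => value pairs.
def brief_json_url(api_url, ids, separator: ';')
  # Each identifier becomes "type:value", e.g. "oclc:177062440".
  parts = ids.map { |type, value| "#{type}:#{value}" }
  "#{api_url}/brief/json/#{parts.join(separator)}"
end

ids  = { 'oclc' => '177062440', 'lccn' => '2007042973' }
base = 'http://catalog.hathitrust.org/api/volumes'

puts brief_json_url(base, ids)                    # raw ";" between identifiers
puts brief_json_url(base, ids, separator: '%3B')  # pre-escaped separator
```

The second form matches the escaped URL the server redirects to, apart from the scheme and the case of the hex digits.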
@jwang40 , interested in submitting a pull request?
We perhaps also should change it to https instead of http, yes @billdueber?
:class_name: RuntimeError :message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552 -> https://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552' :backtrace:
Ah, then there's the fact it's finding 0 hits when it should be finding one. That's not even the bug being reported here, but that's bad too. I wonder if multiple-field searching is broken?
:class_name: RuntimeError :message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/isbn:1852789093 -> https://catalog.hathitrust.org/api/volumes/brief/json/isbn:1852789093' :backtrace:
Clearly something on this end -- getting " Symbolic link not allowed or link target not accessible" in the error logs.
let me know if you need more examples
Hell, that was only on the dev server. So now I gotta get that fixed so I can see what's going on.
Working on it...
OK. So here's what I think is happening:
OK, can anyone provide any more examples where a search fails with zero hits but you're sure it should find something (e.g., not a redirect problem with the client, but an actual API problem on my server)?
OK, I've pushed out a workaround where I just space-expand every lccn:val to include lccn:"_val", lccn:"__val", lccn:"___val", etc. out to five spaces. That should find everything until I get things reindexed.
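As a rough illustration of that space-expansion idea, the sketch below generates the padded variants for one LCCN. It assumes the underscores in the comment above stand for leading spaces; the method name lccn_variants and the max_pad parameter are made up for this example, and the real workaround on the HathiTrust side may look quite different.

```ruby
# Sketch only: the actual workaround runs on the HathiTrust server and is
# not shown in this thread. Method name and max_pad are assumptions.
def lccn_variants(value, max_pad: 5)
  # First entry is the value as given; the rest are quoted forms with
  # one to max_pad leading spaces.
  [value] + (1..max_pad).map { |n| "\"#{' ' * n}#{value}\"" }
end

lccn_variants('75647497').each { |v| puts "lccn:#{v}" }
```

Searching for all six forms at once would match a record whether or not its indexed LCCN carries leading padding.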
Bill, Thanks.
I will find more examples after we fix the redirect problem, which will exclude lots of examples that legitimately don't have records in the HT catalog.
The problem on Umlaut's side is mainly just that it's using http when it should be using https. If it starts with https, then the request is served without a redirect, even with the ; left un-escaped. (I believe by standards, you aren't actually supposed to escape a ; in a query string when it's meant as a separator.)
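A tiny sketch of the scheme switch, for anyone who wants to see it in isolation. The helper force_https is hypothetical; in Umlaut itself the fix is a configuration or default change rather than a rewrite like this.

```ruby
# Hypothetical helper: upgrade an http:// URL to https:// so the server
# has nothing to redirect. URLs with any other scheme are left unchanged.
def force_https(url)
  url.sub(/\Ahttp:/i, 'https:')
end

url = 'http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973'
puts force_https(url)
```

Note that the ; in the path is left untouched; only the scheme changes.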
@jwang40 , could you try in your local app, in the umlaut_services.yml, set a key for the hathi trust adapter:
api_url: 'https://catalog.hathitrust.org/api/volumes'
If that works -- to get rid of the errors -- we can change the default in umlaut source and release a patch version. A Pull Request would be welcome, it's a very simple one-line (one-letter!) change, so if you've never done a Pull Request before, it would be a good way to get familiar with git and GitHub PRs.
Cases where there is no error (which after this change there shouldn't ever be), but HT reports no results when it should report some, are a different problem, that can only be fixed on HT's side, and it sounds like @billdueber is on it, thanks bill!
@jrochkind we did try what you suggested. However, the change didn't make any difference. We also changed the caching configuration in ./config/environments/demo.rb, but that made no difference either:
config.consider_all_requests_local = true
config.action_controller.perform_caching = false
@jwang40 you made the change to umlaut_services.yml in production, but still got the exact same error message? Can you post an example of an error message you're getting after the change? Are you sure you restarted the app in production?
You should never set consider_all_requests_local = true or perform_caching = false in production, those can both cause problems.
@jrochkind No. all the changes were made in umlaut_demo.
I'm sorry, no what? You did make the change, but still saw errors? If so, can you post an example of an error message you're getting after you made the change?
Are you sure you restarted the app after making the change to the config file? Normally it would be best to make the change on a dev machine, commit to git, and redeploy the app. If you are making changes to config files directly on the deployed machine instead, you will need to restart the app after making changes.
@jrochkind Sorry. I meant that all the config changes, including the caching parameters, were made in umlaut_demo, not in production. We will do more testing today.
Okay, as I posted, I believe the only change you should need to make is to config/umlaut_services.yml: find the block for the hathi trust adapter, and set an api_url value (was probably not set before), as:
api_url: 'https://catalog.hathitrust.org/api/volumes'
If you are editing the file directly on your deployment machine (not recommended), then you'll need to restart the app after the change.
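One way to sanity-check the setting before restarting is a few lines of Ruby like the sketch below. The YAML nesting shown (default, services, hathi_trust, type) is an assumption made for illustration, so compare it against your own config/umlaut_services.yml; only the api_url value itself comes from this thread.

```ruby
require 'yaml'
require 'uri'

# The layout below is ASSUMED for illustration; the real file may nest the
# HathiTrust block differently or use another service key.
sample = <<~YAML
  default:
    services:
      hathi_trust:
        type: HathiTrust
        api_url: 'https://catalog.hathitrust.org/api/volumes'
YAML

config  = YAML.safe_load(sample)
api_url = config.dig('default', 'services', 'hathi_trust', 'api_url')

# A plain-http value here would bring the redirect errors back.
scheme = URI.parse(api_url).scheme
puts "api_url uses #{scheme}"
```

To check a real deployment, the sample string could be replaced with the contents of the actual file, adjusting the keys passed to dig as needed.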
Based on my current understanding, that is the only change you should need to get rid of the "redirection forbidden" errors.
If that does work, also like I said, if you wanted to send a pull request for changing the default in Umlaut, that would be welcome, and a very very simple thing to use to get familiar with git and pull requests.
Adding the api_url value to the config worked.
api_url: 'https://catalog.hathitrust.org/api/volumes'
I will submit a pull request
Sorry to come late to this discussion but I'm also confirming @jrochkind's suggested fix to update the api_url value to https solves the issue.
We noticed recently that 98% of failed_fatal service responses are from HathiTrust, with exceptions like the one listed below:
:class_name: RuntimeError :message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973 -> https://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440%3blccn:2007042973' :backtrace:
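When triaging a batch of these messages, it can help to see what actually differs between the two URLs. The sketch below is a hypothetical helper (redirect_change is not part of Umlaut) that splits a "redirection forbidden" message and reports whether the scheme changed, whether the raw path changed, and whether the path still differs once percent-escapes are decoded.

```ruby
require 'uri'

# Hypothetical triage helper, not part of Umlaut. Parses a
# "redirection forbidden: A -> B" message and compares the two URLs.
def redirect_change(message)
  match = message.match(/redirection forbidden: (\S+) -> (\S+)/)
  return nil unless match

  from = URI.parse(match[1])
  to   = URI.parse(match[2])
  {
    scheme_changed:   from.scheme != to.scheme,
    raw_path_changed: from.path != to.path,
    # Decode percent-escapes first, so ";" and "%3b" count as the same path.
    path_changed:     URI.decode_www_form_component(from.path) !=
                      URI.decode_www_form_component(to.path)
  }
end

msg = 'redirection forbidden: ' \
      'http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973 -> ' \
      'https://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440%3blccn:2007042973'
p redirect_change(msg)
```

For the message above, the scheme and the raw path both differ, but the decoded paths are identical, which is consistent with the redirect being an http-to-https upgrade plus escaping of the separator.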