Open adolski opened 1 year ago
@adolski I don't know if this is relevant to this issue, but I was looking through the books database and noticed that there are no books listed as existing in hathitrust. Is that accurate?
In the Rails console I ran the following and got an empty array:
Book.where(exists_in_hathitrust: true)
I verified that books in google and Internet Archive DO exist by running similar commands in the console.
I assume you were using the Rails console on your local computer? If so, that's what would be expected if you haven't run a HathiTrust check yet from your local computer.
That must be it. I had started running one a few weeks ago but cancelled it before it fully ran. I'll re-run and see if that fixes it.
I ran the HathiTrust check locally (as well as IA for good measure). Both checks ran successfully, but the HathiTrust check reports 0 records updated, and the query in my Rails console still returns an empty array. Is it any different when you run that check locally?
irb(main):002> Book.where(exists_in_hathitrust: true)
Book Load (0.9ms) SELECT "books".* FROM "books" WHERE "books"."exists_in_hathitrust" = $1 /* loading for pp */ LIMIT $2 [["exists_in_hathitrust", true], ["LIMIT", 11]]
=> []
Something isn't right there. There should be several hundred thousand found items rather than 0. Could you investigate?
Alright, I figured out the problem. The library_nuc_code in my development.yml was set to UIUC instead of UIU, so this block wasn't running because the condition parts[5] == nuc_code was always going to be false.
I updated the configuration and re-ran the check, and it works as expected now with approx. 750k found items.
File.open(pathname).each_with_index do |line, index|
  parts = line.split("\t")
  # require 'pry';binding.pry
  if parts[5] == nuc_code
    book = Book.find_by_obj_id(parts[0].split('.').last)
    # code
  end
  # code
end
irb(main):002> Book.where(exists_in_hathitrust: true).count
Book Count (37.3ms) SELECT COUNT(*) FROM "books" WHERE "books"."exists_in_hathitrust" = $1 [["exists_in_hathitrust", true]]
=> 750229
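For anyone who hits the same thing: a small sanity check along these lines might have flagged the mismatch before the full run. This is only a sketch, assuming (as in the snippet above) that the file is tab-separated and that the NUC code sits in parts[5]. The helper names and the sample lines are made up for illustration and are not part of the app.

```ruby
# Hypothetical helper: counts lines whose sixth tab-separated field equals nuc_code.
def count_nuc_matches(lines, nuc_code)
  lines.count { |line| line.chomp.split("\t")[5] == nuc_code }
end

# Hypothetical helper: tallies the distinct values found in the sixth field,
# which makes it easy to see what the configured code should have been.
def nuc_code_tally(lines)
  lines.each_with_object(Hash.new(0)) do |line, tally|
    tally[line.chomp.split("\t")[5]] += 1
  end
end

# Made-up sample lines; only field 0 (ID) and field 5 (NUC code) matter here.
SAMPLE_LINES = [
  "uiug.1001\ta\tb\tc\td\tUIU\textra\n",
  "uiug.1002\ta\tb\tc\td\tUIU\textra\n",
  "mdp.2001\ta\tb\tc\td\tMIU\textra\n"
].freeze

%w[UIU UIUC].each do |code|
  matches = count_nuc_matches(SAMPLE_LINES, code)
  warn "No lines match NUC code #{code}; check the configuration" if matches.zero?
end
```

Running the tally on the first few thousand lines of the real file would show at a glance whether the configured code appears at all.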
I'm trying to understand what this issue is hoping to accomplish. Do we want to verify that the content we have submitted to Hathi is actually in the Hathi database and accessible (i.e., that it exists in the latest file download)? If so, is there a list or source of the content we have submitted to Hathi?
I think if we have an existing source of that content, we could programmatically compare it with the content that's in the latest hathi download zipfile which would verify that it's ingested correctly.
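If such a list turns up, the comparison could look roughly like the sketch below. It assumes the same file layout as the existing check (tab-separated, ID in parts[0], NUC code in parts[5]) and a plain array of submitted object IDs. That list is hypothetical, since we don't yet know whether one exists, and the helper names and sample data are invented for illustration.

```ruby
require 'set'

# Hypothetical helper: collects the object IDs listed in the Hathi file
# for a single NUC code, using the same parsing as the existing check.
def hathi_obj_ids(lines, nuc_code)
  lines.each_with_object(Set.new) do |line, ids|
    parts = line.chomp.split("\t")
    ids << parts[0].split('.').last if parts[5] == nuc_code
  end
end

# Hypothetical helper: returns the submitted IDs that do not appear in the file.
def missing_from_hathi(submitted_ids, lines, nuc_code)
  submitted_ids.to_set - hathi_obj_ids(lines, nuc_code)
end

# Made-up data: two of the three submitted IDs appear in the file.
HATHI_SAMPLE = [
  "uiug.1001\ta\tb\tc\td\tUIU\textra\n",
  "uiug.1002\ta\tb\tc\td\tUIU\textra\n"
].freeze
SUBMITTED_SAMPLE = %w[1001 1002 9999].freeze

missing = missing_from_hathi(SUBMITTED_SAMPLE, HATHI_SAMPLE, "UIU")
puts "Missing from Hathi: #{missing.to_a.join(', ')}"
```

Using a Set keeps each lookup cheap, which matters if the real file has hundreds of thousands of matching lines.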
I'm unsure about the piece with Google and IA in this context - any additional info you might have would be helpful. Thank you!
Honestly, I don't really know what this issue is asking for, either. I just copied it out of our old issue tracker before it got shut down.
I just talked to MJ about it. She says it's not important enough to work on right now. 🥳
Sounds good! Is that the same for issues #14 and #15 or should I work on those?
I think those are low priority as well. I think we are going to shift over to the Digital Library next. I'm working on preparing some onboarding documentation and I'll keep you posted.
Submitted by MJ to JIRA on 4/30/2015: