medusa-project / book-tracker

Medusa Book Tracker

Add a new category for locally or vendor digitized books #13

Open adolski opened 1 year ago

adolski commented 1 year ago

Submitted by MJ to JIRA on 4/30/2015:

We submit locally digitized content to Hathi, and I would like to see whether the submitted content has been ingested correctly and is available in Hathi, as we can for Google and IA digitized books.

gaurijo commented 9 months ago

@adolski I don't know if this is relevant to this issue, but I was looking through the books database and noticed that no books are listed as existing in HathiTrust. Is that accurate?

In the Rails console I ran the following and get an empty array.

Book.where(exists_in_hathitrust: true) 

I verified that books in Google and Internet Archive DO exist by running similar commands in the console.

adolski commented 9 months ago

I assume you were using the Rails console on your local computer? If so, that's what would be expected if you haven't run a HathiTrust check yet from your local computer.

gaurijo commented 9 months ago

That must be it. I had started running one a few weeks ago but cancelled it before it fully ran. I'll re-run and see if that fixes it.

gaurijo commented 9 months ago

I ran the HathiTrust check locally (as well as IA for good measure). Both checks ran successfully, but the Hathi check shows that 0 records were updated, and the query in my Rails console still returns an empty array. Is it any different when you run that check locally?

irb(main):002> Book.where(exists_in_hathitrust: true)
  Book Load (0.9ms)  SELECT "books".* FROM "books" WHERE "books"."exists_in_hathitrust" = $1 /* loading for pp */ LIMIT $2  [["exists_in_hathitrust", true], ["LIMIT", 11]]
=> []

adolski commented 9 months ago

Something isn't right there. There should be several hundred thousand found items rather than 0. Could you investigate?

gaurijo commented 9 months ago

Alright, I figured out the problem. The library_nuc_code in my development.yml was set to UIUC instead of UIU, so the block below never ran: the condition parts[5] == nuc_code was always false.

I updated the configuration and re-ran the check, and it works as expected now with approx. 750k found items.

File.open(pathname).each_with_index do |line, index|
  parts = line.split("\t")
  # Only rows whose sixth tab-separated field matches the configured NUC code are processed
  if parts[5] == nuc_code
    book = Book.find_by_obj_id(parts[0].split('.').last)
    # code
  end
  # code
end

irb(main):002> Book.where(exists_in_hathitrust: true).count
  Book Count (37.3ms)  SELECT COUNT(*) FROM "books" WHERE "books"."exists_in_hathitrust" = $1  [["exists_in_hathitrust", true]]
=> 750229
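
For anyone who hits the same thing, a guard along these lines could surface a mismatched NUC code early instead of silently updating 0 records. This is only a sketch under assumptions: count_nuc_matches and assert_nuc_code_matches! are hypothetical helpers that do not exist in Book Tracker, and the sample rows are made up. Only the tab-separated layout and the parts[5] comparison come from the snippet above.

```ruby
require "stringio"

# Hypothetical helper (not part of Book Tracker): counts the rows of a
# tab-separated HathiTrust-style file whose sixth field equals nuc_code.
def count_nuc_matches(io, nuc_code)
  io.each_line.count { |line| line.chomp.split("\t")[5] == nuc_code }
end

# Hypothetical guard: raises when the configured code matches no rows at all,
# which may indicate a configuration typo rather than a genuinely empty result.
def assert_nuc_code_matches!(io, nuc_code)
  matches = count_nuc_matches(io, nuc_code)
  raise ArgumentError, "no rows matched NUC code #{nuc_code.inspect}" if matches.zero?
  matches
end

# Made-up sample rows; only the first and sixth fields matter here.
sample_rows = [
  "uiug.30112000000001\ta\tb\tc\td\tUIU",
  "uiug.30112000000002\ta\tb\tc\td\tUIU",
  "mdp.39015000000001\ta\tb\tc\td\tMIU"
].join("\n")

# The correct code matches two of the three sample rows.
puts assert_nuc_code_matches!(StringIO.new(sample_rows), "UIU")

# The mistyped code matches nothing, so the guard raises.
begin
  assert_nuc_code_matches!(StringIO.new(sample_rows), "UIUC")
rescue ArgumentError => e
  puts e.message
end
```

Whether a zero-match result should raise or only log a warning is a design choice; raising is used here just to make the failure obvious.
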
gaurijo commented 9 months ago

I'm trying to understand what this overall issue is hoping to accomplish. Do we want to verify that the content we have submitted to Hathi is actually in the Hathi database and accessible (i.e., that it exists in the latest file download)? If so, is there a list or source of the content we have submitted to Hathi?

I think if we have an existing source of that content, we could programmatically compare it with the content in the latest Hathi download zipfile, which would verify that it was ingested correctly.
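
A rough sketch of what that comparison might look like, assuming the submitted content is available as a plain list of object IDs (that list is the open question above, so it is made up here). hathi_obj_ids and missing_from_hathi are hypothetical names, not existing Book Tracker methods. The parsing mirrors the snippet quoted earlier in this thread: tab-separated rows, NUC code in the sixth field, and the object ID taken from the part of the first field after the last dot.

```ruby
require "set"
require "stringio"

# Hypothetical helper: collects object IDs from a tab-separated
# HathiTrust-style file, keeping only rows whose sixth field equals nuc_code.
def hathi_obj_ids(io, nuc_code)
  io.each_line.each_with_object(Set.new) do |line, ids|
    parts = line.chomp.split("\t")
    ids << parts[0].split(".").last if parts[5] == nuc_code
  end
end

# Hypothetical helper: returns the submitted IDs that are absent from the file.
def missing_from_hathi(submitted_ids, io, nuc_code)
  submitted_ids.to_set - hathi_obj_ids(io, nuc_code)
end

# Made-up file contents and submission list, for illustration only.
hathi_rows = [
  "uiug.30112000000001\ta\tb\tc\td\tUIU",
  "uiug.30112000000002\ta\tb\tc\td\tUIU",
  "mdp.39015000000001\ta\tb\tc\td\tMIU"
].join("\n")
submitted = ["30112000000001", "30112000000009"]

missing = missing_from_hathi(submitted, StringIO.new(hathi_rows), "UIU")
puts missing.to_a.inspect  # only the ID that is not in the file is reported
```

Anything reported as missing would still need a manual check, since an absent ID could mean a failed ingest, a pending one, or just a difference in how the IDs are formatted.
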

I'm unsure about the piece with Google and IA in this context - any additional info you might have would be helpful. Thank you!

adolski commented 9 months ago

Honestly, I don't really know what this issue is asking for, either. I just copied it out of our old issue tracker before it got shut down.

I just talked to MJ about it. She says it's not important enough to work on right now. 🥳

gaurijo commented 9 months ago

Sounds good! Is that the same for issues #14 and #15 or should I work on those?

adolski commented 9 months ago

I think those are low priority as well. I think we are going to shift over to the Digital Library next. I'm working on preparing some onboarding documentation and I'll keep you posted.