medusa-project / book-tracker

Medusa Book Tracker

Troubleshoot Google Books indexing failure #55

Open KyleRimkusLibrarian opened 3 weeks ago

KyleRimkusLibrarian commented 3 weeks ago

The Book Tracker has not been reflecting accurate numbers for Google Books for a few years. Let's dig into some of the details.

  1. The Google XML files appear to be getting ingested correctly into Medusa. Newer files in Medusa lack a “last modified date” due to modifications in ingest processes in recent years, but cross-referencing file names suggests they have been making it into Medusa as expected. See, for example, the file https://medusa.library.illinois.edu/cfs_files/61828540
  2. There is a somewhat manual step for indexing Google Grin that seems to be working. I ran through it myself and received a success notice with “updated database with 942146 found items”.
  3. So we know that Medusa is holding all of our latest Google XML data ingested from local storage, and that the latest Google Grin data is being indexed. The problem is that the Book Tracker application does not appear to be correctly cross-referencing newly added items from steps 1 and 2. Specifically, it is not adding new items to the official list of Google items in Book Tracker. Why?

One theory, possibly a long shot, is that the newly added items that lack a last modified date for the file are the ones that aren’t getting logged. Is the system looking for file dates when running this process? Or is there something else going wrong?
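
One way to test this theory would be to export the item IDs found in the source data and the IDs currently flagged as Google items in Book Tracker, then check whether the unflagged ones are mostly the ones without a file date. Below is a minimal sketch in Python. It assumes both lists can be exported with a shared identifier; the names `SourceItem`, `find_unflagged`, and `split_by_date`, and all of the sample data, are hypothetical and not taken from the Book Tracker code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceItem:
    item_id: str                  # hypothetical identifier shared by both exports
    file_modified: Optional[str]  # None when the Medusa file has no last modified date


def find_unflagged(source_items, flagged_ids):
    """Return the source items whose ID is not in the set of flagged Google items."""
    return [item for item in source_items if item.item_id not in flagged_ids]


def split_by_date(items):
    """Split items into (with_date, without_date) based on whether a file date exists."""
    with_date = [i for i in items if i.file_modified is not None]
    without_date = [i for i in items if i.file_modified is None]
    return with_date, without_date


# Made-up sample data, for illustration only.
source_items = [
    SourceItem("A1", "2019-05-01"),
    SourceItem("A2", None),
    SourceItem("A3", None),
    SourceItem("A4", "2020-01-15"),
]
flagged_ids = {"A1", "A4"}

missing = find_unflagged(source_items, flagged_ids)
with_date, without_date = split_by_date(missing)
print(f"missing: {len(missing)}, with date: {len(with_date)}, "
      f"without date: {len(without_date)}")
```

If nearly all of the missing items fall in the "without date" group, that would support the file-date theory. A roughly even split would point to something else going wrong.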

srbbins commented 3 weeks ago

@KyleRimkusLibrarian Would you mind if I set a meeting to have you walk Gauri and me through the indexing process again?

KyleRimkusLibrarian commented 3 weeks ago

Not at all. Sounds good.
