scientist-softserv / adventist_knapsack

Apache License 2.0

Spike: CSV Errors #334

Closed jillpe closed 6 months ago

jillpe commented 1 year ago

Summary

https://assaydepot.slack.com/archives/C0313NJV9PE/p1692212781371739

TL;DR: The CSV import worked on staging but errored out the first time on production (Ldp::NotFound). The second time it ran on production, it ran very slowly and ended with a series of errors.

Importer on staging that worked

Importer on production that failed

Record_SPD_2023_07_15_COPY2.zip

Acceptance Criteria

kirkkwang commented 11 months ago

@KatharineV

There's a bug in Fedora that occurs when it is overloaded with too many PUT or POST requests; it causes these LDP errors, which make the importers fail. Generally, the solution is to initiate a reimport, and if that still doesn't work, we try a Fedora restart.
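The reimport-first approach described above amounts to retrying a transient failure before escalating to a restart. As a rough illustration only, here is a minimal sketch of retry with exponential backoff in plain Ruby. The names `TransientRepositoryError` and `with_retries` are hypothetical and are not part of Bulkrax, Hyrax, or the `ldp` gem; the error class merely stands in for something like Ldp::NotFound, and nothing here reflects how the importers actually handle failures.

```ruby
# Hypothetical stand-in for a transient repository error such as Ldp::NotFound.
# (Assumption for illustration; not a class from any real gem.)
class TransientRepositoryError < StandardError; end

# Runs the given block, retrying when it raises TransientRepositoryError.
# Waits base_delay * 2^(attempt - 1) seconds between attempts, and re-raises
# the last error once max_attempts has been reached.
# The sleeper parameter exists so callers (and tests) can replace the real sleep.
def with_retries(max_attempts: 3, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientRepositoryError
    # Give up and surface the error after the final attempt.
    raise if attempt >= max_attempts

    # Back off before trying again so the repository is not hit immediately.
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

A caller would wrap the operation that talks to the repository, for example `with_retries(max_attempts: 4) { |attempt| run_import_step }`, where `run_import_step` is again a hypothetical placeholder. Backing off between attempts is meant to avoid adding load to a service that is already struggling; if every attempt fails, the error still propagates, which is the point at which a restart would be considered.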

KatharineV commented 11 months ago

@kirkkwang Maybe we need a Fedora restart? I just tried two importers in staging/prod, and I got LDP not found in prod.

Staging was successful: https://sdapi.s2.adventistdigitallibrary.org/importers/69?locale=en

Production failed: https://sdapi.b2.adventistdigitallibrary.org/importers/71?locale=en

I used the same zipped files and CSV for both importers.

I would ask if prod is failing due to the import after Spacestone, but I haven't heard that it has started (?), and I know the LDP issue was present over a month ago when we opened this ticket. What do you think?

kirkkwang commented 11 months ago

Ah, good to know. We'll restart Fedora and let you know when to try again, to see if that helps.

KatharineV commented 11 months ago

Today I tried to update two of the importers on SDAPI production. Since Fedora was restarted last week, I thought I might be able to update the importers, push corrected metadata to the few works that already loaded, and complete the import. Unfortunately, most of the works still failed with error code Ldp::NotFound. Half the works in one import are stuck pending. One work is marked complete, but the associated file isn't showing up yet--at least, it's taking much longer than usual to show it is attached. Weirdest of all, one work is marked "failed" on the importer screen, but it shows up (with metadata only).

First importer (all but one failed or pending): https://sdapi.b2.adventistdigitallibrary.org/importers/36?locale=en

"Complete" work with missing file: https://sdapi.b2.adventistdigitallibrary.org/concern/journal_articles/record_spd_2023_07_15_8b_fruitful_visit?locale=en

"Failed" work that did import, minus its file: https://sdapi.b2.adventistdigitallibrary.org/concern/journal_articles/record_spd_2023_07_15_8c_kingly_honour?locale=en

Second importer (all failed): https://sdapi.b2.adventistdigitallibrary.org/importers/71?locale=en

I could use some assistance in our October support hours to get CSV imports working on production and to clean up these imports on SDAPI. Since it's production, I'm not thrilled to be testing and running and rerunning in this live, public environment. I'm a little unsure about how to best clean up these imports. Is it ok for me to keep editing and rerunning the same importers? Will that work? Do I need to create fresh importers instead? Anyway, some advice and oversight from the team will be very welcome. Thanks.

orangewolf commented 11 months ago

I've fixed a few issues with the database, which seems to have resolved this issue. The big one was that an index was missing on bulkrax_statuses (a very large table), causing lots of very long queries that were mucking things up. I re-ran the two importers mentioned and both ran well.

KatharineV commented 11 months ago

Thank you for the work you did so far! I just ran an importer on prod (since the issue was never on staging), and it ran beautifully for a minute, got stuck, and errored out with a bunch of Ldp::HttpError and Ldp::NotFound errors. Ugh! Any idea what's going on?

Here's the importer: https://sdapi.b2.adventistdigitallibrary.org/importers/104?locale=en

By the way, if we're on the verge of kicking off the big import post-SpaceStone, do not feel pressed to work on this ticket. I have a backlog of things I'd like to upload to the SDAPI tenant, but I know production will soon be overwhelmingly busy.