scientist-softserv / adventist_knapsack

CSV import with remote files stuck pending? #328

Closed KatharineV closed 11 months ago

KatharineV commented 1 year ago

Today I tested a CSV import via Bulkrax with minimal metadata and a link to a file on S3. I wanted to see how a CSV import works with remote files rather than a ZIP. The test I did on ADL Staging has been pending for over an hour. Did I do something wrong or is it stuck? Your insight will be gratefully received! Thanks.

https://adl.s2.adventistdigitallibrary.org/importers/30?locale=en

kirkkwang commented 11 months ago

Hi @KatharineV

Looks like the CSV needs a work_type column that names the work type, which is interesting because I thought it would have just defaulted to GenericWork 🤔.

In any case, the way it's set up, it looks like it will just map the text in the CSV's remote_url column to the remote_url property.

ShanaLMoore commented 11 months ago

I'm glad it was just a data issue, but it would be helpful for Bulkrax to surface the problem instead of silently staying pending. I'll file an issue against Bulkrax.

KatharineV commented 11 months ago

Thanks for checking the failed importer from last month. I created a new one today using a CSV with all the required fields plus work_type, and the importer was successful. The Samvera Bulkrax documentation that I've been using as a reference doesn't list work_type as a required field, so is this a bug, or is the field actually required and the documentation needs an update? If it's expected behavior, then this ticket is done. If it's a bug, then I don't want to close it yet. I, like Kirk, thought the importer would default to GenericWork. Thanks for letting me know.

Edited to add: Here's the importer that was successful: https://adl.s2.adventistdigitallibrary.org/importers/33?locale=en

And here's the work with remote files that now appear under "Items": https://adl.s2.adventistdigitallibrary.org/concern/journal_articles/20089403a_this_is_a_test_of_urls_placed_in_a_csv_upload?locale=en

kirkkwang commented 11 months ago

Thanks Katharine, I'll check with the team about whether that is expected behavior. Either way, it should throw an error instead of getting stuck in pending. But I'm at least glad we have a workaround for now.
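Until the importer reports this itself, a pre-flight check on the CSV could catch the problem before an import is created. The following is only a sketch in Python, not part of Bulkrax: the function name `check_work_type` and the sample rows (including the `JournalArticle` value and the example.org URLs) are made up for illustration, and it assumes the only rule is "a work_type column exists and every row has a value in it".

```python
import csv
import io


def check_work_type(csv_text, column="work_type"):
    """Return a list of problems found; an empty list means the check passed.

    Hypothetical helper for illustration only. It does not call Bulkrax.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    if column not in headers:
        return [f"missing '{column}' column"]

    problems = []
    # Data rows are counted from 1; the header row is not counted.
    for row_no, row in enumerate(reader, start=1):
        # A short row yields None for the missing cell, so fall back to "".
        if not (row.get(column) or "").strip():
            problems.append(f"row {row_no}: '{column}' is blank")
    return problems


# Made-up sample data: the second data row has the header but no value.
sample = (
    "title,work_type,remote_url\n"
    "Test A,JournalArticle,https://example.org/a.pdf\n"
    "Test B,,https://example.org/b.pdf\n"
)
print(check_work_type(sample))
```

The check covers both cases raised in this thread: a CSV with no work_type column at all, and a CSV that has the header but leaves the cell empty.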

KatharineV commented 11 months ago

Kirk, can you or the team tell me if this kind of remote file URL should work to import a PDF to Hyku?

https://www.andrews.edu/library/car/cardigital/Periodicals/Adventist_Journey/2021/2021_07_08.pdf

It's a link to an open file share rather than S3.

kirkkwang commented 11 months ago

Hi @KatharineV, are you asking if that PDF would get attached to the record? Are you using the header related_url or remote_url/official_url? The related_url header should be the one to use for importing the file. That PDF link looks good to me and should work.
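As a rough illustration of what "looks good" could mean for a link like this, here is a small Python sketch that checks only the shape of a URL. The function name `looks_like_file_url` and its three rules (http or https scheme, a host, and a path ending in a file name with an extension) are assumptions made for this example. It does not contact the server, so it cannot tell whether the file is actually downloadable.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def looks_like_file_url(url):
    """Shape-only check (hypothetical helper): no network request is made."""
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https"):
        return False
    if not parts.netloc:
        return False
    # Require a final path segment with an extension, such as ".pdf".
    return PurePosixPath(parts.path).suffix != ""


url = (
    "https://www.andrews.edu/library/car/cardigital/Periodicals/"
    "Adventist_Journey/2021/2021_07_08.pdf"
)
print(looks_like_file_url(url))
```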

Also, I did ask the team about the expected behavior for the Bulkrax issue, and it seems that we should always have a designated work_type. I haven't tried what would happen if I removed the work_type header, though. I think the problem is that a header with no value throws the importer off.

KatharineV commented 11 months ago

Awesome, thank you. I'll make sure we always include work_type. Also, I was definitely confused about the related vs. remote URL. I just tried that link with the proper heading (thanks!) and it imported the work on staging with no issues. So, I think it's ok to close this ticket, and I will.