Closed ShanaLMoore closed 1 year ago
Met with Drew:
If Bulkrax is enabled, a user gets redirected to the Bulkrax importers page when hitting the /batches endpoint.
I temporarily disabled this overridden controller locally to test out the batch ingest w the upgrades ...
Batches is available in the sidebar, but it is hidden behind the Bulkrax env setting.
search bar => facets > ability to export
If the export is small enough, it'll download immediately. If not, you'll get an email with a link to an S3 bucket.
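A minimal sketch of the delivery decision described above, for illustration only. The threshold value, the constant `INLINE_EXPORT_LIMIT`, and the method `export_delivery` are hypothetical; the notes do not say what counts as "small enough" or how the application implements this.

```ruby
# Hypothetical threshold; the notes do not say what counts as "small enough".
INLINE_EXPORT_LIMIT = 500

# Decide how an export of `record_count` records would be delivered:
# small exports download right away, larger ones are emailed as an S3 link.
def export_delivery(record_count, limit: INLINE_EXPORT_LIMIT)
  record_count <= limit ? :immediate_download : :emailed_s3_link
end

puts export_delivery(25)
puts export_delivery(10_000)
```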
Need to change the format.
/batches => upload choosing corresponds to csv used to upload assets
Bulkrax is only used for PBCore ingest from AMS.
If there are major changes to how the actor stack works, Drew expects we'll have problems.
batch level ingest config:
Reader step: parses and maps. You'll have a batch record in the database and a bunch of batch items attached to it. The batch items in the batch item table correspond to Sidekiq jobs.
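The batch and batch item relationship can be pictured roughly as follows. This is a simplified sketch using plain Ruby structs, not the actual hyrax-batch_ingest models; `Batch`, `BatchItem`, `build_batch`, `status_counts`, and the status strings are hypothetical names chosen for illustration.

```ruby
# Hypothetical, simplified stand-ins for the batch and batch item records.
BatchItem = Struct.new(:id, :row, :status, keyword_init: true)
Batch = Struct.new(:id, :items, keyword_init: true)

# Build one batch with one item per parsed row.
# In the real app, each item would correspond to one background job.
def build_batch(batch_id, rows)
  items = rows.each_with_index.map do |row, index|
    BatchItem.new(id: index + 1, row: row, status: "enqueued")
  end
  Batch.new(id: batch_id, items: items)
end

# Summarize how many items are in each status.
def status_counts(batch)
  batch.items.map(&:status).tally
end

batch = build_batch(1, [{ "title" => "A" }, { "title" => "B" }])
batch.items.first.status = "completed"
p status_counts(batch)
```

Tracking status per item is what lets a single bad row fail without hiding the outcome of the rest of the batch.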
I am not seeing code level evidence that exports based on search is supported.
Blocked by a UI issue.
I'm unable to go through the workflow Drew demonstrated today because when I perform an empty search the export buttons do not render. This will likely be blocked until the Bootstrap 4 upgrade is complete (#32).
However, I was able to test the batch ingest and it appears to be working despite the upgrades.
The associated error seems to be a result of bad data:
Error:
invalid source file submitted: /tmp/RackMultipart20230815-1921-11xzc97.csv <br>Unknown column `` Unable to parse CSV.<br>["/app/samvera/hyrax-webapp/app/services/aapb/batch_ingest/csv_reader.rb:34:in `block in validate_csv_header'", "/app/samvera/hyrax-webapp/app/services/aapb/batch_ingest/csv_reader.rb:33:in `each'", "/app/samvera
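The error above names an empty column, which suggests a blank header cell in the uploaded CSV (for example, a trailing comma in the header row). That cause is an assumption, not something confirmed in this thread. Below is a minimal sketch of that kind of header check using Ruby's standard `csv` library. It is not the code in `csv_reader.rb`; `ALLOWED_COLUMNS` and `header_problems` are hypothetical names and the column list is made up.

```ruby
require "csv"

# Hypothetical allow-list; the real column names live in the application's reader.
ALLOWED_COLUMNS = %w[id title description].freeze

# Returns a list of problems found in the header row (empty when the header is valid).
def header_problems(csv_text, allowed: ALLOWED_COLUMNS)
  header = CSV.parse_line(csv_text) || []
  header.each_with_index.filter_map do |name, index|
    column = name.to_s.strip
    if column.empty?
      "column #{index + 1} has a blank header"
    elsif !allowed.include?(column)
      "unknown column `#{column}`"
    end
  end
end

# A trailing comma in the header row produces a blank third column.
puts header_problems("id,title,\n1,Foo,\n")
```

Running a check like this on a file before uploading it could help separate data problems from regressions caused by the upgrades.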
Bulkrax appears to be working. I tested CSV and XML imports. I just need to get appropriate XML/manifest data from the client to test its parser.
Overall, it looks like both hyrax-batch_ingest and Bulkrax are working with the upgrades. At this point it may be best to let the client keep their current workflow, because the lift to reach parity in Bulkrax would be big. Eventually, though, I recommend moving to Bulkrax so that the client can easily stay current with Bulkrax and community standards.
Now I will dig into the pipeline to see if I can get our hyrax-batch-ingest branch merged. Unless they tell us to do this, Rob recommends that Drew and his team take this on.
Update:
tl;dr: they should keep using batch ingest after all. It still appears to be working, but I would like them to test and confirm once we provide an environment for them.
My current Bulkrax estimate is at least a 13, though that may partly be because I don't fully understand it yet. They have 6 different parsers for every import use case. Would we need to do that for Bulkrax? I would need more time to study them and note their differences.
The search-to-export functionality alone should be smaller and doable, but I don't see evidence that we've done this before. Perhaps we put it into an individual project (we should ask the team), but I don't see it in Bulkrax proper.
Should I spend more time breaking this down or can we move on and save it for another time?
related tickets:
batch ingest Export works and produced this CSV:
From what I can tell so far, upgrading hyrax-batch_ingest's dependencies doesn't seem to negatively affect its functionality.
I would like to have the clients test and confirm this, but I think it's OK and they should continue using it for now, as the lift to reach parity in Bulkrax would be much larger.
Also to add per Rob, getting our https://github.com/samvera-labs/hyrax-batch_ingest/pull/152 merged into main should fall on Drew and his team since main's pipeline is broken (unless they want our devs to spend time trying to resolve this).
cc @jillpe
Summary
This ticket is to explore the current state of things.
ref convo/thread: https://assaydepot.slack.com/archives/C0313NKG2DA/p1692113998641919
Additional Information
https://assaydepot.slack.com/archives/C0313NK6HB6/p1692133959198219 https://github.com/scientist-softserv/dev-ops/issues/729 https://drive.google.com/drive/folders/1yOx1jW4WBXjk3_zAI5CHfen-PDkXoXV9 https://assaydepot.slack.com/archives/C030UPFFP2A/p1689857107484079?thread_ts=1689782401.682349&cid=C030UPFFP2A
Huddle NOTES:
Did we implement export for GBH? This would require a Bulkrax upgrade.
Link disabled in their UI.
Miranda doesn't use Bulkrax at all.
Look in the current version of Bulkrax to see if search-based export functionality exists.
Spin up GBH and see if there's anything in the UI that hints at this functionality.
Get an estimate of how much it will take to implement this with Bulkrax.
Create a batch ingest export with the found set. Import it.
Ask Tim or Miranda how this currently works. If it's not broken, we may proceed with keeping batch ingest.
Example: all interviews by Jamie Oliver. Be able to search for Jamie Oliver and export the results.
Export a collection of work ids after we figure out which collection of work ids we need.
DOCS
https://drive.google.com/drive/folders/1pSELhG57A7S4Cy1YiARHn0NwfVE0qrNj