pulibrary / aspace_helpers

methods and reports to support common SC activities in ArchivesSpace

Automate sending Items to ReCAP #397

Closed regineheberlein closed 10 months ago

regineheberlein commented 1 year ago

User story

@faithc is requesting that we prioritize the automation of the Items to ReCAP workflow. They are planning a large-scale ReCAPping project in the fall.

This involves retrieving top_container records associated with a resource record (i.e., collection record), transforming them to MARC-XML, and sending them to Alma.
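
As a rough illustration of the transformation step, here is a minimal sketch that wraps one top_container in a MARC-XML record. The `barcode`, `type`, and `indicator` keys are ASpace top_container fields, but the function name, the `949` item field, and the subfield codes are placeholders chosen for this example; the real mapping depends on the Alma import profile and is not specified in this ticket.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def container_to_marcxml(container, mms_id, item_tag="949"):
    """Build a minimal MARC-XML record carrying one container's item data.

    item_tag and the subfield codes below are hypothetical placeholders,
    not the mapping used by the actual import profile.
    """
    # Serialize with MARC21 slim as the default namespace.
    ET.register_namespace("", MARC_NS)
    record = ET.Element(f"{{{MARC_NS}}}record")

    # 001 holds the identifier of the bib record the item belongs to.
    control = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": "001"})
    control.text = mms_id

    field = ET.SubElement(
        record,
        f"{{{MARC_NS}}}datafield",
        {"tag": item_tag, "ind1": " ", "ind2": " "},
    )
    values = (
        ("a", container["barcode"]),
        ("b", f'{container["type"]} {container["indicator"]}'),
    )
    for code, value in values:
        subfield = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        subfield.text = value

    return ET.tostring(record, encoding="unicode")
```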

We currently do this semi-manually. This has been good enough until now--we've had very few requests for transfers to date.

The manual piece of the semi-manual process is checking that items do not already exist in Alma, since they neither overlay nor get rejected based on duplicate barcodes. The latter was confirmed at the time with @mzelesky. Cathy Weng was surprised to hear this during all-hands and was going to check into it again.

Implementation notes, if any

When we last looked into how to check programmatically that top_container records are indeed new, we thought we could use a combination of lock_version (a counter in ASpace indicating how many times the record has been saved post-creation) and date stamp. However, it turns out records are routinely saved multiple times during creation, so this proved not viable.

@maxkadel proposed checking barcodes in Alma via API as an alternative. There is an API listed here that we could try: GET /almaws/v1/bibs/{mms_id}/holdings/{holding_id}/items

There is also GET /almaws/v1/items?item_barcode={item_barcode}, which redirects more specifically to GET /almaws/v1/bibs/{mms_id}/holdings/{holding_id}/items/{item_pid}
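
A per-barcode check against that endpoint might look roughly like the sketch below. The base URL (North America region), the function names, and the injected `get_status` callable are assumptions made for the example. It also assumes that a known barcode ends in HTTP 200 after the redirect and that an unknown barcode comes back as 400 or 404, which should be verified against the Alma API documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumption: North America region; other regions use a different host.
ALMA_BASE = "https://api-na.hosted.exlibrisgroup.com"


def barcode_lookup_url(barcode, base=ALMA_BASE):
    """Build the item-by-barcode lookup URL quoted above."""
    return f"{base}/almaws/v1/items?{urlencode({'item_barcode': barcode})}"


def find_new_barcodes(barcodes, get_status):
    """Return the barcodes that Alma does not appear to know about.

    get_status is a caller-supplied function that performs the HTTP GET
    (with API key, following redirects) and returns the final status code.
    """
    new = []
    for barcode in barcodes:
        status = get_status(barcode_lookup_url(barcode))
        if status == 200:
            continue  # item already exists in Alma
        if status in (400, 404):  # assumed "not found" responses
            new.append(barcode)
        else:
            # Stop rather than guess: a wrong guess creates a duplicate item.
            raise RuntimeError(f"unexpected status {status} for barcode {barcode}")
    return new
```

Injecting `get_status` keeps the decision logic separate from the HTTP client, so it can be exercised without calling Alma.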

Alternatively, we could try to implement a plugin based off of the HM plugin that sends components.

mzelesky commented 1 year ago

If there would be a few barcodes each day, I could see API calls working. However, as it's a separate API call per barcode, that could be error-prone.

Would a barcode export of certain locations work, if it's done on a schedule? Would it be possible to cache all the barcodes somewhere?

regineheberlein commented 1 year ago

Thanks @mzelesky ! It sounds like more than just a few barcodes per day to me, but I'll let @faithc confirm.

As to caching barcodes, that's what I was thinking, too. There's a few approaches, I think. We could do it in lib-jobs. Or we could do it as part of the ASpace plugin, if we go that way (ASpace has a documents area that allows saving, and the components plugin makes use of it).

Btw I believe our AbID app also uses a caching mechanism (in AbID, not in ASpace).

mzelesky commented 1 year ago

I was thinking of having the data be pulled from Alma on a periodic basis with an Analytics query. It can be placed onto ftp to be picked up. Similar to how we export Bursar fines and ILL renewals.

regineheberlein commented 1 year ago

Nice, thanks @mzelesky

faithc commented 1 year ago

Thanks, @mzelesky and @regineheberlein. It's possible it may include more than a few barcodes per day, but we're not yet sure what the workflow will be exactly. Jen Meyer is the lead person on this project as her team will likely be doing a majority of the work.

Also, FYI, the project, which is intended to begin in the Fall, is scheduled to take place over the next year and a half or so.

faithc commented 1 year ago

@regineheberlein I appreciate you creating a ticket for this so quickly, but can it be put on hold for now until we meet with you and Kevin to get a sense of how long this work might take? That would let us confirm whether it's necessary for this project, or at least whether it needs to be completed by the time the project begins. Thanks.

regineheberlein commented 1 year ago

@faithc I'm looping in @kevinreiss . I had understood you to say on Slack that you needed this for the fall. I also see you've edited the message, so are you saying this is not in fact needed?

Happy to close the ticket, just checking to make sure that is really what you want since it sounded like a priority when you reached out yesterday.

regineheberlein commented 1 year ago

From Jen:

In order to make more shelf space in the Compartments, we assembled a list of Firestone archival collections (based on the report that Regine and Don put together) that have had 5 or fewer transactions over the past 10 years that could be sent to ReCAP. This initial batch consists of about 400 collections, many of which are a single box or very few boxes. Right now the curators are reviewing this list to see if any should be vetoed from going to ReCAP. From talking with Faith, it seems we have 3 options during the process of sending collections to ReCAP vis-à-vis Alma item records: to do them singly in SC, to provide Regine the information to do them with the manual process, or to use a yet-to-be-created automatic process. The automatic process isn't key to the project, but it could save a lot of time. However, if this is a heavy ask and would take a long time to complete, it doesn't seem like it would be worth holding up this ReCAP project. We may have additional batches (6-10 transactions in the past 10 years, for example) depending on the amount of space cleared by this initial batch. Regine and Will - I hope that gives a bit of context. Do you have any immediate thoughts or would it be better to have a conversation with all of us and Kevin?

faithc commented 1 year ago

Sorry if I was unclear. Per your suggestion, I wanted to reiterate what I had said in our Slack conversation that I agree it would be a good idea to have a meeting with the group to help make a determination about how best to move forward before any work begins. Jen is planning to get in touch. I wanted to confirm that this ticket was meant to serve as a placeholder until we all met.

Knowing now that the work to fully automate this process wasn't slated for completion regardless of this project, I think it would be helpful to check in to see whether it should be prioritized for the project, or at least for the start of the project, especially given how long the work might take and how much of a lift it will be for your team. My Slack inquiry was about whether it might be possible to resume the work in light of this project, thinking that having a fully automated process in place would likely be ideal for all involved. (I had edited my initial message in Slack almost immediately after posting it to make a minor correction; I didn't edit it recently or edit it to alter the question.)

Does that make sense?

regineheberlein commented 1 year ago

Closing for now--please let us know if you want it reopened!

regineheberlein commented 1 year ago

Re-opening after having had a conversation with Will.

regineheberlein commented 1 year ago

Some preliminary considerations to discuss with DACS and Alma teams:

  1. Alma currently only has item records for SC ReCAP collections, not for everything.
  2. We are currently not sending items as part of the nightly aspace2alma.
  3. To automate this process, we would need to modify the nightly process to include items. Question: would we want to only include the ReCAP locations? Probably.
  4. Because Alma doesn't reject duplicate barcodes that come in as part of an import job, we need to front-load a deduplication process. (Let's triple-check this. It is too stupid to be true.)
  5. The silver lining is that everything that is going to ReCAP must be barcoded, so the deduplication can safely be done on barcode, even if Alma doesn't do it for us.
  6. As @mzelesky suggests, we could run an export job for SC items in ReCAP locations that gets uploaded to lib-sftp. Question: what would be the failsafe in case the items export job or upload to lib-sftp fails? (An outdated file on lib-sftp would result in a duplicate items upload to Alma because it would not reflect items that have been imported to Alma in the meantime.) We need some sort of "burn after reading" mechanism.
  7. The aspace2alma job would then have to grab the file from there first, dedupe on barcode, and send back to lib-sftp only the items not found in Alma. Failsafe could be that if it doesn't find a file, it exits.
  8. On the Alma end, we need to set the (currently manual) import job to run on a schedule
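
The deduplication and "no file, no upload" failsafe from the list above could be sketched as follows. This assumes the Alma export arrives as a CSV with a `barcode` column, which has not been decided; the function names are made up for the example.

```python
import csv
from pathlib import Path


def load_alma_barcodes(export_path):
    """Read barcodes from the Alma export; return None if the file is missing.

    Assumption: the export is a CSV with a 'barcode' column.
    """
    path = Path(export_path)
    if not path.exists():
        return None
    with path.open(newline="") as f:
        return {
            row["barcode"].strip()
            for row in csv.DictReader(f)
            if row.get("barcode")
        }


def containers_to_send(containers, alma_barcodes):
    """Keep only barcoded containers whose barcode is not yet in Alma."""
    if alma_barcodes is None:
        # Failsafe: without a current export, send nothing.
        return []
    seen = set(alma_barcodes)
    fresh = []
    for container in containers:
        barcode = (container.get("barcode") or "").strip()
        if barcode and barcode not in seen:
            seen.add(barcode)  # also guards against duplicates within ASpace
            fresh.append(container)
    return fresh
```

Returning an empty list when the export is missing means a failed export job delays new items rather than duplicating existing ones.
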

regineheberlein commented 1 year ago

Concerns (notes from meeting with @mzelesky ):

  1. will barcodes change before sending to ReCAP?
  2. Stakeholders should meet with Marie to make sure that isn't the case
  3. We will use a publishing profile to get item information from Alma
  4. We will use an import profile to import data to Alma
  5. Preprocessing could involve looking at the published report
  6. For matches that are found, we may need to keep track of the item id
  7. failsafe: when processed, rename the file
  8. should this really be nightly? or weekly? the publishing job runs for 4 hours
  9. how will this project fit into the ReCAPping quotas?
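
The rename failsafe in item 7 could be as small as the sketch below. The function name and the `.processed-<timestamp>` suffix are assumptions for illustration; any naming works as long as the next run cannot mistake a consumed export for a fresh one.

```python
from datetime import datetime, timezone
from pathlib import Path


def mark_processed(export_path, now=None):
    """Rename a consumed export file so a later run cannot reuse it.

    The '.processed-<UTC timestamp>' suffix is a hypothetical convention.
    """
    path = Path(export_path)
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    target = path.with_name(f"{path.name}.processed-{stamp}")
    path.rename(target)
    return target
```
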

regineheberlein commented 1 year ago

@mzelesky has created a draft publishing job in the Sandbox.

Items to possibly tweak include

(there may be more)

regineheberlein commented 1 year ago

@mzelesky : From @jmeyer4

Response from Marie: "As far as the project goes as long as I am in the loop in regard to amount of carts to send and when, all is good. For the barcode issue, we only change the barcode on items that have NO barcode on front cover. I think Mark was thinking of items that are being withdrawn from ReCAP for one reason or another. For those we routinely change the barcode to not create confusion in the future, but that wouldn’t be the case for your project. Hope this clears that up."

regineheberlein commented 1 year ago

@faithc has confirmed via Slack that this project should move forward

regineheberlein commented 1 year ago

@faithc @Will-Clements

I'm sharing with you a report of ASpace containers with ReCAP locations that don't have corresponding item records in Alma.

I suspect a number of these have outdated barcodes in ASpace (I'm seeing a lot of faux item identifiers in there dating back to the migration). If so, and they are in fact at ReCAP but in Alma under a different barcode, I need you to please update the barcodes in ASpace before we can proceed.

If the containers have the wrong location in ASpace and are actually on site, then I need you to please update the location in ASpace before we can proceed.

If the containers have the correct barcode and location in ASpace and are in fact at or en route to ReCAP, you need to do nothing; an Alma item record will be created for them when we turn on the process.

We can't deploy the automated ReCAP process until all containers on this list have been manually confirmed or corrected. That's because any container on this list will get an item record in Alma once we run the process.

We'll turn the process on as soon as you let us know that it's done.

items_not_in_Alma.xlsx

regineheberlein commented 1 year ago

Deployment is blocked until the items in the above report have been reconciled by SC.