sul-dlss / pre-assembly

Rails app - prepares objects for assembly workflow and allows discovery report
https://consul.stanford.edu/display/chimera/Automated+Accessioning+and+Object+Remediation+%28pre-assembly+and+assembly%29

Research how to get updates when Accessioning fails / in progress / succeeds for every object in a preassembly job #1225

Closed edsu closed 1 year ago

edsu commented 1 year ago

In #1220 we are going to be sending an email to the Preassembly user when their job successfully completes. Ideally we would email when there is a problem as well. Can we do this?

Possible approaches to consider include (but are not limited to):

justinlittman commented 1 year ago

Raises the question of whether this should be a more generalized SDR feature (i.e., notify a user when accessioning fails). It made me think of https://github.com/excid3/noticed

edsu commented 1 year ago

A generalized "SDR" feature could possibly work. One wrinkle is that Preassembly is a batch-oriented service that can update many DRUIDs as part of a single job, and we probably don't want to send an email for every failed accession, which could result in 10, 100, maybe 1000s of emails?
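A minimal sketch of the "one email per job, not per DRUID" idea, assuming failures can be collected with the job they belong to. `AccessionFailure`, `failure_digests`, and the example URL are hypothetical names for illustration, not existing Preassembly code:

```ruby
# Hypothetical record of one failed accession; not an existing Preassembly model.
AccessionFailure = Struct.new(:job_id, :druid, :error, keyword_init: true)

# Groups failures by job so each job produces a single digest
# instead of one notification per failed DRUID.
def failure_digests(failures, job_url_for:)
  failures.group_by(&:job_id).map do |job_id, job_failures|
    {
      job_id: job_id,
      job_url: job_url_for.call(job_id), # link back to the job that failed
      failed_count: job_failures.size,
      druids: job_failures.map(&:druid).sort
    }
  end
end

failures = [
  AccessionFailure.new(job_id: 12, druid: 'druid:bb111cc2222', error: 'checksum mismatch'),
  AccessionFailure.new(job_id: 12, druid: 'druid:dd333ff4444', error: 'missing file'),
  AccessionFailure.new(job_id: 15, druid: 'druid:gg555hh6666', error: 'timeout')
]

# The URL pattern is an assumption made up for this example.
digests = failure_digests(
  failures,
  job_url_for: ->(id) { "https://preassembly.example.edu/job_runs/#{id}" }
)

digests.each do |digest|
  puts "Job #{digest[:job_id]}: #{digest[:failed_count]} failed (#{digest[:job_url]})"
end
```

Each digest could then feed a single mailer call per job, however many DRUIDs failed.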

Also, if the user receives an email saying that a Job resulted in one or more accession failures, I think it would be helpful if they could be provided a link to the Preassembly job that failed, so they can address the problem. To provide that link, DSA would need to know which job a DRUID belongs to. If all it had was a DRUID, it might take some effort to figure out which job to look at?

Underneath this I think there is an architectural question of what a discrete application like Preassembly does for itself, and what it should rely on other SDR services for.

Is Preassembly part of SDR?

justinlittman commented 1 year ago

Also for consideration is whether it will be a nuisance for a user to get emails when the problems may have been fixed by retries / Andrew / FR by the time that they see the email.

Makes me again wonder whether some sort of a live view wouldn't be more helpful than a static list via email.

edsu commented 1 year ago

The Job view in Preassembly (linked from the email) will let the user know if Accessioning has been successful or not, assuming that the Accessioning Complete event from Rabbit works. This is something that Andrew has noted seems unreliable in H2, so maybe this is an unrealistic hope?

andrewjbtw commented 1 year ago

A generalized SDR feature would require a general SDR application, which doesn't exist. There are only different apps that do different things, with different users. Outside of H2, no one "owns" a particular druid, so you would need app-specific tracking of each user's actions: to determine who to notify after each action, you need to know who did what with which druid in which app.

If someone starts accessioning with Preassembly, notify the Preassembly user. If someone starts accessioning with Goobi, notify that Goobi user. If someone opens and closes within Argo, notify that Argo user. etc.
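As a rough illustration of that kind of tracking, here is a sketch of a ledger that records who did what with which druid in which app and then looks up who to notify. `ActionLedger` and its methods are hypothetical, and the in-memory array stands in for whatever persistence each app would really use:

```ruby
# Hypothetical ledger of user actions; a real app would persist these rows.
class ActionLedger
  Entry = Struct.new(:druid, :app, :user, :at, keyword_init: true)

  def initialize
    @entries = []
  end

  # Record that a user started work on a druid from a given app.
  def record(druid:, app:, user:, at: Time.now)
    @entries << Entry.new(druid: druid, app: app, user: user, at: at)
  end

  # Returns the most recent entry for the druid, or nil if nothing was recorded.
  def notify_target(druid)
    @entries.select { |entry| entry.druid == druid }.max_by(&:at)
  end
end

ledger = ActionLedger.new
ledger.record(druid: 'druid:bb111cc2222', app: 'preassembly', user: 'user_a',
              at: Time.utc(2023, 1, 10))
ledger.record(druid: 'druid:bb111cc2222', app: 'goobi', user: 'user_b',
              at: Time.utc(2023, 2, 1))

target = ledger.notify_target('druid:bb111cc2222')
puts "Notify #{target.user} via #{target.app}"
```

Picking the most recent actor is only one possible policy; an app might instead notify everyone who touched the druid.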

I'm not opposed to doing all of those things; in fact, Goobi has the same problem as Preassembly when it comes to tracking. I could certainly see a general way for apps to get statuses.

edsu commented 1 year ago

I think @andrewjbtw's comment above provides some useful context for the issues at play here. I'm going to write up a couple directions we can chat about in the storytime today. We may be able to close this after writing up any decisions that come out of the storytime.

edsu commented 1 year ago

At yesterday's storytime it appeared that there was consensus that, given our current architecture, it is more appropriate to get the status of an object from the “system of record” (dor-services-app or workflow-server-rails APIs) rather than Solr, even though Solr is probably the fastest of the three.

Given the place of RabbitMQ in SDR Architecture as a Message Broker (see ADR) it also seems appropriate to listen for messages about accessioning or error events for a given SDR Item.

However, there continues to be uncertainty about the reliability of Rabbit messaging as implemented in the SDR:

While RabbitMQ is proven technology, we seem to lack information (logging, tooling) or adequate time to determine what is going wrong in these situations.

We don't currently have a discrete Rabbit message being sent when there is an error related to accessioning an SDR Item. But in theory this could be added.

So the key question for our work in Preassembly in this work-cycle is:

How should Preassembly keep track of the state of an SDR object so it can notify the user and clean up Globus uploads?

  1. Listen for accessioning complete and error messages from Rabbit.
  2. Poll a REST API (DSA or WorkflowServerRails) on some schedule.
  3. Do both: listen for the existing accessioning complete event and poll for errors.
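
To make the third option concrete, here is a minimal sketch of a per-job tracker that accepts completion messages and polls for everything still unresolved. `JobAccessionTracker` and the injected `status_lookup` are hypothetical: the lookup stands in for a call to DSA or the workflow service and is not a real client API, and the message consumer wiring is left out:

```ruby
# Tracks the accessioning state of every DRUID in one job.
# Completion arrives as a message (option 1); failures are discovered by
# polling an injected lookup (option 2). Using both together is option 3.
class JobAccessionTracker
  STATES = %i[in_progress completed failed].freeze

  # status_lookup: callable taking a druid and returning one of STATES.
  # It is a stand-in for a REST call, assumed here for illustration.
  def initialize(druids, status_lookup:)
    @states = druids.to_h { |druid| [druid, :in_progress] }
    @status_lookup = status_lookup
  end

  # Call from the message consumer when an "accessioning complete" message arrives.
  def handle_completed(druid)
    @states[druid] = :completed if @states.key?(druid)
  end

  # Call on a schedule; only asks about DRUIDs that are still unresolved.
  def poll
    pending = @states.select { |_druid, state| state == :in_progress }.keys
    pending.each do |druid|
      status = @status_lookup.call(druid)
      @states[druid] = status if STATES.include?(status)
    end
  end

  def finished?
    @states.values.none? { |state| state == :in_progress }
  end

  # Counts of DRUIDs per state, e.g. for the notification email.
  def summary
    @states.values.each_with_object(Hash.new(0)) { |state, counts| counts[state] += 1 }
  end
end

# Fake statuses standing in for the system of record.
statuses = { 'druid:bb111cc2222' => :in_progress, 'druid:dd333ff4444' => :failed }
tracker = JobAccessionTracker.new(statuses.keys, status_lookup: ->(druid) { statuses.fetch(druid) })

tracker.handle_completed('druid:bb111cc2222') # message received
tracker.poll                                  # scheduled poll finds the failure
puts tracker.summary.inspect
puts tracker.finished?
```

Once `finished?` returns true, the job could send its single notification and clean up the Globus upload. Because polling covers any DRUID that never receives a message, a lost Rabbit message delays the result rather than leaving the job stuck.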

ndushay commented 1 year ago

Go with Rabbit work.

IFF a significant number of Rabbit messages turn out to be missing, then we can ticket the work later to poll the workflow service.

edsu commented 1 year ago

Remaining Rabbit work to track Accessioning Errors has been ticketed, so this analysis is complete!