uoregon-libraries / newspaper-curation-app

Suite of front- and back-end tools for the curation of digitized newspaper materials

Epic: create batch-editing options #330

Open jechols opened 2 months ago

jechols commented 2 months ago

This ticket outlines what we would like to get done for the Scoop MVP to make it easier for non-developers to pull, fix, and reingest batches.

I see a small number of high-level "things" we can probably make happen without violating the NDNP spec or making massive changes to ONI.

Each option could have its own dedicated UI rather than being forced into a general-purpose "editor" feature.

Stuff we have to keep in mind for all situations

The implementation, no matter which situation we see, will differ depending on batch status:

We'll need to present batch information in all cases. For MVP, we only support batches that were generated by NCA, as processing non-NCA batches is a bigger task. If there's time, we could create a batch reader of some kind, but that's unlikely.

Question: how to handle dark archives? Just make a note that people will have to fix that themselves?

Situations

Issue Removal

Some number of issues need to be removed from a batch, but most are fine. Maybe issues have higher-quality replacements or maybe they need metadata re-entered.
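
The split this situation implies can be sketched in a few lines of Python. The issue keys and the `split_batch` helper are hypothetical and only illustrate the idea; how NCA actually identifies issues within a batch may differ.

```python
def split_batch(batch_issue_keys, keys_to_remove):
    """Partition a batch's issues into those that stay and those pulled out."""
    to_remove = set(keys_to_remove)
    unknown = to_remove - set(batch_issue_keys)
    if unknown:
        # Refuse to proceed rather than silently ignoring a mistyped key.
        raise ValueError(f"not in batch: {sorted(unknown)}")
    # Preserve the batch's original ordering in both result lists.
    kept = [k for k in batch_issue_keys if k not in to_remove]
    removed = [k for k in batch_issue_keys if k in to_remove]
    return kept, removed


# Hypothetical issue keys, made up for this example.
kept, removed = split_batch(
    ["sn00000001/1905010101", "sn00000001/1905010201", "sn00000001/1905010301"],
    ["sn00000001/1905010201"],
)
```

Failing on an unknown key is a deliberate choice in this sketch: a UI for non-developers should probably surface that mistake instead of quietly removing fewer issues than requested.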

Bulk edit

There is some kind of "search and replace" operation we need to run. It might span multiple batches. There are likely a variety of filters, not just a simple replace of every value matching some search.

Some examples:

For these cases it makes a lot more sense to generate a batch patch than to pull issues and stuff them back into NCA, especially the second case, given how MOCs work in NCA.

A batch patch will probably be something we need to standardize in some way. We'll probably want a general-purpose script that reads some kind of list of filters and directives, then finds and fixes batches appropriately. We'll need to document how to apply these on a reingest of data, and make it clear that users with archived batches need to preserve the patches with the same care as the batches themselves.
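
To make "a list of filters and directives" more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the `Patch` structure, the field names (`lccn`, `date`, `edition_label`), and the in-memory issue dictionaries are hypothetical and do not reflect NCA's or ONI's actual data model, nor whatever patch format we end up standardizing.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    """Hypothetical batch patch: filters choose issues, directives rewrite them."""
    # Every filter must match for an issue to be touched: field name -> required value.
    filters: dict = field(default_factory=dict)
    # Each directive replaces one value: field name -> (old value, new value).
    directives: dict = field(default_factory=dict)


def matches(issue, filters):
    """True when the issue satisfies every filter in the patch."""
    return all(issue.get(name) == wanted for name, wanted in filters.items())


def apply_patch(issues, patch):
    """Return (patched copies of all issues, number of issues changed)."""
    patched, changed = [], 0
    for issue in issues:
        updated = dict(issue)  # never mutate the caller's data
        if matches(issue, patch.filters):
            for name, (old, new) in patch.directives.items():
                if updated.get(name) == old:
                    updated[name] = new
        if updated != issue:
            changed += 1
        patched.append(updated)
    return patched, changed


# Made-up sample data: fix a misspelled edition label for one title only.
issues = [
    {"lccn": "sn00000001", "date": "1905-01-01", "edition_label": "Mornign"},
    {"lccn": "sn00000001", "date": "1905-01-02", "edition_label": "Evening"},
    {"lccn": "sn00000002", "date": "1905-01-01", "edition_label": "Mornign"},
]
patch = Patch(
    filters={"lccn": "sn00000001"},
    directives={"edition_label": ("Mornign", "Morning")},
)
patched, changed = apply_patch(issues, patch)
```

Because a patch in this sketch is plain data, it could be serialized and stored next to an archived batch, then reapplied after a reingest of the original files.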

Delete Batch

An entire batch needs to be pulled and all issues just need to get back into NCA for some reason. Maybe it's a small batch that shouldn't have gone out yet (embargo rules were bad) or maybe the issues all need bulk edits, but in a way that just doesn't work well with whatever "batch patch" we come up with.

This scenario is the most time-consuming for users and would need some warnings. All issues would go back into NCA. They could keep their metadata, or be destroyed, but they're all basically treated as if they never were in a batch. They will have to be rebatched the same way any other issues are.

jechols commented 2 months ago

See #231 for historical plans around the same issue