pulibrary / figgy

Valkyrie-based digital repository backend.

Job to ingest a bag of A/V objects #753

Closed escowles closed 6 years ago

escowles commented 6 years ago

A MediaResource object is created for each component id, with the appropriate title (drawn from EAD finding aid) and appropriate files attached.

escowles commented 6 years ago

Sample bag to work with in Google Drive: https://drive.google.com/drive/u/1/folders/1Y2DSmFBSS9h2lYA9yR04cFv9ALLWEi1E

tpendragon commented 6 years ago

Get a new bag from Kelly from the Latin American project.

hackartisan commented 6 years ago

This job should be called from after_create and after_update in the ArchivalMediaCollectionController.

hackartisan commented 6 years ago

I clarified with @kellybolding that going forward we can expect the shotlist to be in the data folder of the bag as descriptive.csv (the sample bag didn't have it there). She has added it to the bag in our drive. We should use this CSV to correlate filenames with component IDs. (This decision was revised; we'll get the IDs from the EAD.)

Details on structure of objects: For each component ID, there will be a MediaResource object. For each barcode-with-part (i.e. each side of each recording) there will be a FileSet object that holds the master, intermediate, and access files.
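To make the object structure concrete, here is a minimal Python sketch (not Figgy's actual Ruby/Valkyrie code). It assumes a hypothetical filename convention of barcode, part, and a role suffix (pm, i, a), and a barcode-to-component-ID mapping supplied by the caller. The class names only mirror the terms used above; the component IDs and barcodes in any usage are made up.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# ASSUMPTION: files are named <barcode>_<part>_<role>.<ext>; this convention
# is hypothetical and not confirmed anywhere in this thread.
FILENAME = re.compile(r"^(?P<barcode>\d+)_(?P<part>\d+)_(?P<role>pm|i|a)\.\w+$")
ROLES = {"pm": "master", "i": "intermediate", "a": "access"}


@dataclass
class FileSet:
    """One side of one recording (barcode-with-part)."""
    barcode_with_part: str
    files: dict = field(default_factory=dict)  # role -> filename


@dataclass
class MediaResource:
    """One archival component, holding its FileSets."""
    component_id: str
    file_sets: list = field(default_factory=list)


def build_resources(filenames, barcode_to_component):
    """Group files into FileSets, then FileSets into MediaResources."""
    file_sets = {}
    for name in filenames:
        match = FILENAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        key = f"{match['barcode']}_{match['part']}"
        file_set = file_sets.setdefault(key, FileSet(key))
        file_set.files[ROLES[match["role"]]] = name

    resources = {}
    for key, file_set in sorted(file_sets.items()):
        barcode = key.split("_")[0]
        component_id = barcode_to_component[barcode]
        resource = resources.setdefault(component_id, MediaResource(component_id))
        resource.file_sets.append(file_set)
    return list(resources.values())
```

Keying FileSets on barcode-with-part keeps each side of a recording separate, while the component ID decides which MediaResource a side belongs to.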

kellybolding commented 6 years ago

@hackmastera I kept thinking about this after we spoke. Though this shouldn't affect the LA use case, I wanted to qualify that descriptive.csv (the shotlist) will be in the bag for future projects. But in cases where archivists need to add or change component IDs post-digitization and before ingesting into Figgy (see step 5 in the workflow diagram), Figgy would need to rely on an updated descriptive.csv/shotlist for matching Local IDs (barcode etc.) with Component IDs, not the one in the bag.

This may need to be fleshed out more in the workflow diagram because I'm not sure it's clear where Figgy is getting this data from. For example, should we provide staff an option for uploading a new shotlist upon ingest when it will be different from the one in the bag? In this case, I don't think we would want to actually replace the shotlist in the bag because that would require re-bagging. We could also not include this file in the bag at all and just require staff to upload it in addition to the bag.

I think it makes sense that Figgy would pull the descriptive metadata from the EAD rather than the shotlist, but matching the barcodes to the component IDs may need to always happen from a shotlist, since we don't want to continue including file names in the finding aids when it's not necessary for users (i.e. remote access). I think you were getting at this, Anna, but I wasn't thinking through the implications in all scenarios. I'd be happy to chat again later today if needed.
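One way to picture the "uploaded shotlist instead of the bag's shotlist" option is the Python sketch below. It is only an illustration: the column names barcode and component_id are assumptions, since the real layout of descriptive.csv is not given in this thread, and the function names are hypothetical.

```python
import csv
import io


def load_mapping(csv_text, barcode_column="barcode", component_column="component_id"):
    """Read a shotlist and return {barcode: component ID}.

    ASSUMPTION: the column names are hypothetical placeholders.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[barcode_column].strip(): row[component_column].strip()
        for row in reader
    }


def resolve_mapping(bag_csv_text, uploaded_csv_text=None):
    """Prefer a shotlist uploaded at ingest; fall back to the one in the bag.

    The bag itself is never modified, so no re-bagging is needed.
    """
    if uploaded_csv_text is not None:
        return load_mapping(uploaded_csv_text)
    return load_mapping(bag_csv_text)
```

The uploaded file replaces the bagged mapping wholesale rather than merging with it, which matches the idea that the updated shotlist, not the one in the bag, is the source of truth.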

hackartisan commented 6 years ago

@kellybolding thank you for following up. In fact, for the case at hand, we don't actually have the mapping in the bag unless we want to re-bag.

We discussed briefly and decided to pursue looking for barcodes in the EAD and getting the component from there. We'll look in <altformavail> for the data we're currently working with, but we know that we will need the option of looking in <dao> in the future.
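A rough Python sketch of that lookup follows. It makes several assumptions that would need checking against the real finding aids: components are unnumbered <c> elements with an id attribute, the EAD has no XML namespace, <altformavail> (or <dao>) is a direct child of the component, and barcodes are 14-digit numbers. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: barcodes are exactly 14 digits, possibly embedded in a filename.
BARCODE = re.compile(r"(?<!\d)\d{14}(?!\d)")


def barcodes_by_component(ead_xml, tag="altformavail"):
    """Return {barcode: component ID} found in `tag` elements of each <c>."""
    root = ET.fromstring(ead_xml)
    mapping = {}
    for component in root.iter("c"):
        component_id = component.get("id")
        if component_id is None:
            continue
        # Only direct children, so barcodes of nested components are not
        # attributed to their parent component.
        for node in component.findall(tag):
            # Search element text and attribute values (e.g. an href on <dao>).
            haystack = " ".join(list(node.itertext()) + list(node.attrib.values()))
            for barcode in BARCODE.findall(haystack):
                mapping[barcode] = component_id
    return mapping
```

Passing tag="dao" covers the future case in the same way, under the same assumptions. A namespaced EAD or numbered components (c01, c02, ...) would need the element lookups adjusted.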

Future use cases may require uploading the separate barcode-componentID mapping file, but it may alternatively make sense to adjust encoding in the script that substitutes ARKs for filenames.