Closed aray-wellcome closed 5 months ago
How the Editorial Photography Workflow works at present with TandemVault
(To the best of my knowledge)
Attached are two examples: Shoot export 2022-10-25 10-40-27.csv and Shoot export 2022-10-04 10-58-35.csv
The zipped package is uploaded to the S3 bucket /wellcomecollection-workflow-upload/editorial, where a Lambda triggers and sends the zip to Goobi.
Goobi then runs the item through the Editorial Photography Workflow. Within this workflow, the metadata from the shoot export csv is put on the images (I think?) and jpgs are made of the TIFFs.
The jpgs are packaged up and sent to TandemVault where an upload set is created based on the title and reference from the shoot export csv. All filled out headers from the shoot export csv should be applied in TandemVault in this section
The shoot export csv and master TIFFs are sent to the S3 bucket wellcomecollection-editorial-photography and stored under folders based on the last two digits of the EP number.
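As a rough illustration of the "last two digits" foldering, here is a minimal Python sketch. The exact key layout inside the bucket and the sample reference `EP_000123` are assumptions made for the example, not taken from the real bucket.

```python
import re


def storage_prefix(reference: str) -> str:
    """Build a folder prefix keyed on the last two digits of a shoot number.

    Assumption: the prefix looks like "<last two digits>/<reference>/".
    The real layout in the bucket may differ.
    """
    digits = re.sub(r"\D", "", reference)  # keep only the numeric part
    if len(digits) < 2:
        raise ValueError(f"Need at least two digits in reference: {reference!r}")
    return f"{digits[-2:]}/{reference}/"


# Hypothetical reference, for illustration only
print(storage_prefix("EP_000123"))  # 23/EP_000123/
```

Keying on the trailing digits spreads shoots across up to 100 folders, which keeps any single folder from growing too large.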
Updating TandemVault
There is a second workflow in Goobi that is to be used to update an upload set in TandemVault. The procedures are the same as above, but if a Goobi process already exists with the same EP number reference, the item will be sent to the Editorial_Photography_Update workflow, which will overwrite the images in the upload set in TandemVault and in the S3 wellcomecollection-editorial-photography bucket.
However, this workflow almost never works. I have to delete the items so a fresh ingest can go through Editorial_Photography instead.
Deleting Editorial Photography ingests in TandemVault
If a fresh upload must be made, I delete:
- the upload set from TandemVault from the Upload Set page
- the folder of images and metadata from wellcomecollection-editorial-photography
- the process in the Editorial_Photography workflow in Goobi
We had been planning to look at our Editorial Photography workflow for a while, but with MediaGraph appearing, we kind of dived into making changes along the whole workflow.
Two kinds of work in the Editorial_Photography Workflow
Essentially, the Editorial_Photography workflow in Goobi serves two streams of work:
Ad hoc digitization - we offer free digitization services to enquirers who need something that we haven't digitized yet. It's small amounts of work but takes a lot of resources from the team. If something requested for digitization cannot go on Wellcome Collection's site (because of copyright, or because only part of a book was digitized), it goes into TandemVault. The enquirer is delivered a copy of the files in TandemVault via a Lightbox.
Editorial photography - we have an in-house photography team that takes photos of Wellcome events and staged photoshoots for marketing and Wellcome Stories. They store these shoots in TandemVault so that colleagues can easily access them for various uses, and as a way to archive them.
Changes
Right now both ad hoc photography and editorial photography shoots use an EP number that is generated by our LightBlue shoot software. But ad hoc digitization orders will most likely be moved out of LightBlue and into a new system, Quickbase (still being built and tested). It's too difficult to continue using the EP numbering convention in another system, so ad hoc digitization ingests will most likely use an AH_00XXXX convention number.
Editorial Photography should continue using the EP numbering convention even if moved to a new software.
In TandemVault, both digitized and editorial photography shoots are put into Upload Sets together.
The only way you can tell whether something is an ad hoc digitized item or an editorial photography shoot is to click on the Upload Set and then a photo.
Ad Hoc Digitization shoots have a tag that says Digitisation
Editorial Photography shoots have a tag that says WEP (Wellcome Editorial Photography)
This has worked fine for us, but MediaGraph no longer has Upload Sets. Instead it has the File Vault. Our items from TandemVault were migrated to MediaGraph for us, but we quickly realized that the way they were imported wasn't working for us. The File Vault had everything arranged by who imported the items. The vast majority of imports were under Intranda's name, and it basically made a giant folder that was unopenable.
We have since manually re-arranged the File Vault into CP items (Corporate Photography, a legacy project from years ago, which shouldn't have anything added to it), EP - Digitisation Requests, and EP - WEP (Wellcome Editorial Photography).
When Goobi starts sending ingests in via the API, we need a way for Goobi and MediaGraph to work together to put the ingests in new folders, most likely called something like Ad Hoc Digitisation and Editorial Photography Shoots. In this way we'll avoid having a folder we can't open, at least for a while. It may be that we have to switch folders in the future as the current one fills up and slows the functionality.
I suppose that they can be sorted by number, either AH or EP, or by the tags that get applied via the shoot export.csv in Shoot Type
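To make the "sort by number" option concrete, here is a minimal Python sketch that picks a stream from the reference prefix. The folder names are the tentative ones floated above, and the sample references are hypothetical. One caveat from the current setup: existing ad hoc items still carry EP numbers, so a prefix check like this would only be reliable for new ingests.

```python
# Tentative folder names from the discussion above (assumptions, not final)
AD_HOC_FOLDER = "Ad Hoc Digitisation"
EDITORIAL_FOLDER = "Editorial Photography Shoots"


def folder_for_reference(reference: str) -> str:
    """Pick a File Vault folder from the two-letter reference prefix (AH or EP)."""
    prefix = reference.strip().upper()[:2]
    if prefix == "AH":
        return AD_HOC_FOLDER
    if prefix == "EP":
        return EDITORIAL_FOLDER
    raise ValueError(f"Unrecognised reference prefix: {reference!r}")


# Hypothetical references, for illustration only
print(folder_for_reference("AH_000001"))  # Ad Hoc Digitisation
print(folder_for_reference("EP_000123"))  # Editorial Photography Shoots
```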
The EP/AH workflows should both still send jpgs to MediaGraph as they're smaller and easier to store. The original TIFFs should be sent to wellcomecollection-editorial-photography as usual for now. But it may be that we want to migrate all images from wellcomecollection-editorial-photography into the Wellcome Storage Service to sit in a space alongside wellcomecollection-storage/digitised. This means the new EP/AH workflow would need to be able to bag up the master TIFFs and metadata and store the bag after sending the JPGs in.
But as I said, this is still just something we're looking at, nothing is scheduled to happen as of me writing this. I just wanted to give a heads up so we can look at building the flexibility to do this in now if we need to.
How MediaGraph workflows should work
Or at least what I think so far...
An Editorial Photography order comes through on LightBlue and gets an EP number / an Ad Hoc order comes into Quickbase and gets an AH number
Once the job is completed, the EP images get zipped up with a shoot export.csv from LightBlue / the AH images get zipped up with a shoot export.csv from Quickbase (I can make Quickbase's export match LightBlue's, but I'm unsure if the API requires any headers/values to be changed to get the same information we need in MediaGraph)
An EP zip / an AH zip is uploaded into wellcomecollection-upload-workflow/editorial (might need to make this a more generic title? Or leave as is) and the Lambda sends it to Goobi
Goobi's workflow that sends things to MediaGraph handles the EP/AH item (I think that one workflow for this stuff should still work?). The workflow should still embed all metadata, make jpgs and send them to MG, and send the masters and shoot csv to S3 (likely still just the editorial-photography bucket at this stage)
The EP item is received by MediaGraph and is put into the File Vault under Editorial Photography Shoots with a folder title that has the EP number + the title from the shoot export.csv
An AH item is received by MediaGraph and is put into the File Vault under Ad Hoc Digitisation with a folder title that has the AH number + the title from the shoot export.csv
The masters and shoot csv sent to the editorial-photography bucket are stored under a folder based on their last two digits, as they are currently
Updating
I have no idea how updating things with the new API and MediaGraph should work, but I think we should still have the ability to do so. It might have to work in the same way as the old TandemVault update workflow, but only if it's more stable.
Deleting
We do have things that we have to delete sometimes, either for reingest or because someone has asked for a shoot they're in to be deleted. I assume we could delete a folder in File Vault, S3 and Goobi as we do now?
Additional thoughts: Though most of the stuff we put through is going to be TIFFs, sometimes mp4s are put in as well so we should be able to handle those too.
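For the folder titles described in the steps above (number + title from the shoot export.csv), a minimal Python sketch might look like this. The header names `Reference` and `Title`, the space used as a separator, and the sample row are all assumptions; the real export headers should be checked against an actual LightBlue or Quickbase file.

```python
import csv
import io


def folder_title(csv_text: str,
                 reference_header: str = "Reference",
                 title_header: str = "Title") -> str:
    """Build "<number> <title>" from the first data row of a shoot export csv.

    The header names are hypothetical defaults, not confirmed export headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader, None)  # first data row, or None if the file is empty
    if row is None:
        raise ValueError("Shoot export csv has no data rows")
    return f"{row[reference_header].strip()} {row[title_header].strip()}"


# Made-up export content, for illustration only
sample = "Reference,Title,Shoot Type\nAH_000001,Sample title,Digitisation\n"
print(folder_title(sample))  # AH_000001 Sample title
```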
TandemVault has been allowing us to test their new product MediaGraph, which is an updated version of TandemVault. Our instance is at https://mediagraph.io/wellcome
MediaGraph has their new API docs here https://docs.mediagraph.io/
For the moment, we will need both APIs (one to MediaGraph, one to TandemVault) working as we continue to test MediaGraph.
The MediaGraph API should be used in nearly the same way as TandemVault's, but I think we need a few changes based on other things happening with this workflow outside of Goobi.
In MediaGraph, normally all uploads from Goobi would be dumped into one Uploads folder. This won't work for us as the folder would get too large to open properly so the MediaGraph devs want to use the metadata we send in (Shoot type) to decide what folder it should go in. I know Shoot Types for Ad Hoc will say Digitisation. Shoot types like Editorial, Events, Exhibitions, Objects, Portraits would go into the Editorial Photography folder.
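The Shoot Type routing described above could be sketched in Python roughly as follows. This is only an illustration of the rule, not how MediaGraph will implement it: the folder names are placeholders, and the list of editorial shoot types is limited to the ones named here and may not be complete.

```python
# Placeholder folder names (assumptions, to be agreed with MediaGraph)
AD_HOC_FOLDER = "Ad Hoc Digitisation"
EDITORIAL_FOLDER = "Editorial Photography"

# Shoot types named in this issue; the real list may be longer
EDITORIAL_SHOOT_TYPES = {"editorial", "events", "exhibitions", "objects", "portraits"}


def folder_for_shoot_type(shoot_type: str) -> str:
    """Map the Shoot Type value from the shoot export csv to a destination folder."""
    key = shoot_type.strip().lower()
    if key == "digitisation":
        return AD_HOC_FOLDER
    if key in EDITORIAL_SHOOT_TYPES:
        return EDITORIAL_FOLDER
    # Fail loudly so an unknown type is not silently dumped into one big folder
    raise ValueError(f"Unknown Shoot Type: {shoot_type!r}")


print(folder_for_shoot_type("Digitisation"))  # Ad Hoc Digitisation
print(folder_for_shoot_type("Portraits"))     # Editorial Photography
```

Raising on an unknown Shoot Type is a design choice for the sketch; a fallback folder would also work if that suits the team better.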
I've been in touch with MediaGraph and they think the best way to work through this would be for you to contact them so you can work together: "Please feel free to have the Intranda team reach out directly to our CTO, Nick Merwin: nick@MediaGraph.io. He will work with them to make sure assets ingest into the correct storage folders in MediaGraph."