Closed alexwlchan closed 1 year ago
If you want to get access to these buckets, you need to raise a ticket with D&T (which will need to go through the CAB process).
I got that access about a week before my final day, so I wasn't able to do any detailed analysis. My guess is that there's nothing here worth keeping and we could drop both the buckets.
Looking in the smaller bucket:
s3://libfs04-wellcome-it-com
├─ LIB_WDL_METADATA
├─ LIB_WDL_OBJECTS
├─ LIB_WDL_WTS
└─ LIB_WDL_WTS01
I found a Slack conversation from March 2020 where Ashley asked Intranda about deleting a similar set of folders from an on-prem file share.
LIB_WDL_METADATA are 'User' Access folders, which can be deleted. They were "used to ingest data into SDB and to import metadata records into Goobi". NB: here "SDB" is another name for Preservica, the precursor to the current storage service.
LIB_WDL_OBJECTS is empty.
LIB_WDL_WTS and LIB_WDL_WTS01 look like they used to contain some Goobi config, but since it was all migrated to AWS it can also be deleted.
Looking in the bigger bucket:
s3://libfs02-wellcome-it-com
├─ LIB_WDL_ACCESS_PROD (empty)
├─ LIB_WDL_SDB_AMD
├─ LIB_WDL_SDB_DMD
├─ LIB_WDL_SDB_OBJECTS
├─ LIB_WDL_SDB_TEMP
├─ LIB_WDL_SDB_THUMBNAILS (empty)
├─ SIPs
└─ testfolder
In the non-empty folders:
LIB_WDL_SDB_AMD is something to do with Preservica, but it doesn't look like it contains anything important. There's a bunch of very small (<10 bytes) files in there.
LIB_WDL_SDB_DMD is another Preservica folder with a bunch of small XML files.
LIB_WDL_SDB_OBJECTS contains a few born-digital files, possibly actual collection material?
LIB_WDL_SDB_STORE001 contains a lot of files, possibly actual collection material?
LIB_WDL_SDB_TEMP – I'm not sure, but the word TEMP is a pretty strong clue about how much we (don't) care about this.
SIPs contains a few interesting files, possibly actual collection material?
testfolder – again, some random stuff we probably don't care that much about.
The tl;dr is that these buckets seem to be a mix of Goobi/Preservica storage.
We've already migrated everything out of Preservica and confirmed we got it into the storage service. We could repeat that process, but it feels like overkill.
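For anyone wanting to redo the per-folder breakdown above, here's a rough sketch of how it could be tallied from a bucket listing. It assumes records shaped like the entries S3's ListObjectsV2 returns (a `Key` and a `Size`); the function name, the sample keys and the 10-byte cut-off for "very small" are illustrative choices, not what was actually run.

```python
from collections import defaultdict


def summarise_by_folder(objects, small_threshold=10):
    """Group object records by top-level folder.

    Returns {folder: {"count", "bytes", "small"}}, where "small" is the
    number of objects smaller than small_threshold bytes.
    """
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "small": 0})
    for obj in objects:
        # Everything before the first "/" is treated as the folder name;
        # a key with no "/" is counted under its own name.
        folder = obj["Key"].split("/", 1)[0]
        entry = summary[folder]
        entry["count"] += 1
        entry["bytes"] += obj["Size"]
        if obj["Size"] < small_threshold:
            entry["small"] += 1
    return dict(summary)


# Hypothetical records, made up for illustration only.
sample = [
    {"Key": "LIB_WDL_SDB_AMD/a.xml", "Size": 4},
    {"Key": "LIB_WDL_SDB_AMD/b.xml", "Size": 7},
    {"Key": "SIPs/example.zip", "Size": 5000},
]
folders = summarise_by_folder(sample)
```

One caveat: zero-byte "folder" placeholder objects (keys ending in `/`) would be counted as small files here, so a folder that looks empty in the console may still show a count of one.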
I’ve run this past Ashley and the team; we’re happy to delete both of these.
I’m going to leave this ticket open until Kate confirms the buckets are gone, so we don’t lose it.
5/10/2023 NP followed up with Kate.
5/10/2023 confirmed deleted by Kate
There are two S3 buckets in the NCW Prod account which were created several years ago as backups of on-premise file shares used by Wellcome Collection (last upload August 2019):
libfs02-wellcome-it-com (~30TB)
libfs04-wellcome-it-com (120.5GB, 145k objects)
Based on an initial analysis, the last time we uploaded anything to these buckets was in August 2019.
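If somebody wants to double-check the size, object count and last-upload figures, something like the following would do it. This is a minimal sketch rather than the script used for the initial analysis: `summarise_objects` and `iter_bucket_objects` are names invented here, the sample records are made up, and the listing helper assumes boto3 is installed with credentials that can read the bucket.

```python
from datetime import datetime, timezone


def summarise_objects(objects):
    """Return (object count, total bytes, most recent LastModified)."""
    count = 0
    total_bytes = 0
    last_modified = None
    for obj in objects:
        count += 1
        total_bytes += obj["Size"]
        if last_modified is None or obj["LastModified"] > last_modified:
            last_modified = obj["LastModified"]
    return count, total_bytes, last_modified


def iter_bucket_objects(bucket):
    """Yield every object record in a bucket (needs boto3 and credentials)."""
    import boto3  # imported lazily so the summary works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from the response when nothing matches.
        yield from page.get("Contents", [])


# Hypothetical records, made up for illustration only.
sample = [
    {"Key": "LIB_WDL_WTS/config.xml", "Size": 2048,
     "LastModified": datetime(2019, 8, 14, tzinfo=timezone.utc)},
    {"Key": "LIB_WDL_METADATA/record.xml", "Size": 512,
     "LastModified": datetime(2018, 3, 2, tzinfo=timezone.utc)},
]
result = summarise_objects(sample)
```

Against a real bucket this would be `summarise_objects(iter_bucket_objects("libfs04-wellcome-it-com"))`. Listing the ~30TB bucket this way could take a while, since every object has to be paged through.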
Kate W has asked whether we still need these buckets, or whether their contents have now been mirrored to our AWS setup and these backups can be safely deleted. I've done some quick analysis of the buckets, and I think it's fine, but I'd like somebody else to double-check my working before I give the all-clear.