ucldc / rikolti

calisphere harvester 2.0

Turn off the legacy CSphere infrastructure #1026

Open christinklez opened 1 month ago

amywieliczka commented 1 month ago

Strictly in the PAD-DSC account:

Terminated:

Deleted:

Disabled [not deleted]:

Did not delete - currently used by thumbnails.calisphere.org:

Stopped [not deleted]:

amywieliczka commented 1 month ago

Other s3 buckets that can probably be safely deleted:

amywieliczka commented 1 month ago

Waited to turn off

amywieliczka commented 1 month ago

Additional documentation here: https://docs.google.com/document/d/18T655upPe_93W4_026hZhbUb5pXlXngBh8GL6fzInfo/edit

aturner commented 1 month ago

The UCI PLODAB / Artist's Books site has been updated to point to our current thumbnail endpoint (https://thumbnails.calisphere.org/clip/... -> https://calisphere.org/clip/...).

amywieliczka commented 2 weeks ago

AWS Lambda Cleanup:

Lambda: arn:aws:lambda:us-west-2:563907706919:function:async-fetch
Log Group: /aws/lambda/async-fetch
IAM Role: lambda-fetch-to-s3

Lambda: arn:aws:lambda:us-west-2:563907706919:function:async-file-fetch
Log Group: /aws/lambda/async-file-fetch
IAM Role: lambda-fetch-to-s3 [already deleted]

Lambda: arn:aws:lambda:us-west-2:563907706919:function:start_textract
Log Group: /aws/lambda/start_textract
IAM Role: lambda-fetch-to-s3 [already deleted]
S3 Bucket: s3://rikolti-public/content_files [already deleted]
Textract Output: s3://rikolti/textract/ [already deleted]

Lambda: arn:aws:lambda:us-west-2:563907706919:function:get_textract
Trigger: SNS: AmazonTextractPachamama
Log Group: /aws/lambda/get_textract
IAM Role: lambda-fetch-to-s3 [already deleted]
SNS Topic: arn:aws:sns:us-west-2:563907706919:AmazonTextractPachamama
Textract Role: arn:aws:iam::563907706919:role/TextractRole

Lambda: arn:aws:lambda:us-west-2:563907706919:function:fetch-metadata
Test Events: several, deleted
Log Group: /aws/lambda/fetch-metadata
IAM Role: lambda-fetch-to-s3 [already deleted]

Lambda: arn:aws:lambda:us-west-2:563907706919:function:CreateGoogleLog
Test Events: [None]
Trigger: S3: ucldc-logs

CloudFormation: rikolti-sam

Lambda: fetch_metadata
IAM Role: rikolti-sam-MetadataFetcherFunctionRole-4V944P7D1YAN
Lambda: map_metadata
IAM Role: rikolti-sam-MetadataMapperMapPageFunctionRole-Y15L7C59VR40
Lambda: shepherd_mappers
IAM Role: rikolti-sam-MetadataMapperShepherdFunctionRole-1HWNLJUEO1RPZ
Log Group: /aws/lambda/fetch_metadata

CloudWatch Logs Cleanup:

Log Group: /aws/lambda/amy-test
Log Group: /aws/lambda/async-fetch-test
Log Group: /aws/lambda/metadata-mapper-sam-test-MetadataMapperFunction-tSwb2gWZlhs2

amywieliczka commented 2 weeks ago

Still to delete: Lambda: arn:aws:lambda:us-west-2:563907706919:function:sorldumpGlueTriggerOnS3. This is a pretty nice model for how we were calculating calisphere.org/collection/<id>/metadata pages. It's not currently run, but I'd like to document it a bit more before deleting it. The lambda function itself doesn't do much. There's a Glue Crawler configured to crawl, on a weekly basis, a named zip file in s3 representing the current production solr index (this named zip file was replaced as part of the deployment process). A CloudWatch Events (EventBridge) rule fired any time there was a state change in the Glue Data Catalog. The lambda function would check whether the state change happened specifically to the table representing Solr data in the Glue Data Catalog. If so, the lambda function would trigger the Glue Job 'metadata_summary'.
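For the record, here's a rough sketch of what a handler like that could look like. This is not the deployed code: the database and table names are hypothetical placeholders, and the event shape assumes the standard "Glue Data Catalog Table State Change" event with `databaseName` and `tableName` in `detail`. Only the job name 'metadata_summary' comes from the description above.

```python
# Sketch only: names below are placeholders, not the real configuration.
SOLR_DATABASE = "example_database"    # hypothetical Glue database name
SOLR_TABLE = "example_solr_table"     # hypothetical Glue table name
GLUE_JOB_NAME = "metadata_summary"    # job name taken from the issue text


def is_solr_table_change(event):
    """Return True if the event is a Glue change to the Solr table."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.glue"
        and detail.get("databaseName") == SOLR_DATABASE
        and detail.get("tableName") == SOLR_TABLE
    )


def lambda_handler(event, context, glue_client=None):
    """Start the Glue job only when the Solr table changed.

    glue_client is injectable so the logic can be exercised without AWS.
    """
    if not is_solr_table_change(event):
        return {"started": False}
    if glue_client is None:
        import boto3  # imported lazily; only needed when running in AWS
        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(JobName=GLUE_JOB_NAME)
    return {"started": True, "job_run_id": response["JobRunId"]}
```

Events for any other table (or any other source) fall through without starting a job, which matches the "doesn't do much" description: the function is just a filter in front of `start_job_run`.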

s3://ucldc-logs can probably be deleted. s3://ucldc-logs/calisphere/ is a bunch of calisphere logs, and s3://ucldc-logs/google/ is those same logs filtered for Google user agents/Google bots (from a time when we were trying to understand how Google crawled our site to increase index coverage). I'm not sure, though, where s3://ucldc-logs/s3logs/ comes from.
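If it helps to picture what the google/ prefix holds, the filtering could be approximated like this. It's a minimal sketch, assuming plain-text access logs with the user agent somewhere on each line; the marker list and function names are illustrative, and the actual filter used to produce those logs isn't documented here.

```python
# Sketch only: substring match on common Google crawler user-agent tokens.
GOOGLE_UA_MARKERS = ("googlebot", "adsbot-google", "mediapartners-google")


def is_google_request(log_line):
    """Return True if the log line mentions a Google crawler user agent."""
    lowered = log_line.lower()
    return any(marker in lowered for marker in GOOGLE_UA_MARKERS)


def filter_google_lines(lines):
    """Keep only the log lines that look like Google crawler requests."""
    return [line for line in lines if is_google_request(line)]
```

A substring match like this can't distinguish real Google crawlers from clients spoofing the user agent, so it's only good for rough coverage analysis.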