scientist-softserv / scholarworks

Cal State Hyrax

Epic: Fedora Endpoint Cleanup #29

Open jillpe opened 2 months ago

jillpe commented 2 months ago

Summary

SOW budget: $3,200 (~16 senior hours)
Clockify project: Cal State Fedora Endpoint Cleanup

This is the epic ticket for the Fedora Endpoint Cleanup Contract.

'SoftServ will delete old Fedora endpoints and their associated files in CSU’s Digital Archives Hyrax application. Removing old Fedora endpoints will benefit the current design of a single production end-point by reducing clutter from previous versions.'

To Do

Investigate the following URLs:

Investigate each URL by curling it and checking the response. If the response lists child URIs under ldp:contains, check each of those URIs as well.

cd calstate_devops
./bin/n8ssh prod hyrax_0
curl http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/your-uri-here
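A minimal sketch of that check follows. It assumes the server honors `Accept: application/n-triples` (so each triple arrives on its own line) and uses a hypothetical `FCREPO_BASE` variable that defaults to the fcrepo base URL used in this ticket. `extract_children` reads N-Triples on stdin and prints the object of every `ldp:contains` triple; `list_children` fetches a path relative to the base and pipes it through.

```shell
#!/usr/bin/env bash
# Sketch: list the child URIs that a Fedora resource reports via ldp:contains.
# Assumes the server honors "Accept: application/n-triples", so each triple
# arrives on its own line. FCREPO_BASE is a hypothetical override variable.
set -uo pipefail

FCREPO_BASE="${FCREPO_BASE:-http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest}"
LDP_CONTAINS='<http://www.w3.org/ns/ldp#contains>'

# Read N-Triples on stdin; print the object URI (without <>) of each ldp:contains triple.
extract_children() {
  awk -v pred="$LDP_CONTAINS" '$2 == pred { gsub(/[<>]/, "", $3); print $3 }'
}

# Fetch one resource (path relative to FCREPO_BASE) and list its direct children.
list_children() {
  curl -sf -H 'Accept: application/n-triples' "$FCREPO_BASE/$1" | extract_children
}
```

For example, `list_children your-uri-here | wc -l` gives a quick child count, which is the number that decides between manual and batched deletion.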

If the response contains a large data set (e.g., more than 1,000 child URIs), we need to create a script to delete the data in batches.
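A minimal sketch of such a batch script, assuming the URIs to delete have already been collected one per line (for example from the ldp:contains listing). `BATCH_SIZE`, `PAUSE_SECONDS`, and `DRY_RUN` are hypothetical settings chosen for illustration; `DRY_RUN` defaults to 1 so the script only prints what it would delete until it is explicitly switched off.

```shell
#!/usr/bin/env bash
# Sketch: delete Fedora URIs (one per line on stdin) in batches, pausing
# between batches so the repository is not flooded with requests.
# BATCH_SIZE, PAUSE_SECONDS and DRY_RUN are hypothetical knobs for this sketch.
set -uo pipefail

BATCH_SIZE="${BATCH_SIZE:-500}"
PAUSE_SECONDS="${PAUSE_SECONDS:-5}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be deleted

delete_uri() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DELETE $1"
  else
    # -f makes curl exit non-zero on HTTP error statuses (>= 400)
    curl -sf -X DELETE "$1" >/dev/null || echo "FAILED $1" >&2
  fi
}

delete_in_batches() {
  local count=0 uri
  # "|| [ -n ... ]" also handles a final line without a trailing newline
  while IFS= read -r uri || [ -n "$uri" ]; do
    [ -z "$uri" ] && continue            # skip blank lines
    delete_uri "$uri"
    count=$((count + 1))
    if [ $((count % BATCH_SIZE)) -eq 0 ]; then
      echo "processed $count so far; pausing ${PAUSE_SECONDS}s" >&2
      sleep "$PAUSE_SECONDS"
    fi
  done
  echo "done: $count URIs processed" >&2
}
```

Usage would look like `DRY_RUN=0 delete_in_batches < uris.txt`, where `uris.txt` is a hypothetical file of URIs. Pausing between batches is a conservative choice for a production repository; the batch size and pause would need tuning against the real server.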

If there are just a few URIs, we can delete them manually. Example: curl -X DELETE http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/your-uri-here
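For a one-off delete like this, it can help to confirm the result rather than rely on curl exiting cleanly. The sketch below prints the HTTP status of the DELETE and of a follow-up GET using curl's `-w '%{http_code}'`. The function names are hypothetical, and treating 404 or 410 as "gone" is an assumption (Fedora may keep a tombstone for a deleted resource), not a documented guarantee.

```shell
#!/usr/bin/env bash
# Sketch: delete one Fedora resource and report the HTTP status codes of the
# DELETE and of a follow-up GET, so a delete that did not take effect is noticed.
set -uo pipefail

# Print only the HTTP status code for a request. usage: http_status METHOD URL
# curl prints 000 if it could not get any HTTP response at all.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' -X "$1" "$2"
}

# Interpret the follow-up GET status. This mapping is an assumption made for
# illustration, not Fedora's documented contract.
describe_get_status() {
  case "$1" in
    404|410) echo "gone" ;;
    2??)     echo "still present" ;;
    *)       echo "unexpected ($1)" ;;
  esac
}

delete_and_verify() {
  local url="$1" del_code get_code
  del_code="$(http_status DELETE "$url")"
  get_code="$(http_status GET "$url")"
  echo "DELETE -> $del_code; GET afterwards -> $get_code ($(describe_get_status "$get_code"))"
}
```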

Acceptance Criteria

Old Fedora endpoints and their associated files are deleted in the CSU's Digital Archives Hyrax application

List of Fedora endpoints to be deleted:

aprilrieger commented 1 month ago

Verified that each campus's last Thesis has an fcrepo URL under http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/hyrax-ir-prod/...

There were large data sets that required a long-running script, so I set that up to run over the weekend.

I'll post an update by Monday morning.

aprilrieger commented 1 month ago

Problem Endpoints

I ran curl against each of the following; none of them had any ldp:contains entries. When I ran curl -X DELETE endpoint-url, it completed without error, but when I curl http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/, the endpoint is still listed under ldp:contains. I expected it to no longer be listed.
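One way to check this symptom is to fetch the parent as N-Triples and look for an ldp:contains triple pointing at the deleted child. This is a sketch assuming the server honors `Accept: application/n-triples`; `lists_child` and `parent_still_lists` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: after a DELETE, check whether the parent container still lists the
# child under ldp:contains. Assumes N-Triples output (see the Accept header).
set -uo pipefail

LDP_CONTAINS='<http://www.w3.org/ns/ldp#contains>'

# Exit 0 if the N-Triples on stdin contain "<any> ldp:contains <CHILD>", else 1.
lists_child() {
  awk -v pred="$LDP_CONTAINS" -v obj="<$1>" \
    '$2 == pred && $3 == obj { found = 1 } END { exit !found }'
}

# usage: parent_still_lists PARENT_URL CHILD_URL
parent_still_lists() {
  curl -s -H 'Accept: application/n-triples' "$1" | lists_child "$2"
}
```

For example: `parent_still_lists "http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/" "http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/your-uri-here" && echo "still listed"`. The comparison is an exact string match, so a trailing-slash difference between the two URIs would count as "not listed".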

I ran the curl commands and see that the response contains an ldp:contains entry for http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/csu-demo/ws/85/9f/65/ws859f652. When I try to DELETE it, I receive this error:

HTTP Status 404 – Not Found
Type: Status Report
Message: Not Found
Description: The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.
Apache Tomcat/8.5.57

I then curled http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/csu-demo/ws/85/9f/65/ws859f652 directly and see that it's a FileSet with no ldp:contains entries, but when I try to DELETE it, I receive the same HTTP 404 Not Found error page from Apache Tomcat/8.5.57.

aprilrieger commented 1 month ago

Adding http://ec2-34-216-129-222.us-west-2.compute.amazonaws.com:8080/fcrepo/rest/hyrax-ir-dev to the delete list, since the client confirmed they are no longer using it: https://assaydepot.slack.com/archives/C030UPFCDSS/p1726872199685959

![Screenshot](https://github.com/user-attachments/assets/b48e17c4-8fd8-4c1e-9146-52bdfaa1eacb)
aprilrieger commented 1 week ago

Adding notes on the endpoint deletion script running on hyrax_0: https://assaydepot.slack.com/archives/C031E2NGA3B/p1728942568098419?thread_ts=1727719818.905559&cid=C031E2NGA3B

aprilrieger commented 5 days ago

Paired with Rob this afternoon, but we have not yet found a path forward. No children show under the parent collection, and we cannot query the collection for them because it doesn't know about its children, even though the children still know who their parents are.

We have reached out to the Samvera and Fcrepo Slack communities and are trying to find a path forward. We may have to temper the client's expectations: the remaining cleanup may only be completed when they upgrade or move off Fedora.

aprilrieger commented 3 days ago

@jillpe I updated David in the channel (https://assaydepot.slack.com/archives/C030UPFCDSS/p1729809616673709?thread_ts=1729809397.451179&cid=C030UPFCDSS) on our difficulty deleting the final collection endpoint after all of its data was removed. His main goal was to remove the data, and we have done that. We can close this ticket and bill for the work.