Closed jrochkind closed 1 month ago
Sitemap: https://s3.amazonaws.com/<%= ScihistDigicoll::Env.lookup("s3_sitemap_bucket") %>/<%= ScihistDigicoll::Env.lookup("sitemap_path") %>sitemap.xml.gz
Yep, we're currently delivering a URL that the public, including search engines like Google, can't actually get to, since it's using the direct bucket URL!
We have to change this to something that generates a working public URL, ideally using shrine storages.
Then we have to let the sitemap generation succeed by not trying to set a public ACL.
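A minimal sketch of the first half of the fix: build the sitemap URL from the public CloudFront/app host instead of the raw S3 bucket endpoint. The method name and both arguments here are illustrative, not the app's real config keys — whatever `ScihistDigicoll::Env` actually exposes would be substituted in the ERB.

```ruby
# Hedged sketch: construct the publicly-reachable sitemap URL from a CDN/app
# host rather than the s3.amazonaws.com bucket endpoint. "cdn_host" and
# "sitemap_path" are stand-in names for whatever the env config provides.
def public_sitemap_url(cdn_host, sitemap_path)
  # delete_prefix avoids a double slash if sitemap_path starts with "/"
  "https://#{cdn_host}/#{sitemap_path.delete_prefix('/')}sitemap.xml.gz"
end

puts public_sitemap_url("digital.sciencehistory.org", "sitemap/")
# => https://digital.sciencehistory.org/sitemap/sitemap.xml.gz
```

The robots.txt ERB line would then interpolate this helper (or the shrine storage's own `url`) instead of hardcoding the bucket hostname.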
Getting an S3 error when our nightly sitemap generation routine runs. https://app.honeybadger.io/projects/58989/faults/109739811
Aws::S3::Errors::AccessDenied: Access Denied
from ./configs/sitemap.rb:31
Diagnosis
So the sitemap was being stored on the derivatives bucket… with a public ACL… As a result of #2667 and https://github.com/sciencehistory/terraform_scihist_digicoll/pull/84, the bucket is now set to reject public ACLs, so the upload raises an `Aws::S3::Errors::AccessDenied`.
We can store the sitemap without that ACL… but we need to make sure the sitemap URL we are actually delivering to Google and other crawlers is the public CloudFront one that will work!
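A sketch of the second half of the fix, assuming the nightly routine uses the `sitemap_generator` gem: that gem accepts any adapter responding to `write(location, raw_data)`, so we can route the upload through a shrine storage, which sets no ACL by default (the object just inherits the bucket's private default). The adapter class and the `:sitemap_storage` key are hypothetical names for illustration.

```ruby
# config/sitemap.rb -- hedged sketch, not the app's actual code.
# SitemapGenerator adapters only need a write(location, raw_data) method.
# Storing via Shrine::Storage::S3 with no ACL option means no public-read
# ACL is attempted, so the bucket's "reject public ACLs" setting is satisfied;
# CloudFront handles public delivery.
require "stringio"

class ShrineSitemapAdapter
  # storage: assumed to be a Shrine::Storage::S3 registered as :sitemap_storage
  def initialize(storage)
    @storage = storage
  end

  def write(location, raw_data)
    # Upload under the file's basename; adjust the key if the sitemap
    # lives under a prefix in the bucket.
    @storage.upload(StringIO.new(raw_data), File.basename(location.path))
  end
end

SitemapGenerator::Sitemap.adapter =
  ShrineSitemapAdapter.new(Shrine.storages.fetch(:sitemap_storage))
```

This is a config fragment; the exact object key and storage registration would follow however the app already wires up its shrine storages.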