sul-dlss / was-pywb

Configuration for Stanford's pywb instance
https://swap.stanford.edu

Changing access rights for Crawl Objects #27

Open edsu opened 2 years ago

edsu commented 2 years ago

When the access rights for a Crawl Object are changed in Argo, we would like pywb to respect those changes so that the content is World, Stanford only, or Dark (unavailable). In #10 we address similar rights changes to Seed Objects. However, making similar changes to sets of WARC files will involve modifying the CDXJ indexes themselves (to add or remove entries). It may prove difficult to make the contents of a WARC file available only on campus, since these controls operate at the URL level, and a given set of WARC files could contain many URLs at different sites.
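
As a rough illustration of the index change described here, the sketch below drops every CDXJ line that points at one of a given set of WARC files. It assumes the usual CDXJ layout (SURT key, timestamp, then a JSON object carrying a `filename` field); the function name, the exact-match comparison on `filename`, and the sample data are hypothetical and not part of was-pywb.

```python
import json


def drop_warc_entries(cdxj_lines, dark_warcs):
    """Return the CDXJ lines that do not point at any WARC in dark_warcs.

    Assumes each line looks like "<surt> <timestamp> <json>" and that the
    JSON object has a "filename" key (an assumption about the index).
    """
    kept = []
    for line in cdxj_lines:
        # Split into at most three parts so spaces inside the JSON survive.
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) < 3:
            kept.append(line)  # leave anything unparseable untouched
            continue
        fields = json.loads(parts[2])
        if fields.get("filename") not in dark_warcs:
            kept.append(line)
    return kept


# Hypothetical sample index: two crawls that both captured example.com.
sample = [
    'com,example)/ 20200101000000 {"url": "https://example.com/", "filename": "crawl-a.warc.gz"}',
    'com,example)/ 20210101000000 {"url": "https://example.com/", "filename": "crawl-b.warc.gz"}',
    'org,example)/ 20200101000000 {"url": "https://example.org/", "filename": "crawl-a.warc.gz"}',
]
remaining = drop_warc_entries(sample, {"crawl-a.warc.gz"})
print(remaining)
```

Filtering by WARC filename works for making a crawl Dark, but it does not express "Stanford only", which is why the URL-level controls mentioned above remain the harder part.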

jcoyne commented 2 years ago

Would it be possible to create conflicting access, where one crawl includes a URL such as https://example.com marked "world" and another crawl includes the same URL marked "dark"?

edsu commented 2 years ago

Good point, that is definitely possible. pywb's ACLJ file can also include the timestamp associated with the URL to block, so in theory that could be factored in if we decide we really need pywb to respect access rights changes to Crawl Objects. At the moment we haven't been given any use cases for access rights changes to Crawl Objects. This issue is mostly here to note that this isn't currently being handled.
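
A minimal sketch of how such conflicts might be surfaced before deciding whether timestamp-level rules are needed. The tuple layout, the access labels, and the function name are assumptions for illustration only; they are not pywb or Argo APIs.

```python
from collections import defaultdict


def find_conflicts(captures):
    """Report URLs whose captures carry more than one access level.

    captures: iterable of (url, timestamp, access) tuples, where access is
    a label such as "world", "stanford" or "dark" (hypothetical labels).
    Returns {url: {timestamp: access}} for the conflicting URLs only.
    """
    by_url = defaultdict(dict)
    for url, timestamp, access in captures:
        # If two crawls share a URL and timestamp, the later entry wins here.
        by_url[url][timestamp] = access
    return {
        url: by_ts
        for url, by_ts in by_url.items()
        if len(set(by_ts.values())) > 1
    }


# Hypothetical captures: the same URL appears in a "world" and a "dark" crawl.
captures = [
    ("https://example.com", "20200101000000", "world"),
    ("https://example.com", "20210101000000", "dark"),
    ("https://example.org", "20200101000000", "world"),
]
conflicts = find_conflicts(captures)
print(conflicts)
```

Keeping the timestamp alongside each access level is what would let a rule target one capture of a URL without affecting the others.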

lwrubel commented 2 years ago

Iceboxing along with #10.