ukwa / ukwa-pywb

Allow crawl-time screenshots etc. to be accessed #62

Open anjackson opened 3 years ago

anjackson commented 3 years ago

Ideally, we would access screenshots etc. directly via Wayback. This means using the current screenshot:http://example.org/ URL format. PyWB can be monkeypatched to allow these extended URI schemes, but the access system then doesn't allow the call through (and it's not 100% clear what the ACL system does with these compound schemes).

But, if that can be worked out, then we can make crawl-time screenshots and related material available from PyWB.

Alternatively, we could do this via ukwa-access-api, but as that refers to PyWB to determine the access rights, it hits the same problem.

anjackson commented 1 year ago

Gah, this is also affecting the video work, as urn:embeds-prefixed URLs like urn:embeds:http://www.snp.org/blog/post/2012/feb/scottish-independence-good-england are not allowed through.

anjackson commented 1 year ago

Note that the location for modifications is...

https://github.com/webrecorder/pywb/blob/83b2113be2c2574ec120ba292006d706e3cc3d53/pywb/manager/aclmanager.py#L132-L148

That is where we would need to strip off prefix schemes prior to canonicalization, but I'll need to get Ilya's sign-off in case I've missed something and it's not such a good idea.
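
As a rough illustration of the idea (not the actual PyWB change), stripping the prefix schemes before canonicalization could look something like the sketch below. The helper name `strip_prefix_schemes` and the `PREFIX_SCHEMES` list are hypothetical; the only prefixes included are the two mentioned in this thread, and nothing here calls into PyWB itself.

```python
# Hypothetical list of compound-scheme prefixes; only the two seen in this
# thread are included. This is an assumption, not a PyWB setting.
PREFIX_SCHEMES = ("screenshot:", "urn:embeds:")


def strip_prefix_schemes(url, prefixes=PREFIX_SCHEMES):
    """Peel known prefix schemes off the front of a URL.

    Returns a (prefix, inner_url) tuple, so the inner URL can be passed to
    whatever canonicalization the ACL lookup uses, while the prefix is kept
    in case it needs to be re-attached afterwards.
    """
    stripped = []
    changed = True
    while changed:
        changed = False
        for prefix in prefixes:
            if url.startswith(prefix):
                stripped.append(prefix)
                url = url[len(prefix):]
                changed = True
    return "".join(stripped), url


if __name__ == "__main__":
    for example in (
        "screenshot:http://example.org/",
        "urn:embeds:http://www.snp.org/blog/post/2012/feb/scottish-independence-good-england",
        "http://example.org/",
    ):
        print(strip_prefix_schemes(example))
```

With something like this, the ACL check would be made against the inner URL, so the access rules for http://example.org/ would also govern screenshot:http://example.org/. Whether that is the right policy for derived material is the open question for Ilya.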