samvera / serverless-iiif

IIIF Image API 2.1 & 3.0 server in an AWS Serverless Application
https://samvera.github.io/serverless-iiif/
Apache License 2.0

Authentication for subset of resources #70

Closed: fitnycdigitalinitiatives closed this issue 2 years ago

fitnycdigitalinitiatives commented 2 years ago

Hello,

I have a question about authentication. I understand that serverless-iiif isn't necessarily built to handle this out of the box, but I was wondering if you had any ideas about how I could require some sort of authentication for only a subset of resources. It seems straightforward to set up authentication for all resources, but not for just some. Imagine a scenario where most images are publicly accessible, but a subset is embargoed from public view. The first thing that comes to mind is having the lambda function check some database or list to see if the requested item is 'public' or 'private', and then use that to determine whether it needs to check for authentication. But I'm wondering if I'm missing something more obvious, or if that approach would terribly slow down the whole process.

Thanks,

Joseph Anderson

mbklein commented 2 years ago

We (Northwestern University Libraries) already do this in our serverless-iiif-based server. Using the CloudFront-enabled version of the server, we use a viewer-request function that looks up the requested ID in an OpenSearch index. Depending on the visibility status of the item in question and the other information from the request (e.g., whether it includes a valid login token, or whether it comes from an IP address with unrestricted access), it either allows the request to pass through or returns a 403 Forbidden.
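
As a rough illustration of the decision flow described above (not Northwestern's actual code), here is a minimal Python sketch of a viewer-request handler. The in-memory `VISIBILITY` dict stands in for the OpenSearch lookup, and `TRUSTED_NETWORKS`, `VALID_TOKENS`, `COOKIE_NAME`, and the `/iiif/<version>/<id>/...` path layout are all assumptions made for the example. The event and response shapes follow the CloudFront Lambda@Edge event structure.

```python
import ipaddress
from urllib.parse import unquote

# Hypothetical stand-in for the visibility index (e.g. an OpenSearch lookup).
VISIBILITY = {"public-image": "public", "embargoed-image": "restricted"}

# Hypothetical networks that get unrestricted access.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]

# Hypothetical token store; a real deployment would verify a signed token.
VALID_TOKENS = {"example-token"}

COOKIE_NAME = "iiif_auth"  # hypothetical cookie name


def image_id_from_uri(uri):
    """Extract the identifier from a path assumed to look like /iiif/2/<id>/..."""
    parts = uri.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "iiif":
        return unquote(parts[2])
    return None


def token_from_headers(headers):
    """Return the auth cookie value from CloudFront-style headers, if present."""
    for header in headers.get("cookie", []):
        for pair in header["value"].split(";"):
            name, _, value = pair.strip().partition("=")
            if name == COOKIE_NAME:
                return value
    return None


def is_trusted_ip(client_ip):
    """True if the client address falls inside any trusted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in TRUSTED_NETWORKS)


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    image_id = image_id_from_uri(request["uri"])
    # Unknown items are treated as restricted, so the check fails closed.
    visibility = VISIBILITY.get(image_id, "restricted")
    if visibility == "public":
        return request  # returning the request lets it pass through
    if is_trusted_ip(request["clientIp"]):
        return request
    if token_from_headers(request.get("headers", {})) in VALID_TOKENS:
        return request
    # Returning a response object short-circuits the request at the edge.
    return {"status": "403", "statusDescription": "Forbidden", "body": "Forbidden"}
```

Public items return immediately after the visibility lookup, so the extra cost for the common case is a single lookup per request.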

A viewer-request function can also intercept a login request and set the token (via a cookie) that can be used to determine whether a user is logged in.
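
The login interception could look something like the following Python sketch. Everything specific here is an assumption for illustration: the `/iiif/login` path, the `token` and `return_to` query parameters (imagined as being supplied by some external sign-in service), and the `iiif_auth` cookie name. It only shows the mechanics of answering from the edge with a `Set-Cookie` header and a redirect.

```python
import re
from urllib.parse import parse_qs

LOGIN_PATH = "/iiif/login"  # hypothetical login endpoint
COOKIE_NAME = "iiif_auth"   # hypothetical cookie name
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9._-]+")  # reject characters unsafe in a cookie


def login_handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    if request["uri"] != LOGIN_PATH:
        return request  # not a login request: pass through untouched

    params = parse_qs(request.get("querystring", ""))
    token = params.get("token", [""])[0]
    target = params.get("return_to", ["/"])[0]

    if not TOKEN_PATTERN.fullmatch(token):
        return {"status": "400", "statusDescription": "Bad Request", "body": "Missing or malformed token"}

    # Only allow same-site relative redirects to avoid an open redirect.
    if not target.startswith("/") or target.startswith("//"):
        target = "/"

    cookie = f"{COOKIE_NAME}={token}; Path=/; Secure; HttpOnly"
    return {
        "status": "302",
        "statusDescription": "Found",
        "headers": {
            "location": [{"key": "Location", "value": target}],
            "set-cookie": [{"key": "Set-Cookie", "value": cookie}],
        },
    }
```

Later requests then carry the cookie, and the same viewer-request function can read it when deciding whether a restricted item may be served.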

Our code isn't super straightforward, since we manipulate the request in other ways as well, but you can check it out if you'd like. You may notice some placeholders that look like '${interpolated_variable}'. This is because Lambda@Edge functions can't use environment variables, so we have to preprocess the lambda to hardcode the appropriate config values during deployment.
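
One simple way to do that kind of deploy-time substitution (a sketch only, not necessarily the tooling used here) is Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. The source snippet and the config keys `visibility_endpoint` and `cookie_name` are hypothetical.

```python
from string import Template

# Hypothetical function source containing deploy-time placeholders.
LAMBDA_SOURCE = '''\
VISIBILITY_ENDPOINT = "${visibility_endpoint}"
COOKIE_NAME = "${cookie_name}"
'''


def render_lambda_source(source, config):
    """Replace ${name} placeholders with hardcoded config values.

    substitute() raises KeyError when a value is missing, so an
    incomplete config fails at build time rather than at the edge.
    """
    return Template(source).substitute(config)


rendered = render_lambda_source(
    LAMBDA_SOURCE,
    {"visibility_endpoint": "https://search.example.org", "cookie_name": "iiif_auth"},
)
print(rendered)
```

If the function source legitimately contains other `$` sequences (JavaScript template literals, for example), `substitute()` may reject them, so `safe_substitute()` or a custom delimiter would be a better fit in that case.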

We have not noticed any performance issues with this approach, even under heavy load.

fitnycdigitalinitiatives commented 2 years ago

Wow, thanks, this is super helpful!