lookit / lookit-api

Codebase for Lookit v2 and Experimenter v2. Includes an API. Docs: http://lookit.readthedocs.io/
https://lookit.mit.edu/
MIT License

Check/handle large webcam recording files #1404

Closed: becky-gilbert closed this issue 2 months ago

becky-gilbert commented 3 months ago

Summary

Recently a researcher attempted to download a very large recording file. The site could not handle the file, which caused a short period of downtime. We have addressed this on the experiment runner side by capping the recording duration. A more comprehensive solution would be to also check the file size in lookit-api's download views and handle the case where a single video file is too large.

Implementation notes

Here is the relevant bit of code for downloading single video files. It doesn't look like we do any file size check here, so this is one possible place where we could address this.

The simplest option is to block the download and show an error message saying that the file is too large, along with an email address to contact. This scenario shouldn't happen often, so it's probably fine for researchers to have to go through CHS to access large single recording files. This would also tell us how often the issue occurs.

Another option is to handle this in the same way we do with the 'download all videos' option, which uses the build_zipfile_of_videos task. So maybe we could re-use this method for single-file downloads that exceed some size limit.
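To make the two options above concrete, here is a minimal sketch of a size check, not the actual lookit-api code. It assumes a boto3-style S3 client (whose `head_object` call returns a `ContentLength` field); the limit value, the function names, and the `"blocked"` / `"zip_task"` labels are all hypothetical.

```python
# Hypothetical limit; the real threshold would need to be chosen by the team.
MAX_DIRECT_DOWNLOAD_BYTES = 500 * 1024 * 1024


def video_size_bytes(s3_client, bucket, key):
    """Return the object's size using a HEAD request (no file body is fetched)."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    return head["ContentLength"]


def choose_download_strategy(
    size_bytes, limit=MAX_DIRECT_DOWNLOAD_BYTES, oversize_action="blocked"
):
    """Decide how to serve a single video file.

    Returns "direct" when the file is within the limit. Otherwise returns
    oversize_action, e.g. "blocked" (show an error with a contact address)
    or "zip_task" (hand off to something like build_zipfile_of_videos).
    """
    if size_bytes <= limit:
        return "direct"
    return oversize_action
```

The view would call `video_size_bytes` first and only read the file into a response when the strategy is `"direct"`.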

Implementation update

@okaycj has pointed out that there is a much better solution here, which is to let users download the video directly from the S3 link, instead of fetching the whole file from S3 and returning it as our own response. The main benefit of this approach is that we don't need to load the file into the site's memory, so we don't need an arbitrary file size check and we don't risk further out-of-memory errors.

To implement this, I changed the video attachment view so that it redirects to the video download URL. But one problem with this approach is that the file download only works off-the-shelf for same-origin links. Because we're using a different origin, we also need to set the "Content-Disposition" header to "attachment" (see here). This has to be done when we generate the presigned URL, which is in attachment_helpers.py. Without that header value, the file will not download and will instead open in the browser (i.e. the same as our 'view' option).
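As a rough illustration of setting that header at URL-generation time, here is a sketch assuming a boto3 S3 client. boto3's `generate_presigned_url` accepts a `ResponseContentDisposition` parameter for `get_object`, which makes S3 send that value as the `Content-Disposition` header. The function name and arguments are hypothetical and are not taken from attachment_helpers.py.

```python
def presigned_video_url(s3_client, bucket, key, download=False, expires_in=3600):
    """Build a time-limited S3 URL for a video.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    With download=True, S3 is asked to respond with
    Content-Disposition: attachment so the browser saves the file
    instead of playing it inline.
    """
    params = {"Bucket": bucket, "Key": key}
    if download:
        # Use the last path segment of the key as the suggested file name.
        filename = key.rsplit("/", 1)[-1]
        params["ResponseContentDisposition"] = f'attachment; filename="{filename}"'
    return s3_client.generate_presigned_url(
        "get_object", Params=params, ExpiresIn=expires_in
    )
```

Calling it with `download=False` leaves the header unset, which corresponds to the existing 'view' behavior.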

Another option is to switch out the URLs in the video "download" and "view" buttons (a tags) inside the templates so that they use the S3 links. That way, instead of first directing users to our StudyResponseVideoAttachment view (with a query parameter for 'download' or 'view'), the download/view buttons would use the S3 links directly. However, this option would bypass the StudyResponseVideoAttachment view entirely, which means that there would be no permissions checks at the point when the user clicks the download/view buttons.
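The trade-off can be sketched in framework-agnostic Python. This is an assumption-laden illustration, not the real StudyResponseVideoAttachment view: the function name, the boolean permission flag, and the `make_presigned_url` callable are all hypothetical. It shows that keeping the view means the permission check still runs at click time, before any S3 URL is issued.

```python
from http import HTTPStatus


def video_attachment_redirect(user_can_view_video, mode, make_presigned_url):
    """Return (status, headers) for a click on a video download/view button.

    user_can_view_video: result of the permission check, evaluated per request.
    mode: the query parameter value, "download" or "view".
    make_presigned_url: callable taking download=bool and returning an S3 URL.
    """
    if mode not in ("download", "view"):
        return HTTPStatus.BAD_REQUEST, {}
    if not user_can_view_video:
        # No URL is generated for users who fail the check.
        return HTTPStatus.FORBIDDEN, {}
    url = make_presigned_url(download=(mode == "download"))
    # Redirect so the browser fetches the file from S3, not from the site.
    return HTTPStatus.FOUND, {"Location": url}
```

Putting the S3 links straight into the template would skip this function altogether, so the only checks left would be the ones that ran when the page was rendered.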

Open question

Users need to pass various permissions checks to reach the two response page views that contain the video download/view buttons and to see those buttons on the page.

  1. Do we need the additional permission checks at the point when the user clicks the video download/view button?
  2. Does it matter if the pre-signed, time-limited S3 links are available on the page when it is rendered on the client side, vs more obfuscated (but still available to the client due to the redirection)?

Depending on the answers to these questions, we could potentially get rid of the StudyResponseVideoAttachment view entirely and put the S3 links directly into the template (the other option mentioned above).