mitodl / ocw-studio

Open Source Courseware authoring tool
BSD 3-Clause "New" or "Revised" License

strip storage bucket prefix from file urls in dev #2124

Closed gumaerc closed 4 months ago

gumaerc commented 4 months ago

What are the relevant tickets?

Closes https://github.com/mitodl/ocw-studio/issues/2117

Description (What does it do?)

This PR adds a small block of code to the full_metadata property of the WebsiteContent model. In local development, Minio is used as a replacement for AWS S3. When OCW_STUDIO_USE_S3 is set to true, the django-storages backend will use S3 for things like FileField widgets. When a file is set on the WebsiteContent.file field, the url property will be obtained from AWS using the AWS SDK. When using AWS proper, this url ends up being something like https://ol-ocw-studio-app-qa.amazonaws.com/courses/course-1/file.pdf.

The full_metadata property collects all of the properties of the WebsiteContent into one dictionary, which is used to set metadata on the markdown file representing the content. When setting the file property in metadata, WebsiteContent.file.url is used, but only the path is kept, extracted with urlparse(file_url).path. The problem in local dev is that Minio's URLs look like http://localhost:9000/ol-ocw-studio-app/courses/course-1/file.pdf, so the parsed path includes the bucket name as a prefix. In the local dev setup, the various S3 buckets used by ocw-studio are exposed by the nginx server using proxy_pass, for example: proxy_pass http://s3:9000/${AWS_PREVIEW_BUCKET_NAME}/;. Since the proxy already adds the bucket name, the prefixed path does not resolve, and the file ends up being inaccessible after deployment. The file ends up in the right place on the bucket, but Hugo reads the URL to the file from the file metadata property in markdown, which has the unintended prefix. This PR strips out the bucket name prefix from the URL in local dev before returning it in full_metadata.
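To illustrate the behavior described above, here is a minimal standalone sketch, not the code from this PR. The helper name strip_bucket_prefix and its bucket_name parameter are hypothetical, and the bucket name is taken from the example Minio URL above.

```python
from urllib.parse import urlparse


def strip_bucket_prefix(file_url: str, bucket_name: str) -> str:
    """Return the URL path with a leading "/<bucket_name>" segment removed.

    Hypothetical helper for illustration; the real change lives in
    WebsiteContent.full_metadata and may look different.
    """
    path = urlparse(file_url).path
    prefix = f"/{bucket_name}/"
    if path.startswith(prefix):
        # Drop the bucket segment but keep the slash that follows it,
        # so the result is still an absolute path.
        return path[len(prefix) - 1:]
    return path


# Example URLs copied from the description above.
aws_url = "https://ol-ocw-studio-app-qa.amazonaws.com/courses/course-1/file.pdf"
minio_url = "http://localhost:9000/ol-ocw-studio-app/courses/course-1/file.pdf"

# With AWS proper, the bucket is part of the host, so the path is clean.
print(urlparse(aws_url).path)

# With Minio, the bucket name is the first path segment.
print(urlparse(minio_url).path)

# Stripping the bucket segment gives the same path in both cases.
print(strip_bucket_prefix(minio_url, "ol-ocw-studio-app"))
```

Matching on "/<bucket_name>/" rather than on the bare bucket name avoids touching paths that merely start with a similar string.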

How can this be tested?

gumaerc commented 4 months ago

I'm going to close this PR without merging it. After some further investigation, I found that this issue was caused by my experimentation with the new AWS_ENDPOINT_URL environment variable, which globally overrides the AWS endpoint for the AWS CLI and SDKs like boto3. Without this variable set, the file.url property points to AWS proper, albeit to a file that doesn't exist if you're using Minio. Because we strip out the domain and use only the relative path, that still works. The AWS_ENDPOINT_URL env variable was not available when we implemented Minio in ocw-studio for local development, but if we decide to refactor to use it at some point, we should come back to this, as we will need this code.
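If that refactor ever happens, one possible shape for the check is sketched below. This is an assumption about how it could be gated, not code from ocw-studio: the function name metadata_file_path and the environ parameter are hypothetical, and the only real name used is the AWS_ENDPOINT_URL variable mentioned above.

```python
import os
from urllib.parse import urlparse


def metadata_file_path(file_url, bucket_name, environ=None):
    """Return the path to store in metadata for a file URL.

    Hypothetical sketch: the bucket prefix is stripped only when
    AWS_ENDPOINT_URL is set, since that is the case in which the
    bucket name shows up in the URL path.
    """
    environ = os.environ if environ is None else environ
    path = urlparse(file_url).path
    prefix = f"/{bucket_name}/"
    if environ.get("AWS_ENDPOINT_URL") and path.startswith(prefix):
        # Remove the bucket segment, keeping the slash after it.
        return path[len(prefix) - 1:]
    return path


minio_url = "http://localhost:9000/ol-ocw-studio-app/courses/course-1/file.pdf"

# Endpoint override set: the bucket prefix is removed.
print(metadata_file_path(
    minio_url,
    "ol-ocw-studio-app",
    {"AWS_ENDPOINT_URL": "http://localhost:9000"},
))

# No override: the path is returned unchanged.
print(metadata_file_path(minio_url, "ol-ocw-studio-app", {}))
```

Passing the environment in as a mapping keeps the sketch easy to test without modifying the real process environment.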