Open danielholmes opened 6 years ago
So I played around with this a bit and found that S3 doesn't encode '
when I upload a file with one in the name although it does encode "
. So I think we're wrong any which way you look at it.
I'm happy to take a fix but it's not obvious to me what exactly the correct behavior is, pull requests of course welcome.
Thanks for looking into this.
So I played around with this a bit and found that S3 doesn't encode ' when I upload a file with one in the name although it does encode "
That wasn't exactly what i was referring to. I was talking about the url that is gotten from an already uploaded file (e.g. in a template). e.g. I currently have a file of name uploads/Tests/Children's Services/square_red.png
(which as you pointed out - isn't encoded when it is uploaded). Tracing the end of the S3Boto3Storage.url
method without a custom domain I get https://s3.ap-southeast-2.amazonaws.com/mybucket.com.au/uploads/Tests/Children%27s%20Services/square_red.png
. But when I use a custom domain of mybucket.com.au
I get: https://mybucket.com.au/uploads/Tests/Children's%20Services/square_red.png
Issuing a GET to foo'.txt
or foo%27.txt
does the exact same thing. Is there any harm in always encoding? In that case we would just switch from filepath_to_uri
to quote
with safe='/'
.
Is there any harm in always encoding?
I can't see any for future users (but I could easily be not thinking of some cases). Definitely could be for past users who were relying on non-encoding for something. I have a little hesitation here since it's a fairly popular package and no one else has brought this issue up. But it seems like the correct behaviour to me to always encode.
(Note: you can use quote without any explicit safe
arg since '/'
is its default value)
Yeah backwards compatibility is a definite thought to keep in mind. My guess is that the vast majority of people don’t have files that contain the other characters in filepath_to_uri.
On Sep 18, 2018, at 1:02 AM, Daniel Holmes notifications@github.com wrote:
Is there any harm in always encoding?
I can't see any for future users (but I could easily be not thinking of some cases). Definitely could be for past users who were relying on non-encoding for something. I have a little hesitation here since it's a fairly popular package and no one else has brought this issue up. But it seems like the correct behaviour to me to always encode.
(Note: you can use quote without any explicit safe arg since '/' is its default value)
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jschneier/django-storages/issues/600#issuecomment-422256028, or mute the thread https://github.com/notifications/unsubscribe-auth/ACJB2F_I8FwYHQScH_QqCcDLAGvYYX3kks5ucH5vgaJpZM4WtC_L.
Hi,
I've run in to an issue with the
S3Boto3Storage.url
method when using anAWS_S3_CUSTOM_DOMAIN
(I'm using a CloudFront domain). If I have a file path with an apostrophe (') in it then it isn't url encoded. However if I remove the custom domain then the direct s3 bucket url returned does have the apostrophe encoded.The difference is because you're using filepath_to_uri from Django which doesn't encode apostrophes. But if there's no custom domain you're using the url that the s3 API gives you. If you were to modify this test case and add apostrophes in the file name then you'd see the problem.
So why is this a problem? For the systems I've worked with (including CloudFront) having an unescaped apostrophe in the url still serves the file. The issue for me is in invalidating CloudFront paths. i.e. when a file is changed I'm sending an invalidation request to CloudFront for the file's path. CloudFront only allows you to invalidate url encoded paths. So I can invalidate
/uploads/dan%27s%20file.txt
but it doesn't allow invalidating/uploads/dan's%20file.txt
. Meaning I can't invalidate the urls thatS3Boto3Storage
is providing within the website.My case seems pretty niche, but I imagine there are other cases where not properly url encoding the file name leads to problems. And in any case, the encoding rules should probably be consistent whether using a custom domain or not.
The possible fixes I see are to "fix"
filepath_to_uri
upstream in the django project or to use your own encoding function here. Happy to submit a PR with tests if you think the fix should be internal.