jschneier / django-storages

https://django-storages.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
2.74k stars 859 forks source link

New behaviour of S3 storage `exists()` breaks current workflow #1430

Open violuke opened 2 months ago

violuke commented 2 months ago

Since upgrading from 1.14.3 to 1.14.4, we're having a problem with this code:

from django.core.files.base import ContentFile
from django.core.files.storage import storages

storage = storages["default"]

file_name = storage.save("example/example.txt", ContentFile(content=b"hello there!"))

if not storage.exists(file_name):
    raise Exception(f"File {file_name} does not exist")

print("All good!")

On 1.14.3, I get the "All good!" print but with 1.14.4, I get the Exception.

Django 5.0.6. I suspect this is a problem with #1381 edit: see below

I've downgraded again for now, but any help would be much appreciated. Thanks to everyone involved for a really helpful package :pray:

violuke commented 2 months ago

This is probably related to #1429 that was posted while I was typing my issue up :smile: The fact that #1422 was not in the changelog is why I wrongly assumed the issue might be related to a different PR. Thank you.

jschneier commented 2 months ago

Thanks for posting. The reason I made this change was two-fold:

  1. A security issue (which I reported) that was patched upstream: https://www.djangoproject.com/weblog/2024/jul/09/security-releases/
  2. The overwrite functionality was added to the upstream FilesystemStorage (part of how I found the issue) https://github.com/django/django/commit/0b33a3abc2ca7d68a24f6d0772bc2b9fa603744e and as part of this they implemented it in terms of exists, see this comment: https://github.com/django/django/pull/18020#discussion_r1590011076

I'm not surprised people were relying on the previous behavior and I frankly find the Django documentation bizarre but I feel a slight bit paralyzed.

saemideluxe commented 2 months ago

Sorry, I am not versatile enough to get the full picture of the security issue. However, if you say those changes are necessary to prevent a potential directory traversal exploit, I am definitely in favor of not reversing this change. However, it would still be good to have a way to a) allow for overwrites b) checking if a file already exists at the same time. Or are you saying that this combination (by its nature) is the vulnerability, per se?

jschneier commented 2 months ago

Sorry, I am not versatile enough to get the full picture of the security issue. However, if you say those changes are necessary to prevent a potential directory traversal exploit, I am definitely in favor of not reversing this change. However, it would still be good to have a way to a) allow for overwrites b) checking if a file already exists at the same time. Or are you saying that this combination (by its nature) is the vulnerability, per se?

Before the upstream fix, yes. Now that Django has patched, no.

The issue, in short, is that if the Storage overrides get_available_name and does not mirror all of the security checks in it (which this library respectively did and did not because the CVE fixes were added later) then calling storage.save(name) with a user supplied name was vulnerable. NOTE that field.save was not vulnerable because we were not overwriting generate_filename (although there were folks via GitHub search who did so).

Also, I still provide support for unsupported versions of Django (though they will be dropped shortly) which did not get the patch.

jschneier commented 2 months ago

The corresponding fix in Django: https://github.com/django/django/commit/fe4a0bbe2088d0c2b331216dad21ccd0bb3ee80d

saemideluxe commented 2 months ago

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

jschneier commented 2 months ago

If the fix in Django is there then the following I think is fine because we are no longer relying on Storage.get_available_name for any of the security requirements.

def get_available_name(self, name, max_length=None):
  # TODO: something with max_length?
  if self.file_overwrite:
    return name
  return super().get_available_name(name, max_length=max_length)
violuke commented 2 months ago

Thanks everyone for the comments and time spent on this :pray:

I think I understand the reasoning behind the change originally, and we don't want a security issue, of course, but I personally feel that this should be reverted (or partially, at least).

In my opinion, it's a bug of this package that I can create a file, then call .exists() and get a False returned value? I should get the right boolean response to match the Django docs. Especially with file_overwrite being set to its default value of True and it not being obviously a setting that impacts if the exists() function works or not.

If people are bothered about having security fixes (which they should be) then they should be updating this package and Django. If they don't do updates properly and continue to use old versions of this package and/or Django, then that's their (ill-advised) choice.

If this change is reverted, then my thinking is that they have the security risk because they haven't also updated Django, as the fix is in Django and as long as they use the latest patch release of a supported version, they'll have the security fix. How do others feel about this logic?

Thank you.

craiga commented 2 months ago

Yeah, the documentation is a bit confusing:

Returns True if a file referenced by the given name already exists in the storage system, or False if the name is available for a new file.

It's not clear what should happen when a file referenced by the given name already exists and the name is available for a new file.

I suppose I'd lean toward emulating whatever happens with a local filesystem, and maybe issue a warning if the user is using an unpatched version of Django.

Another solution would be to add a head method to the S3 storage which implements the old functionality. Possibly with a log message in exists to try and explain what's going on?

I'd be happy to try making a contribution of either of these fixes.

th3hamm0r commented 2 months ago

I just stumbled over this issue too, so I'm also a bit late to the changes in Django, which I'm not real fan of. I think the basic "exists" operation of a storage class should return exactly that information.

But since this change is now part of Django too, I think the best option is to also set file_overwrite to False by default analog to the FileSystemStorage.

We're using wagtail a lot and setting file_overwrite to False is actually required there (see docs), but we've missed it for some other usages... So this default now seems to be the safer option, also because its analog to Django's handling.

jschneier commented 2 months ago

I think we should probably open a bug with Django. If someone has the time, please also to link to this issue.

th3hamm0r commented 2 months ago

Good idea, done: https://code.djangoproject.com/ticket/35604 I hope it is explained clearly, otherwise you are welcome to edit it.

sarahboyce commented 2 months ago

Thank you all for this discussion 👍

From what I understand, #1422 is motivated by https://github.com/django/django/commit/0b33a3abc2ca7d68a24f6d0772bc2b9fa603744e. This aligns the behaviour of django-storages as if the storages defined in this package were inheriting from FileSystemStorage with the new flag allow_overwrite=True (which is added in Django 5.1). This is also motivated by Django discouraging overwriting get_available_name due to security concerns

However, after the security release to Django, having get_available_name to be defined similar to as described by @jschneier should be secure and django-storages can keep the previous behaviour of .exists()

The "side question" is "what does .exists() mean when you can overwrite files" given that the docs here are a little ambiguous

Returns True if a file referenced by the given name already exists in the storage system, or False if the name is available for a new file.

Given the engagement on this discussion around what .exists() should do in the overwrite case, and the recent security patch to Django, it makes sense for FileSystemStorage to also take this approach and overwrite get_available_name but leave the existing exists behaviour. A clarification to the FileSystemStorage docs also make sense

This is roughly my conclusion 👍 I will also write on the ticket

So, in short, I hear you and will work on this. Changes to django-storages are independent to this effort and can be worked on seperately

nitsujri commented 2 months ago

Hi all, I also attempted to upgrade to 1.14.4 and, as well, was hit by: https://github.com/jschneier/django-storages/blob/e74f29bb71b392508350d8db9d4f5c2811d8dae2/storages/backends/s3.py#L582-L583

For us, this behavior based on variable naming is quite confusing because we want both exists() to return True AND file_overwrite to be True.

Doubt it matters, but our use case is uploading compiled web assets to S3.

I'm reading this thread and am very confused (I understand a lot of people are). At the root, is unintuitive that exists() is affected by file_overwrite because those should be independent options?

jschneier commented 2 months ago

Given the way Django is moving there will be another release that partially undoes this change.

berkcoker commented 2 months ago

Seconding the opinion that this should have been included in the changelog. We had this commit break the functionality of our app unexpectedly

mindcruzer commented 2 months ago

This change breaks the ManifestFilesMixin when collecting static files. Before attempting to get the file blob to compute the hash, it first checks if the file exists. If file overwrite is enabled it's not possible for this mixin to function.

sarahboyce commented 2 months ago

https://github.com/django/django/commit/8d6a20b656ff3fa18e36954668a44a831c2f6ddd just landed in Django

I want to thank @jschneier for all his work on the recent Django improvements and security release. I know it's generated extra work (and possibly stress), and a lot of that work is behind closed doors. You've been fantastic and t's very appreciated, thank you 👏 👏 🥇

MaximePETIT-code commented 4 weeks ago

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?

Okay, thanks for the explanation of the vulnerability.

Am I correct in assuming that one of following solutions would work?