umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.42k stars 2.67k forks source link

Media is still "live" after its been sent to the recycle bin #2931

Open sussexrick opened 6 years ago

sussexrick commented 6 years ago

Originally logged as http://issues.umbraco.org/issue/U4-7258 and there are further comments there. The issue still exists in 7.12.2.

Media sent to the recycle bin shouldn't continue to have a accessible URL. Even if all links to a media item are removed from the website it can still be found and viewed via Google, even though authors believe they have deleted the file from the site.

What did you expect to happen?

The url become unavailable, media moved to an inaccessible area of the file system.

What actually happened?

File still available on the saved URL.

nul800sebastiaan commented 6 years ago

Thanks, this is a great feature request! I would hesitate in moving files around:

  1. In order to be able to undo a trashing, you'd need to move them somewhere predictable, which may become guessable and thus still accessible
  2. Moving files around sometimes fails and you would want this to work without fail

It would be good to investigate if it would be possible to make a really (really) well-performing httpmodule that could block downloads of files if they are in the trash. However, this would mean that you'd need to always do a reverse query on each filename, find the media item and make sure it's not trashed. We would love to see if this can be done in a performant way. This also might pave the way for having support for protected media items in the core at some point.

I've marked as "Up for grabs" so that you or someone else coming along could create a pull request for it.

abjerner commented 6 years ago

A HTTP module would be lovely, but a solution should also take custom file system providers into consideration - eg. if the media files are hosted on Azure, a HTTP module won't catch the requests.

So moving the files to a special folder might still be the best option, although the new folder name should not be predictable.

nul800sebastiaan commented 6 years ago

@abjerner Sure! I was under the impression that that's how the Azure Blob Storage Provider worked but I see that there's no HTTP module there. Perhaps it is then best to add good error messages when attempting to trash things with an error occurring (when not able to move 1 or more media items).

Furthermore, maybe it's possible to move images to a directory based on their uniqueId (stored in the umbracoNode table). I don't think this GUID doesn't normally get exposed publically.

Some input from @Shazwazza or @zpqrtbnk would be good here too!

abjerner commented 6 years ago

@nul800sebastiaan IIRC it depends on configuration whether visitors will see the normal media URL or be redirected to the URL in blog storage.

For images, it's also worth noticing that there may be cached versions of the image (eg. when shown through the image cropper) - either locally on disk or in blob storage depending on setup.

sussexrick commented 6 years ago

A comment in the old tracker suggests putting a web.config file in the media folder to deny downloads. I think that could work, implemented via the IFileSystem2 so that other providers like Azure could take a different approach if they need to. The back office could still show the media items by using a different URL, one that checks for an authenticated user then reads the file from the IFileSystem2 rather than a direct download that would be blocked by the web.config.

@nul800sebastiaan I looked at media folder names a year ago for http://issues.umbraco.org/issue/U4-2412 and I think the work to use media GUIDs for the folder names has already been done by @zpqrtbnk in https://github.com/umbraco/Umbraco-CMS/pull/1454, where he says:

"supports a new experimental mode where the media directory depends on the content/media item guid and the property guid (so is deterministic). It cannot be enabled before v8, probably"

That PR talks about the code being in MediaHelper, but I found it in MediaFileSystem.GetMediaPath() (https://github.com/umbraco/Umbraco-CMS/blob/c09cd95eb6a9faa587f81afb79d543652c3f48bd/src/Umbraco.Core/IO/MediaFileSystem.cs) where it's disabled by:

public const bool UseTheNewMediaPathScheme = false

JimBobSquarePants commented 6 years ago

The Azure Provider actually deletes the file outright and doesn't move anything to the recycle bin. I'm actually a little surprised no one has complained about this.

https://github.com/JimBobSquarePants/UmbracoFileSystemProviders.Azure/blob/b4ad41e2a47af2287929b63f70d2009d0564c97f/src/UmbracoFileSystemProviders.Azure/AzureFileSystem.cs#L364

Whatever changes to media storage are implemented we need it to fix https://github.com/JimBobSquarePants/UmbracoFileSystemProviders.Azure/pull/106

Something in 7.6 broke our implementation.

dawoe commented 6 years ago

Hi @JimBobSquarePants

If I remember correctly the DeleteFile method from the FilesystemProvider is only called when a media item is removed from the recycle bin. Which is how it should work, because you can restore items from the media bin and the file should still exist.

Dave

JimBobSquarePants commented 6 years ago

@dawoe Ah yes, that's correct. It's been a while since I wrote the code.

UnsungHeros commented 5 years ago

The file would still be available because it is physically there, so you might rename it to filename.ext.1. As long as you have your web server set to serve a whitelist of extensions, that file will not be served by your webserver. The only code needing to be written would be the code to change the filename when it is 'deleted' and the code to un-delete it.

stevetemple commented 5 years ago

I did a PR a while back (#3568) for this in v7 which didn't make it in. I'm looking at redoing that PR for v8.

Basically my approach was when a file was deleted we rename the file to a filename prepended with a random guid. Then if restored we strip the guid off the front of the filename. This would hide the file, and although it's still technically on the filesystem the path is unguessable. Although this feels a bit hacky I can't think of a better solution that would work with all different media providers/CDNs etc.

Does this approach sound like it'd be worthwhile pursuing or does someone have a better idea?

sussexrick commented 5 years ago

@stevetemple To have the document still available, however unlikely to be found, wouldn't be good enough in the scenario where a document is deleted because it's confidential and shouldn't have been uploaded. See my earlier comment for a suggestion. I don't think you need to plan the approach for all possible providers - just the out of the box one(s) and ensuring there are extension points on the interface that custom providers can use.

Well done for taking it on though!

stevetemple commented 5 years ago

Adding a guid in front won't just make it unlikely but as good as impossible to be found. To create a duplicate, only after generating 1 billion UUIDs every second for the next 100 years, the probability of guessing it would be about 50%. Security by obscurity is probably good enough. If you really wanted to get rid of something confidential I'd suggest users should empty their recycle bin.

So I think this might be a good MVP for this.

With different IFileSystem you don't have much control. So Azure blob storage one redirects you to the blob url at that point if someone has that bookmarked you can't put anything in the way like a filter. In that case I'd suggest adding a MoveToRecycleBin, RecoverFromRecycleBin methods or similar. Then the FileSystem could decide how to do it. Moving to a private blob or folder for example. Think this may have to wait until 9.0 though, unless we could do it as a IFileSystem3!? And we only run that code if the FileSystemProvider supports it.

abjerner commented 5 years ago

@stevetemple I think renaming the file using a GUID or another arbitrary value sounds like a good solution. The security by obscurity is fine in this case if you ask me.

In the case with confidential files uploaded by accident, I believe these should be deleted instead of being moved to the recycle bin. But it may depend on either personal or client preference - but either way, I don't think it something we should try to handle specifically here.

I don't think we should change the current interface, as that would create a breaking change. But perhaps we can create an additional interface for this, and Umbraco can check internally whether a file system provider implements both interfaces. Then the two interfaces could be merged for V9.

I'd say go for it 😉 but perhaps @nul800sebastiaan or @Shazwazza has some thoughts on this?

Preventing visitors from accessing files in the recycle bin does however raise a new question. Should editors (possible only admins) be able to access these files until the files are physically deleted? I think they should, but there may be different ways to handle that 😮

ronaldbarendse commented 4 years ago

There are a lot of ways to get the cached version of the trashed file (like @abjerner already noted in https://github.com/umbraco/Umbraco-CMS/issues/2931#issuecomment-419243885), so just make sure to delete the file from the recycle bin if you don't want it accessible from your website.

With the current GDPR rules, you're even required to report a data breach if the file contains confidential/personal information, so only deleting it isn't enough anyway. Having to also delete it from the recycle bin is then a very simple action to perform...

melvinvanrookhuizen commented 3 years ago

@nul800sebastiaan This issue persists in 8.9, but with some remarks.The image link is both accessible in Recycle bin and even after delete in recycle bin. Tested against cache and several browsers.

1 - Recycle bin : the file can be found under the complete url with anchor and sizes. Changing the size in the url works. Removing al variables from the url also works. The image shows in original size

  1. After delete : the file can only be found with the complete link, including all original variables. When changing sizes or crop in url makes it throw an error. Trying to access the original image without any of the variables, also make the item crash.

In both cases it should not be accessible from outside.

ronaldbarendse commented 3 years ago

@melvinvanrookhuizen Moving the media item to the recycle bin doesn't delete the file, so 1 is correct and expected behaviour. Deleting the media item will remove the original file, so 2 is also correct, but has some unexpected behavior for images: ImageProcessor stores the processed images and these are aggressively cached on the server to minimize disk I/O (e.g. having to check whether the original and the processed/cached images exist).

It might be possible for Umbraco to invalidate/delete the cached ImageProcessor images when the file is deleted or have ImageProcessor add an additional check whether the original image exists (causing a performance hit). When using a cloud based cache (like Azure Blob Storage or S3), the additional check will probably cause an even bigger performance hit. If you're using a CDN, this gets even more complicated, as you'd also need to purge the path of the original image file...

Anyway, you can always remove/clean the ImageProcessor cache manually (App_Data\cache by default for the DiskCache) or wait the configured maxDays if cache trimming is enabled.

nikcio commented 1 year ago

Hello everyone. Isn't it time to solve this problem? 😄 I've read through the issue and what seems to be the common denominator in the solutions discussed is that the file will still be available either in the cache or via the new filename. But can't we add middleware to return 404 on known deleted media files?

I've made a POC of this solution here: https://github.com/nikcio/Umbraco-CMS/commit/384f67c793783bfc6f0b0c629e02d344548bda7c

It works by adding .deleted to the file name and then returning 404 on media requests ending in .deleted making it impossible to fetch the trashed media file. This could maybe be extended to allow backoffice requests to fetch the image but I haven't been able to find a solid way to check if the image request is coming from a logged-in backoffice user. I've tested the solution a bit and it seems to be working but I don't have the time to truly test it and make sure it works in all scenarios so if anyone has spare time feel free to use the POC to create a production-ready implementation and add it to a PR. 😉

Note: The extra middleware has to come before ImageSharp to avoid the ImageSharp cache to fetch the media file.

chrisrandledev commented 2 months ago

There are still users with this issue.

Objectively, I understand that Recycle Bin is just a folder, so perhaps media should be accessible from within, but it would be beneficial if there was an appsetting to change the file extension to .deleted as per nikcio's nice solution.