nextcloud / server

☁️ Nextcloud server, a safe home for all your data

Files are not deleted from S3 (primary) #20333

Open solracsf opened 4 years ago

solracsf commented 4 years ago


Steps to reproduce

  1. Set S3 as primary storage
  2. Upload, say, 2000 files into a folder (JPGs here)
  3. Delete that folder, and try to empty trashbin

Expected behaviour

Trashbin should be emptied correctly

Actual behaviour

After some time, an error appears ("Error while empty trash"). Reloading the page shows no more files in either Files or the Trashbin.

[screenshot]

But the files are still on the Object Storage, here OBJECTS and SIZE:

[screenshot]

Before these test operations (upload, delete...)

[screenshot]

The following commands were executed afterwards:

sudo -u testing php occ files:scan test

Starting scan for user 1 out of 1 (test)
+---------+-------+--------------+
| Folders | Files | Elapsed time |
+---------+-------+--------------+
| 0       | 0     | 00:00:00     |
+---------+-------+--------------+
sudo -u testing php occ files:cleanup

0 orphaned file cache entries deleted
sudo -u www-data php occ trashbin:cleanup --all-users

Remove deleted files for all users
Remove deleted files for users on backend Database
   test

One user has reported that the interface shows he is "using" 1.9 GB of storage, but he has NO FILES or FOLDERS at all, either in Files or the Trashbin, on a production instance.

Server configuration

Operating system: Ubuntu 18.04

Web server: Nginx 1.17

Database: MariaDB 10.4

PHP version: 7.3

Nextcloud version: (see Nextcloud admin page) 18.0.3

Updated from an older Nextcloud/ownCloud or fresh install: Fresh install

Where did you install Nextcloud from: Official sources

Signing status:

```
No errors have been found.
```

List of activated apps:

```
Enabled:
 - accessibility: 1.4.0
 - admin_audit: 1.8.0
 - announcementcenter: 3.7.0
 - apporder: 0.9.0
 - cloud_federation_api: 1.1.0
 - dav: 1.14.0
 - external: 3.5.0
 - federatedfilesharing: 1.8.0
 - files: 1.13.1
 - files_accesscontrol: 1.8.1
 - files_automatedtagging: 1.8.2
 - files_pdfviewer: 1.7.0
 - files_rightclick: 0.15.2
 - files_sharing: 1.10.1
 - files_trashbin: 1.8.0
 - files_versions: 1.11.0
 - files_videoplayer: 1.7.0
 - groupfolders: 6.0.3
 - impersonate: 1.5.0
 - logreader: 2.3.0
 - lookup_server_connector: 1.6.0
 - notifications: 2.6.0
 - oauth2: 1.6.0
 - password_policy: 1.8.0
 - privacy: 1.2.0
 - provisioning_api: 1.8.0
 - settings: 1.0.0
 - sharebymail: 1.8.0
 - theming: 1.9.0
 - theming_customcss: 1.5.0
 - twofactor_backupcodes: 1.7.0
 - viewer: 1.2.0
 - workflow_script: 1.3.1
 - workflowengine: 2.0.0
```

Nextcloud configuration:

Config report ``` { "system": { "objectstore": { "class": "\\OC\\Files\\ObjectStore\\S3", "arguments": { "bucket": "testing.example.com", "autocreate": true, "key": "***REMOVED SENSITIVE VALUE***", "secret": "***REMOVED SENSITIVE VALUE***", "hostname": "10.1.0.2", "port": 8080, "use_ssl": false, "region": "fr-par", "use_path_style": true } }, "log_type": "file", "logfile": "\/var\/log\/nextcloud\/testing.example.com-nextcloud.log", "passwordsalt": "***REMOVED SENSITIVE VALUE***", "secret": "***REMOVED SENSITIVE VALUE***", "trusted_domains": [ "testing.example.com" ], "datadirectory": "***REMOVED SENSITIVE VALUE***", "dbtype": "mysql", "version": "18.0.3.0", "overwrite.cli.url": "https:\/\/testing.example.com", "dbname": "***REMOVED SENSITIVE VALUE***", "dbhost": "***REMOVED SENSITIVE VALUE***", "dbport": "3306", "dbtableprefix": "oc_", "mysql.utf8mb4": true, "dbuser": "***REMOVED SENSITIVE VALUE***", "dbpassword": "***REMOVED SENSITIVE VALUE***", "dbdriveroptions": { "1009": "\/etc\/ssl\/mysql\/ca-cert.pem", "1008": "\/etc\/ssl\/mysql\/client-cert.pem", "1007": "\/etc\/ssl\/mysql\/client-key.pem", "1014": false }, "installed": true, "skeletondirectory": "", "default_language": "fr", "default_locale": "fr_FR", "activity_expire_days": 30, "auth.bruteforce.protection.enabled": false, "blacklisted_files": [ ".htaccess", "Thumbs.db", "thumbs.db" ], "htaccess.RewriteBase": "\/", "integrity.check.disabled": false, "knowledgebaseenabled": false, "logtimezone": "Europe\/Paris", "maintenance": false, "memcache.local": "\\OC\\Memcache\\APCu", "memcache.distributed": "\\OC\\Memcache\\Redis", "updatechecker": false, "appstoreenabled": false, "upgrade.disable-web": true, "filelocking.enabled": false, "overwriteprotocol": "https", "preview_max_scale_factor": 1, "redis": { "host": "***REMOVED SENSITIVE VALUE***", "port": 6379, "timeout": 2.5, "dbindex": 2, "password": "***REMOVED SENSITIVE VALUE***" }, "quota_include_external_storage": false, "theme": "", "trashbin_retention_obligation": "auto, 7", "updater.release.channel": "stable", "mail_smtpmode": "smtp", "mail_smtpsecure": "tls", "mail_sendmailmode": "smtp", "mail_from_address": "***REMOVED SENSITIVE VALUE***", "mail_domain": "***REMOVED SENSITIVE VALUE***", "mail_smtpauth": 1, "mail_smtphost": "***REMOVED SENSITIVE VALUE***", "mail_smtpport": "587", "mail_smtpname": "***REMOVED SENSITIVE VALUE***", "mail_smtppassword": "***REMOVED SENSITIVE VALUE***", "instanceid": "***REMOVED SENSITIVE VALUE***", "overwritehost": "testing.example.com", "preview_max_x": "1280", "preview_max_y": "800", "jpeg_quality": "70", "loglevel": 2, "enabledPreviewProviders": [ "OC\\Preview\\PNG", "OC\\Preview\\JPEG", "OC\\Preview\\GIF", "OC\\Preview\\BMP", "OC\\Preview\\XBitmap" ], "apps_paths": [ { "path": "\/var\/www\/apps", "url": "\/apps", "writable": false }, { "path": "\/var\/www\/custom", "url": "\/custom_apps", "writable": true } ] } } ```

Logs are completely empty (we have just fired up a test instance and tested this use case).

Similar to https://github.com/nextcloud/server/issues/17744

SimplyCorbett commented 4 years ago

Your issue is likely file locking. Disable it in the Nextcloud config and disable Redis file locking. Restart PHP-FPM and try reproducing this again.

In my case, disabling file locking resolved all of my deletion-related issues. I just let the S3 backend handle the file locking now.

solracsf commented 4 years ago

Filelocking is already disabled (see my config in the 1st post).

SimplyCorbett commented 4 years ago

> Filelocking is already disabled (see my config in the 1st post).

My bad, another foot-in-mouth moment. If you wait a while, are they removed from the backend? Sometimes with S3, deletion on the backend is delayed.

solracsf commented 4 years ago

Thanks, but I don't think so: if I upload a 200 MB file and delete it, I can see it in real time in the S3 backend. And 2 hours have passed now and the files are still there (cron runs every 5 minutes).

SimplyCorbett commented 4 years ago

> Thanks, but I don't think so: if I upload a 200 MB file and delete it, I can see it in real time in the S3 backend. And 2 hours have passed now and the files are still there (cron runs every 5 minutes).

Right, but with S3 in particular, if a file is removed but is locked on the S3 backend, it can take a while to process the deletions. 2 hours is a fairly long time, though.

If you have your S3 provider run the garbage collection process, do the files stay or are they deleted?

SimplyCorbett commented 4 years ago

Amazon also includes an option to retain locked objects for x days.

https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock-overview.html https://aws.amazon.com/blogs/storage/protecting-data-with-amazon-s3-object-lock/

Can you verify that's not the case and garbage collection doesn't resolve the issue?

solracsf commented 4 years ago

I'm not using Amazon but Scaleway; they have Lifecycle Rules, but they are disabled by default. What do you call GC in S3?

SimplyCorbett commented 4 years ago

> I'm not using Amazon but Scaleway; they have Lifecycle Rules, but they are disabled by default. What do you call GC in S3?

I use radosgw-admin gc process. Each host has its own rules for garbage collection. Do they have an end-user option for garbage collection, or an API call to trigger it?

If not you would need to contact them directly and ask how often it runs and if they can run it now.

SimplyCorbett commented 4 years ago

Edit: Scaleway runs it once a day on their cold S3 storage. I don't know about the other storage options; best to contact them about it.

SimplyCorbett commented 4 years ago

I owe you an apology. I don't use nextcloud for images but for file storage. You are correct that files are not being deleted properly when it comes to image previews.

JUVOJustin commented 4 years ago

Same here with Wasabi as the storage backend. I am having many problems with S3 currently. Maybe something more general is broken.

solracsf commented 3 years ago

I can still confirm this with v19.0.5. My test instance is completely empty, no files at all, trashbin cleaned, but mc outputs this:

./mc du minio/bucket
2.7GiB

and ./mc ls minio/bucket lists hundreds of files from my different tests. Some of the files were created more than one month ago in the bucket.

These are clearly not image previews, as I have big files in the bucket:

./mc ls minio/bucket
...
[2020-11-24 20:16:54 CET] 115KiB urn:oid:50503
[2020-10-07 10:10:30 CEST] 1.1KiB urn:oid:15082
[2020-11-24 20:38:10 CET]  99KiB urn:oid:55762
[2020-10-07 10:09:26 CEST] 5.5MiB urn:oid:14773
[2020-11-24 20:38:09 CET] 192KiB urn:oid:55750
[2020-10-07 09:59:00 CEST]  26KiB urn:oid:11050
[2020-10-07 10:10:27 CEST]   110B urn:oid:15034
[2020-10-06 11:21:30 CEST] 360KiB urn:oid:883
[2020-11-24 20:21:33 CET] 271KiB urn:oid:54307
[2020-10-07 09:59:52 CEST]  25MiB urn:oid:11158
...

Summary: an empty instance, and a bucket with 2.87GB used and 3685 objects in it. 😮
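
For anyone without mc handy, the same bucket totals can be computed directly against the S3 API. Below is a minimal boto3 sketch (not from this thread); the endpoint, credentials, and bucket name are placeholders to adapt:

```python
# Sketch: count objects and total bytes in the bucket, similar to `mc du`.
# Endpoint, credentials and bucket name are placeholders, not from this thread.
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.com",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")

count = 0
total_bytes = 0
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="my-nextcloud-bucket"):
    for obj in page.get("Contents", []):
        count += 1
        total_bytes += obj["Size"]

print(f"{count} objects, {total_bytes / 1024**3:.2f} GiB")
```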

changsheng1239 commented 3 years ago

Yes, I can confirm this issue with Nextcloud 20.0.1 as well. My steps:

  1. Upload a 500 MB file.
  2. Cancel the upload at 100 MB.

Now my MinIO bucket has 10 x 10 MB chunk files which should have been deleted.

solracsf commented 3 years ago

I've tested it with AWS S3, to eliminate any compatibility issues with S3 'compatible' providers. Object lock is disabled for this test case.

Problem remains.

Check the bucket stats of this NC instance, which contains no files at all:

./mc du --versions aws/bucket
2.9GiB

This is a serious problem for many reasons, especially GDPR, when users request that their files be deleted and they aren't, on top of the S3 billing for objects we aren't using anymore.

cc @nextcloud/server-triage, can someone take a look at this? I believe this affects every ObjectStorage instance, but since files are named urn:oid:xxx, nobody really knows which files are in their buckets.

disco-panda commented 3 years ago

+1 for GDPR concerns.

It has been almost a year since this issue was brought up and S3 is heavily used in enterprise environments - any ideas when this will be prioritized? Unfortunately, we cannot rely on using S3 for storage if we cannot show that files are completely removed.

caretuse commented 3 years ago

I have a suggestion on this issue.

I know it is hard to keep files (whether on a filesystem or in object storage) and the database in sync, not to mention handling the caches involved. I believe it is impossible to keep the database correct after a hardware failure, even something as simple as a power failure. So I suggest there should be a way to check the current file list and file information against the database.

There is a command, occ files:scan, to sync files and the database when using filesystem-based storage, but it is not applicable in setups using object storage as primary storage. I believe every server using object storage uses a dedicated bucket (or directory), so it would be safe to clean up uncontrolled or unregistered files.

I would also appreciate it if the developers took a look at the object-server direct download function; an issue was opened at #14675. This function could save our servers non-essential bandwidth and load.

I use object storage (MinIO) as primary storage because files can be backed up easily. I don't need to shut down the Nextcloud server for a long time, and I can separate the database and file server easily. I believe this setup will be widely used at the enterprise level, and I hope this suggestion can help Nextcloud with deployment and migration.

siglun88 commented 3 years ago

I am having the same issue running Nextcloud 21.0.2 with Digital Ocean Spaces (S3) as primary storage. In my case it seems that the issue only occurs when server-side encryption is activated, although I haven't tested much without encryption, so I can't be too conclusive.

Also, I agree with @caretuse. It would be much appreciated if both of these features could be implemented in some future release:

> There is a command, occ files:scan, to sync files and the database when using filesystem-based storage, but it is not applicable in setups using object storage as primary storage. I believe every server using object storage uses a dedicated bucket (or directory), so it would be safe to clean up uncontrolled or unregistered files.

> I would also appreciate it if the developers took a look at the object-server direct download function; an issue was opened at #14675. This function could save our servers non-essential bandwidth and load.

jeffglancy commented 3 years ago

> Yes, I can confirm this issue with Nextcloud 20.0.1 as well. My steps:
>
> 1. Upload a 500 MB file.
> 2. Cancel the upload at 100 MB.
>
> Now my MinIO bucket has 10 x 10 MB chunk files which should have been deleted.

I have been running Nextcloud using S3 storage for over two years. I noticed my bucket was bloating early on. Digital Ocean shows my S3 was using 800GB even though my only user had 218 GB of files including versioning. I've been watching this issue for a long time now hoping for a solution, but finally got around to looking into it myself.

I compared the s3cmd la file list output to the oc_filecache database table. I expected to find extra trash files in the S3 bucket. I was perplexed to find that the file list matched perfectly. This was easy to check as the database fileid is the urn:oid:___ number. The file size from the data also matched. This led me to research more about S3 storage.

I finally found that the bloat was from old incomplete uploads. You can list these using the s3cmd multipart s3://BUCKET/ command. S3 allows large uploads to be uploaded as smaller multipart files which are then concatenated when the upload is complete. This helps reduce data transfer in case of interrupted uploads as it can resume near where it left off. It appears neither Nextcloud nor S3 storage is set to delete old incomplete multipart uploads by default. You can remove each file set individually using s3cmd abortmp s3://BUCKET/FILENAME UPLOAD_ID.
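
For anyone who prefers to script this instead of running s3cmd by hand, here is a minimal, hedged boto3 sketch (not from this thread; the endpoint, credentials, bucket, and 3-day cutoff are placeholder assumptions) that lists incomplete multipart uploads and aborts the stale ones:

```python
# Sketch: list incomplete multipart uploads and abort those older than a cutoff,
# roughly what `s3cmd multipart` / `s3cmd abortmp` do. Placeholders throughout.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.com",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")

BUCKET = "my-nextcloud-bucket"
cutoff = datetime.now(timezone.utc) - timedelta(days=3)

resp = s3.list_multipart_uploads(Bucket=BUCKET)
for upload in resp.get("Uploads", []):
    if upload["Initiated"] < cutoff:
        print("aborting", upload["Key"], upload["UploadId"])
        s3.abort_multipart_upload(Bucket=BUCKET, Key=upload["Key"],
                                  UploadId=upload["UploadId"])
```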

Nextcloud could remove old multipart data if it kept track of it. But S3 has the ability to do so on its own. Using s3cmd, you upload an XML rule to the S3 bucket with s3cmd setlifecycle lifecycle.xml s3://BUCKET/, where lifecycle.xml is:

<LifecycleConfiguration>
        <Rule>
                <ID>Remove uncompleted uploads</ID>
                <Prefix/>
                <Status>Enabled</Status>
                <AbortIncompleteMultipartUpload>
                        <DaysAfterInitiation>3</DaysAfterInitiation>
                </AbortIncompleteMultipartUpload>
        </Rule>
</LifecycleConfiguration>

This rule runs once a day at midnight UTC, according to what I found. After waiting a day, my nearly 800 incomplete uploads spanning over two years were gone, and my S3 storage now sits at 220 GB, as it should.
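
If s3cmd is not available but your provider supports the standard lifecycle API, a roughly equivalent rule can be pushed programmatically. Here is a boto3 sketch with placeholder endpoint, credentials, and bucket name; verify that your provider actually honors AbortIncompleteMultipartUpload before relying on it:

```python
# Sketch: apply the "abort incomplete multipart uploads after 3 days" rule
# via the S3 API instead of s3cmd. Placeholders for endpoint, keys and bucket.
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.com",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-nextcloud-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "Remove uncompleted uploads",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 3},
            }
        ]
    },
)
```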

This doesn't appear to be the solution to all of the issues in this thread, but hopefully it helps some. In my case, the files marked as trash or versions in the database were being removed correctly according to the rules in Nextcloud's config file. I have transactional file locking disabled and encryption is not enabled.

szaimen commented 3 years ago

I suppose this is still happening on NC21.0.4?

Sivarion commented 3 years ago

> I suppose this is still happening on NC21.0.4?

I use Nextcloud 22.0.1 and have the exact same problem with Scaleway S3. At this point I have about 35 GB used by my users, but the storage is filled with 74 GB.

Edit: Manually running the command ./occ trashbin:clean --all-users has fixed it for me, but I guess the problem will return in time.

szaimen commented 3 years ago

> Manually running the command ./occ trashbin:clean --all-users has fixed it for me

Looks like the original issue is fixed then.

@acsfer can you still reproduce this on NC21.0.4 or NC22.1.1?

solracsf commented 3 years ago

@szaimen can't help anymore here, we moved away from S3...

krakazyabra commented 2 years ago

I can confirm the problem still exists, even after upgrading to the latest (22.1.1) version. ./occ trashbin:clean --all-users didn't help. In the interface I see 146.7 GB used, while MinIO shows 961 GB for this user and his bucket.

[screenshot]

[screenshot]

ghost commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.

agowa commented 2 years ago

I have a similar issue with my S3 bucket. But for me it's most likely caused by partially failed multipart uploads, where Nextcloud didn't clean up the chunks it had already pushed to S3 after the upload failed (other issue: #29516).

leonbollerup commented 2 years ago

Same issue here. Basically, I suggest everyone avoid using S3 as primary storage unless you want to throw money out the window.

Scandiravian commented 2 years ago

I snooped around the Nextcloud database and it seems that the issue is that objects uploaded to S3 are not committed to the DB until the transfer to S3 is completed. If a transfer is interrupted, Nextcloud loses track of the object, since no record of ongoing transfers is kept.

A potential fix could be to log ongoing transfers in the database and occasionally do a clean-up if something goes wrong.

Until this is fixed, Nextcloud will continue to bloat the bucket, so I've hacked together a Python script that cleans up the S3 storage. It doesn't solve any of the open issues with using S3 as primary storage; it simply cleans up orphaned objects in the bucket, thereby bringing down the amount of storage used by Nextcloud.

DISCLAIMER: I'm a stranger on the internet providing a script that requires access to your personal (and probably sensitive) data -> Do not trust strangers on the internet. Please review the code before running it. I'm not responsible if this script destroys your data, corrupts your DB, makes your house catch fire, or curses you to step on Lego bricks every time you have bare feet.

Since the issue seems to be caused by the DB not being updated until a transfer to S3 is complete, the script might delete objects that have successfully been transferred to S3 but have not yet been recorded in the database, if it's run while a sync is in progress. Therefore, you should not run this while a sync is in progress. I repeat: Do not run this while a sync is in progress!

I've run/tested this against my own setup (Minio + Postgres) and haven't encountered any issues so far. If you use any other combination of S3 compatible storage and database, you'll need to modify the code to your needs:

gist
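
For reference, the core idea behind such a cleanup can be sketched in a few lines. This is not the linked gist; it assumes boto3 and psycopg2, the default oc_ table prefix, and placeholder credentials, and it only prints candidates instead of deleting them. The same warning applies: do not run it while a sync is in progress.

```python
# Sketch of the orphan check: Nextcloud stores each object as urn:oid:<fileid>,
# so a bucket key whose fileid is missing from oc_filecache is an orphan candidate.
# All connection details are placeholders; review carefully before deleting anything.
import boto3
import psycopg2

s3 = boto3.client("s3", endpoint_url="https://s3.example.com",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")

db = psycopg2.connect(host="localhost", dbname="nextcloud",
                      user="nextcloud", password="PASSWORD")
with db.cursor() as cur:
    cur.execute("SELECT fileid FROM oc_filecache")
    known = {row[0] for row in cur.fetchall()}

for page in s3.get_paginator("list_objects_v2").paginate(Bucket="my-nextcloud-bucket"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.startswith("urn:oid:") and int(key.split(":")[-1]) not in known:
            print("orphan candidate:", key, obj["Size"])
            # s3.delete_object(Bucket="my-nextcloud-bucket", Key=key)
```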

jeffglancy commented 2 years ago

Why reinvent the wheel? Look back at my post on July 4 and S3 lifecycle rules. Since then I have had zero issues with S3 storage bloating from NC 20 through 23. (https://github.com/nextcloud/server/issues/20333#issuecomment-873608624)

caretuse commented 2 years ago

@Scandiravian made a good script to solve the database inconsistency issue, although I believe this should be implemented in occ trashbin:cleanup --all-users, as NeoTheThird mentioned in #29841.

@jeffglancy and otherguy also made a good script to solve another issue, which is cleaning up pending multipart uploads in S3. But being lazy, I would choose rclone cleanup s3:bucket from the rclone documentation, a rather simple and mistake-proof solution.

szaimen commented 1 year ago

Hi, please update to 24.0.8 or better 25.0.2 and report back if it fixes the issue. Thank you!

caretuse commented 1 year ago

I tested some scenarios:

  1. Delete an S3 object with the Nextcloud server shut down: the files remain even after occ files:scan --all, until deleted manually from Nextcloud.
  2. Turn off the S3 server (MinIO) while Nextcloud is deleting files: the files remain in the trash bin (they reappear after reloading the webpage).

I can't confirm the scenario where files are not shown in Nextcloud; that would require manipulating things at the database level, which goes beyond my interest.

Does anyone have an environment to test this?

Corinari commented 1 year ago

Hi @szaimen ,

we are currently running Nextcloud 25 and still experience this problem. On one instance, our S3 bucket shows 209 GB of data, while adding up the different users' quotas in NC itself comes to about 55 GB. select sum(size/1024/1024/1024) as size_GB, count(*) as anzahl from oc_filecache where mimetype != 2; shows around 203 GB of data tracked by NC. The trashbin is (almost) empty (~2 GB).

This occurs on different instances, which were built with a custom Docker image.

mrAceT commented 1 year ago

@Corinari

I created a script (S3 -> local) once upon a time when I had trouble with S3, partially because I found a bug and feared it was S3-related (it wasn't; fixed that one: https://github.com/nextcloud/server/issues/34422 ;) ). Later on (partially by building on that migration script), I dared to try to migrate back to S3. "Reversing" that script was quite a challenge, but I got it working. While creating that script I built in various "sanity checks", and I now run my "local -> S3" script every now and then to clean up my S3. Barring a little hiccup every now and then, the script rarely needs to clean stuff up.

A few weeks ago I decided to publish it on Github, take a look at: https://github.com/mrAceT/nextcloud-S3-local-S3-migration

PS: I have various users on my Nextcloud, totaling some 100+ GB of data.

aurelienpierre commented 5 months ago

I wrote a Python script to delete orphaned S3 objects (among other workarounds for NC's lack of proper S3 support): https://github.com/aurelienpierre/clean-nextloud-s3

tsohst commented 1 month ago

Is there a real solution from Nextcloud yet? I'm facing the same problem currently.

@aurelienpierre your script might help, but it's not ready for other S3 vendors like OVH Cloud: https://github.com/aurelienpierre/clean-nextloud-s3/issues/2

Also, scanning 300k objects takes a lot of time, and downloading them costs €€ ;)

thlehmann-ionos commented 3 weeks ago

I can still reproduce this in Nextcloud 29.

Pre upload:

$ du -sm ./minio/data/
6897    ./minio/data/

Upload a ~5 GB file, abort at ~50%, and find:

$ du -sm ./minio/data/
9541    ./minio/data/

agowa commented 2 weeks ago

@tsohst sorry, I don't know. I stopped using Nextcloud because of this error years ago.

joshtrichards commented 2 weeks ago

Disclaimer: Work in progress.

Based on my review of this thread, it doesn't appear everyone has the same underlying cause (though the symptoms are somewhat similar). Since this issue has a fairly broad title, it's likely there is also some overlap with other open issues (I'll try to review these as time permits and sort some of them out).

Here are the apparent underlying causes I've been able to identify from this issue:

Locking (and legal holds) got mentioned, but hasn't seemed to be a factor with anyone here.

I'll also toss a couple others into the list:

Some of these (but not all) could be addressed through some documentation tweaks.

Keep in mind this is a work-in-progress analysis. Here are notes on a couple of the biggies above.

Versioning

Different providers and object store platforms have different defaults. For example, Backblaze has versioning on by default. AWS has it off by default, but when it is turned on, versions of individual objects are apparently hidden by default in some places in their web UI, so it can be easy to miss if versioning has been turned on through org policy.

Solution: Either turn off versioning or add lifecycle management rules on your S3 platform. Also, the files_versions_s3 app may be of interest: https://github.com/nextcloud/files_versions_s3/
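
To check whether versioning is quietly enabled on the bucket, and suspend it if so, something like the boto3 sketch below can help (bucket name and credentials are placeholders). Note that suspending versioning only affects new writes; existing noncurrent versions still need a lifecycle rule or manual cleanup:

```python
# Sketch: detect and suspend bucket versioning. Placeholders for endpoint/keys/bucket.
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.com",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")

status = s3.get_bucket_versioning(Bucket="my-nextcloud-bucket").get("Status")
print("versioning status:", status or "not enabled")

if status == "Enabled":
    # Stops new versions from piling up; old versions remain until expired or deleted.
    s3.put_bucket_versioning(
        Bucket="my-nextcloud-bucket",
        VersioningConfiguration={"Status": "Suspended"},
    )
```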

Aborted multipart uploads

Maybe we can do better here, but it's going to take some work to figure that out. On the other hand, lifecycle rules can be made to handle this situation well (and cleanly) from the looks of it.