nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0

Large data files left in S3 object storage after bad uploads #29841

Open blazejhanzel opened 2 years ago

blazejhanzel commented 2 years ago

Steps to reproduce

  1. Send files via the desktop client or the web interface
  2. Abort the transfer through an internet connection failure (or wait for the desktop client's timeout / byte-count-mismatch error); a scripted way to do this is sketched after this list
  3. Check the S3 container for 500 MB objects containing meaningless binary data
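
A minimal sketch in Python of the abort in step 2, assuming the documented WebDAV chunked-upload endpoint; the hostname, credentials, and transfer id are placeholders:

```python
# Hypothetical sketch: start a WebDAV chunked upload, then drop the
# connection mid-chunk to simulate an internet connection failure.
import requests

BASE = "https://cloud.example.com"      # assumption: your instance URL
AUTH = ("alice", "app-password")        # assumption: user + app password
UPLOAD = f"{BASE}/remote.php/dav/uploads/alice/issue-29841-test"

# Create the upload session directory.
requests.request("MKCOL", UPLOAD, auth=AUTH).raise_for_status()

def partial_body():
    yield b"\0" * (4 * 1024 * 1024)  # stream ~4 MiB of the chunk ...
    raise ConnectionError("simulated network failure")  # ... then abort

try:
    requests.put(f"{UPLOAD}/000001", data=partial_body(), auth=AUTH)
except Exception:
    pass  # the server is now left holding a partial upload
```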

Expected behaviour

The server should notice that the upload was aborted and delete the partially uploaded files from object storage.

Actual behaviour

The server leaves 500 MB objects in object storage (OVH S3). They cannot be cleared using occ files:scan --all or occ files:cleanup.

Server configuration

Operating system: Ubuntu 20.04 LTS

Web server: Apache/2.4.41

Database: MariaDB 10.3.31

PHP version: 7.4.3

Nextcloud version: 22.2.0.2

Updated from an older Nextcloud/ownCloud or fresh install: fresh install

Where did you install Nextcloud from: zip file

Signing status:

```
No errors have been found.
```

List of activated apps:

```
Enabled:
 - accessibility: 1.8.0
 - activity: 2.15.0
 - apporder: 0.13.0
 - bruteforcesettings: 2.2.0
 - circles: 22.1.1
 - cloud_federation_api: 1.5.0
 - comments: 1.12.0
 - contacts: 4.0.6
 - contactsinteraction: 1.3.0
 - dashboard: 7.2.0
 - dav: 1.19.0
 - deck: 1.5.5
 - federatedfilesharing: 1.12.0
 - federation: 1.12.0
 - files: 1.17.0
 - files_external: 1.13.0
 - files_pdfviewer: 2.3.0
 - files_rightclick: 1.1.0
 - files_sharing: 1.14.0
 - files_trashbin: 1.12.0
 - files_versions: 1.15.0
 - files_videoplayer: 1.11.0
 - firstrunwizard: 2.11.0
 - groupfolders: 10.0.0
 - logreader: 2.7.0
 - lookup_server_connector: 1.10.0
 - nextcloud_announcements: 1.11.0
 - notes: 4.2.0
 - notifications: 2.10.1
 - oauth2: 1.10.0
 - password_policy: 1.12.0
 - privacy: 1.6.0
 - provisioning_api: 1.12.0
 - quota_warning: 1.11.0
 - recommendations: 1.1.0
 - serverinfo: 1.12.0
 - settings: 1.4.0
 - sharebymail: 1.12.0
 - support: 1.5.0
 - survey_client: 1.10.0
 - systemtags: 1.12.0
 - tasks: 0.14.2
 - text: 3.3.0
 - theming: 1.13.0
 - twofactor_backupcodes: 1.11.0
 - updatenotification: 1.12.0
 - user_status: 1.2.0
 - viewer: 1.6.0
 - weather_status: 1.2.0
 - workflowengine: 2.4.0
Disabled:
 - admin_audit
 - encryption
 - photos
 - user_ldap
```

Nextcloud configuration:

```
{
    "system": {
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "***REMOVED SENSITIVE VALUE***"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "objectstore": {
            "class": "\\OC\\Files\\ObjectStore\\S3",
            "arguments": {
                "bucket": "nextcloud",
                "autocreate": true,
                "key": "***REMOVED SENSITIVE VALUE***",
                "secret": "***REMOVED SENSITIVE VALUE***",
                "hostname": "storage.waw.cloud.ovh.net",
                "port": 443,
                "region": "waw",
                "use_ssl": true,
                "use_path_style": true
            }
        },
        "dbtype": "mysql",
        "version": "22.2.0.2",
        "overwrite.cli.url": "***REMOVED SENSITIVE VALUE***",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "default_phone_region": "PL",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpmode": "smtp",
        "mail_sendmailmode": "smtp",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpauthtype": "LOGIN",
        "mail_smtpauth": 1,
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "587",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "has_rebuilt_cache": true
    }
}
```

Are you using external storage, if yes which one: S3 Object Storage as default nextcloud data storage

Are you using encryption: no

Are you using an external user-backend, if yes which one: Not sure, probably no

Client configuration

Browser: Chromium-based 95.0.1020.53

Operating system: Windows 10, Windows 11, GNU/Linux

Desktop client version: 3.3.6 (Windows)

Logs

Web server error log:

```
[Mon Nov 22 08:56:47.033194 2021] [access_compat:error] [pid 424412] [client 209.141.34.220:46930] AH01797: client denied by server configuration: /var/www/html/config/getuser
[Mon Nov 22 13:53:42.980006 2021] [access_compat:error] [pid 429447] [client 209.141.34.220:49522] AH01797: client denied by server configuration: /var/www/html/config/getuser
[Mon Nov 22 16:55:45.673139 2021] [php7:error] [pid 434037] [client 213.231.8.6:56997] script '/var/www/html/wp-login.php' not found or unable to stat, referer: http://***PRIVATE HOSTNAME***/wp-login.php
```

Nextcloud log (data/nextcloud.log):

Not sure how to get this from S3.
NeoTheThird commented 2 years ago

> Cannot clear them using occ files:scan --all and occ files:cleanup.

As far as I can see, the files:* commands do not affect object storage as primary storage at all, but it looks like that might be intended behavior? Maybe changing this (or adding a new command for object storage) would be a potential fix, since it is important not only to prevent new faulty files from appearing, but also to get rid of the old ones.

My object storage ballooned to almost four times the size of my users' accumulated used storage due to this issue. To at least reclaim some space, I ran occ trashbin:cleanup --all-users and occ versions:cleanup. That of course does not fix the underlying issue, but it does reduce my hosting bill a little (at the cost of some convenience for my users).

otherguy commented 2 years ago

I had the same issue #30762 and wrote a cronjob that cleans up these uploads.

I published it here: https://github.com/otherguy/nextcloud-cleanup

It's extremely simple and for now only works with Scaleway's S3 object storage and MySQL/MariaDB, but I'm happy to accept PRs to make it more versatile. The changes required for Amazon's S3 storage would be minimal.
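
For context, the general shape of such a cleanup (a hedged sketch of the approach, not the published script): with S3 as primary storage, Nextcloud names each object urn:oid:&lt;fileid&gt;, so any object whose fileid no longer exists in the oc_filecache table is an orphan candidate. All endpoint, credential, and database values below are placeholders:

```python
# Hedged sketch of the orphan-detection idea behind such cleanup
# scripts; not otherguy's actual code. Assumes S3 primary storage
# (objects named "urn:oid:<fileid>") and a MySQL/MariaDB database.
import re
import boto3
import pymysql

s3 = boto3.client("s3", endpoint_url="https://s3.example.net",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")
db = pymysql.connect(host="localhost", user="nextcloud",
                     password="SECRET", database="nextcloud")

# Collect every fileid Nextcloud still references.
with db.cursor() as cur:
    cur.execute("SELECT fileid FROM oc_filecache")
    known = {row[0] for row in cur.fetchall()}

# Flag bucket objects whose fileid is no longer referenced. The delete
# stays commented out: a dry run avoids racing an in-flight upload whose
# row has not been written yet.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="nextcloud"):
    for obj in page.get("Contents", []):
        m = re.fullmatch(r"urn:oid:(\d+)", obj["Key"])
        if m and int(m.group(1)) not in known:
            print("orphaned:", obj["Key"], obj["Size"], "bytes")
            # s3.delete_object(Bucket="nextcloud", Key=obj["Key"])
```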

Scandiravian commented 2 years ago

I've written a Python script that does something similar to the cron job @otherguy made, but for Minio+Postgres. I posted it in a related issue (https://github.com/nextcloud/server/issues/20333) together with a disclaimer that I recommend reading before trying it out on your own.

szaimen commented 1 year ago

Hi, please update to 24.0.9 or, better, 25.0.3 and report back if it fixes the issue. Thank you!

My goal is to add a label like 25-feedback to this ticket once the bug can be reproduced on an up-to-date major Nextcloud version. However, this is not going to work without your help. So thanks for all your effort!

If you don't manage to reproduce the issue in time and the issue gets closed, but you can reproduce it afterwards, feel free to create a new bug report with up-to-date information by following this link: https://github.com/nextcloud/server/issues/new?assignees=&labels=bug%2C0.+Needs+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+

Scandiravian commented 1 year ago

@szaimen Thanks for communicating what you need to move forward with this issue. I appreciate the effort to clean up the backlog, as I had forgotten about this issue after I fixed the problem that was causing connections to be dropped.

I am not sure if I will have time to reproduce this in the foreseeable future, so for anyone interested in confirming whether this issue still affects Nextcloud, here's what I think is needed to reproduce it:

  1. Spin up Nextcloud, Postgres, and Minio (or another S3 compatible service)
  2. Configure Nextcloud to use S3 as primary storage (relevant docs)
  3. Set a low upload size limit (10M or similar) in nextcloud/.user.ini (relevant docs)
  4. Create a folder with a single file that is larger than the limit set in step 3
  5. Log in to Nextcloud and delete everything in the default user's files pane
  6. Check that the storage bucket in Minio is now empty (it might be necessary to run garbage collection before the bucket is cleaned up)
  7. Connect the nextcloud-client to the Nextcloud backend
  8. Set the nextcloud-client to sync the folder set in step 4
  9. Confirm that the upload fails through the logs for the nextcloud-client
  10. Stop the nextcloud-client from syncing to the server
  11. Trigger garbage collection
  12. Check the storage bucket in Minio. If it is no longer empty, the bug is still present

I wrote this from memory, so if anyone spots a mistake, let me know and I'll update the steps. A quick programmatic check for steps 6 and 12 is sketched below.
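
A small sketch, assuming boto3, a local Minio with its default credentials, and the bucket name from the config above:

```python
# Sketch: report leftover objects and incomplete multipart uploads.
# Endpoint, credentials, and bucket name are assumptions for a local
# Minio test setup.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # assumption: local Minio
    aws_access_key_id="minioadmin",        # assumption: Minio defaults
    aws_secret_access_key="minioadmin",
)

objects = s3.list_objects_v2(Bucket="nextcloud").get("Contents", [])
uploads = s3.list_multipart_uploads(Bucket="nextcloud").get("Uploads", [])
print(f"{len(objects)} objects, {len(uploads)} incomplete multipart uploads")
```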

otherguy commented 1 year ago

@szaimen is there a changelog that mentions this?

szaimen commented 1 year ago

Ah sorry, closed this by accident. In which Nextcloud version did you reproduce the issue?

otherguy commented 1 year ago

Definitely up to 24.x

frittentheke commented 1 year ago

1) Partial and unsuccessful uploads should certainly be recognized and cleaned out of object storage. So this bug is more about the server not aborting the S3 upload of a chunk / file that was never completely received from the client, right?

2) When doing multipart uploads (see https://github.com/nextcloud/server/pull/27034), one would usually use a lifecycle policy (https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpu-abort-incomplete-mpu-lifecycle-config.html) to ensure that the parts of any multipart upload that is not completed within a certain time frame are deleted. Both operations are sketched below.
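
A hedged illustration of the underlying S3 operations with boto3 (not Nextcloud's actual PHP code; the bucket, key, and upload id are placeholders, and the seven-day window is an arbitrary example):

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.net")

# Point 1: when an upload is known to have failed, abort it explicitly
# so the parts uploaded so far are discarded immediately.
s3.abort_multipart_upload(
    Bucket="nextcloud",
    Key="urn:oid:12345",           # hypothetical object key
    UploadId="example-upload-id",  # id from create_multipart_upload
)

# Point 2: a lifecycle rule as a safety net, aborting any multipart
# upload still incomplete seven days after initiation.
s3.put_bucket_lifecycle_configuration(
    Bucket="nextcloud",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "abort-incomplete-multipart-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    },
)
```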

szaimen commented 1 year ago

Hi, please update to 25.0.7 or, better, 26.0.2 and report back if it fixes the issue. Thank you!

My goal is to add a label like 26-feedback to this ticket once the bug can be reproduced on an up-to-date major Nextcloud version. However, this is not going to work without your help. So thanks for all your effort!

If you don't manage to reproduce the issue in time and the issue gets closed, but you can reproduce it afterwards, feel free to create a new bug report with up-to-date information by following this link: https://github.com/nextcloud/server/issues/new?assignees=&labels=bug%2C0.+Needs+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+

otherguy commented 1 year ago

@szaimen you asked previously to verify on 24 or 25. I have verified it still happens on 24.

Could you link to a PR or Changelog entry since then that should fix it?

HelderFSFerreira commented 1 day ago

Same issue on 29.0.6