Open k-popov opened 2 years ago
I think we have the same issue on multiple NC versions (starting with NC 22.2.3, the latest being NC 24.0.2).
Setup:
It is possible to see that the files are correctly stored on storage before this happens.
From nextcloud.log in debug mode and with 'filelocking.debug' => true, we see the following at the end of the upload, together with the 423 you mentioned in the browser:
{ "reqId": "uDOo0WwTwNoFWaKL7lpH", "level": 1, "time": "2022-06-28T12:17:56+00:00", "remoteAddr": "111111.11111.11111.1111", "user": "john.doe@google.com", "app": "no app in context", "method": "MOVE", "url": "/remote.php/dav/uploads/john.doe@google.com/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "message": "\"files/6c1a102c3da4d854a16bff1982a4b470\"(\"home::john.doe@google.com::files/john.doe@google.com/files/3/5GB.file.upload.part\") is locked, existing lock on file: exclusive", "userAgent": "Mozilla/5.0 (X11; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0", "version": "24.0.2.1", "exception": { "Exception": "OCP\Lock\LockedException", "Message": "\"files/6c1a102c3da4d854a16bff1982a4b470\"(\"home::john.doe@google.com::files/john.doe@google.com/files/3/5GB.file.upload.part\") is locked, existing lock on file: exclusive", "Code": 0, "Trace": [ { "file": "/var/www/html/lib/private/Files/Storage/Common.php", "line": 758, "function": "acquireLock", "class": "OC\Lock\MemcacheLockingProvider", "type": "->", "args": [ "files/6c1a102c3da4d854a16bff1982a4b470", 2, "home::john.doe@google.com::files/john.doe@google.com/files/3/5GB.file.upload.part" ] }, { "file": "/var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php", "line": 610, "function": "acquireLock", "class": "OC\Files\Storage\Common", "type": "->", "args": [ "files/john.doe@google.com/files/3/5GB.file.upload.part", 2, { "class": "OC\Lock\MemcacheLockingProvider" } ] }, { "file": "/var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php", "line": 610, "function": "acquireLock", "class": "OC\Files\Storage\Wrapper\Wrapper", "type": "->", "args": [ "files/john.doe@google.com/files/3/5GB.file.upload.part", 2, { "class": "OC\Lock\MemcacheLockingProvider" } ] }, { "file": "/var/www/html/lib/private/Files/View.php", "line": 1961, "function": "acquireLock", "class": "OC\Files\Storage\Wrapper\Wrapper", "type": "->", "args": [ "files/john.doe@google.com/files/3/5GB.file.upload.part", 
2, { "class": "OC\Lock\MemcacheLockingProvider" } ] }, { "file": "/var/www/html/lib/private/Files/View.php", "line": 2076, "function": "lockPath", "class": "OC\Files\View", "type": "->", "args": [ "/john.doe@google.com/files/3/5GB.file.upload.part", 2, false ] }, { "file": "/var/www/html/apps/dav/lib/Connector/Sabre/Directory.php", "line": 162, "function": "lockFile", "class": "OC\Files\View", "type": "->", "args": [ "/john.doe@google.com/files/3/5GB.file.upload.part", 2 ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php", "line": 307, "function": "createFile", "class": "OCA\DAV\Connector\Sabre\Directory", "type": "->", "args": [ "5GB.file", null ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php", "line": 133, "function": "copyNode", "class": "Sabre\DAV\Tree", "type": "->", "args": [ { "class": "OCA\DAV\Upload\FutureFile" }, { "class": "OCA\DAV\Connector\Sabre\Directory" }, "5GB.file" ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php", "line": 163, "function": "copy", "class": "Sabre\DAV\Tree", "type": "->", "args": [ "uploads/john.doe@google.com/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/john.doe@google.com/3/5GB.file" ] }, { "file": "/var/www/html/apps/dav/lib/Upload/ChunkingPlugin.php", "line": 94, "function": "move", "class": "Sabre\DAV\Tree", "type": "->", "args": [ "uploads/john.doe@google.com/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/john.doe@google.com/3/5GB.file" ] }, { "file": "/var/www/html/apps/dav/lib/Upload/ChunkingPlugin.php", "line": 76, "function": "performMove", "class": "OCA\DAV\Upload\ChunkingPlugin", "type": "->", "args": [ "uploads/john.doe@google.com/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/john.doe@google.com/3/5GB.file" ] }, { "file": "/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php", "line": 89, "function": "beforeMove", "class": "OCA\DAV\Upload\ChunkingPlugin", "type": 
"->", "args": [ "uploads/john.doe@google.com/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/john.doe@google.com/3/5GB.file" ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/CorePlugin.php", "line": 603, "function": "emit", "class": "Sabre\DAV\Server", "type": "->", "args": [ "beforeMove", [ "uploads/john.doe@google.com/web-file-upload-d86cd6c9f67b3659df8b97bce3fdc5a4-1656417981137/.file", "files/john.doe@google.com/3/5GB.file" ] ] }, { "file": "/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php", "line": 89, "function": "httpMove", "class": "Sabre\DAV\CorePlugin", "type": "->", "args": [ { "class": "Sabre\HTTP\Request" }, { "class": "Sabre\HTTP\Response" } ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php", "line": 472, "function": "emit", "class": "Sabre\DAV\Server", "type": "->", "args": [ "method:MOVE", [ { "class": "Sabre\HTTP\Request" }, { "class": "Sabre\HTTP\Response" } ] ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php", "line": 253, "function": "invokeMethod", "class": "Sabre\DAV\Server", "type": "->", "args": [ { "class": "Sabre\HTTP\Request" }, { "class": "Sabre\HTTP\Response" } ] }, { "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php", "line": 321, "function": "start", "class": "Sabre\DAV\Server", "type": "->", "args": [] }, { "file": "/var/www/html/apps/dav/lib/Server.php", "line": 352, "function": "exec", "class": "Sabre\DAV\Server", "type": "->", "args": [] }, { "file": "/var/www/html/apps/dav/appinfo/v2/remote.php", "line": 35, "function": "exec", "class": "OCA\DAV\Server", "type": "->", "args": [] }, { "file": "/var/www/html/remote.php", "line": 166, "args": [ "/var/www/html/apps/dav/appinfo/v2/remote.php" ], "function": "require_once" } ], "File": "/var/www/html/lib/private/Lock/MemcacheLockingProvider.php", "Line": 85, "CustomMessage": "--" } }
Other threads can be found with only one workaround, which is disabling file locking or raising the file-lock TTL (the latter did not work for us). Redis in debug mode says nothing at all.
Some other Threads I've found: nextcloud/server#23615 nextcloud/server#9001
Update:
'part_file_in_storage' => true,
'filesystem_check_changes' => 1,
'filelocking.ttl' => 43200,
didn't change anything...
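For context, these options sit in config.php next to the memcache and locking settings. When Redis handles file locking, the relevant block typically looks something like this (host and port here are placeholders, not our actual values):

```php
// config/config.php fragment, illustrative only; adjust host/port to your setup
'filelocking.enabled' => true,
'filelocking.ttl' => 43200,
'memcache.locking' => '\OC\Memcache\Redis',
'redis' => [
    'host' => 'redis-master', // placeholder service name
    'port' => 6379,
],
```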
I've also noticed that the desktop client (Linux version) and nextcloudcmd (Ubuntu package name: nextcloud-desktop-cmd) are somehow able to sync these large files. Aren't these clients using WebDAV? Are they using it differently?
Still exists on 24.0.6
It turns out that the MOVE command sent by the browser fails every time after exactly 60 seconds (seen in the browser's developer tools).
Our config was wrong and did not use Redis at all. Now that Redis is used for file locking, I only see entries 60 seconds after the upload. The problem still persists. When I refresh the browser window, in most cases the file shows up right away.
I tried:
... no luck so far
The issue with the MOVE command failing after 60 seconds has to be the ingress (in our case nginx). We added the following annotations to the ingress and checked in nginx.conf on the nginx pod that all the options are applied:
nginx.ingress.kubernetes.io/cors-allow-headers: X-Forwarded-For
nginx.ingress.kubernetes.io/enable-cors: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "0"
nginx.ingress.kubernetes.io/proxy-buffer-size: 225m
nginx.ingress.kubernetes.io/proxy-buffering: "on"
nginx.ingress.kubernetes.io/proxy-connect-timeout: 60s
nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
nginx.ingress.kubernetes.io/proxy-request-buffering: "on"
nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
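For completeness, here is a minimal sketch of how such annotations attach to an Ingress resource (resource names and host are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nextcloud                     # placeholder
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-request-buffering: "on"
spec:
  rules:
    - host: cloud.example.com         # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nextcloud       # placeholder
                port:
                  number: 80
```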
These seem to improve the situation, notably the speed of the uploads, but with the caveat that we could not upload files bigger than 10 GB so far with our config (so this config is surely still not optimal).
However, during the MOVE command from the browser, the pod is busy and uses all its resources for it, and Apache fails to serve status.php, which in our case resulted in a killed pod (failing readiness probes). As a workaround, we raised the timeouts in the probes. It's unclear why the MOVE takes so many resources that the Apache/PHP stack in the Nextcloud container can't even handle a simple curl to status.php. It seems that it locks out all other PHP processing during this time.
sudo redis-cli flushall
Hi, please update to 24.0.9 or better 25.0.3 and report back if it fixes the issue. Thank you!
My goal is to add a label such as 25-feedback to this ticket, marking an up-to-date major Nextcloud version on which the bug can still be reproduced. However, this is not going to work without your help. So thanks for all your effort!
If you don't manage to reproduce the issue in time and the issue gets closed but you can reproduce the issue afterwards, feel free to create a new bug report with up-to-date information by following this link: https://github.com/nextcloud/server/issues/new?assignees=&labels=bug%2C0.+Needs+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+
Hi, my installation was exactly like the one above, with NC 24.0.5; I was experiencing the problem and stumbled upon this issue.
On 24.0.5 I tried to upload a 2.2 GB file and got the locking error mentioned above. The file was correctly uploaded, though (checked: I downloaded it again and it matched the original). I raised the locking TTL to 12 h (as suggested above), but the locking error after uploading persists.
As suggested I upgraded my NC to 25.0.3 (Helm chart version 3.5.0), but the problem is still the same: upload a 2.2 GB file, lock error but file successfully uploaded.
Sounds like you are running into some kind of timeout. Can you make sure that all of the mentioned values in https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/big_file_upload_configuration.html are correctly applied?
These are my current values:
php_value upload_max_filesize 5G
php_value post_max_size 5G
php_value max_input_time 3600
php_value max_execution_time 3600
And also:
'filelocking.enabled' => true,
'filelocking.ttl' => 43200
These values seem enough for a 2.2GB upload, right?
It does not seem like a timeout to me... the upload finishes successfully (but shows the locking error), and the file appears in the web interface afterwards. Indeed, if I download the file, the content matches the original byte for byte.
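For what it's worth, the shorthand limits above can be checked against the 2.2 GB upload with a few lines of Python (illustrative only; the parsing follows PHP's shorthand-byte notation):

```python
# Compare PHP shorthand byte limits ("5G", "50M") against an upload size.
# Illustrative only, not part of Nextcloud.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_php_size(value: str) -> int:
    """Convert a PHP shorthand byte string like '5G' into bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)

upload = int(2.2 * 1024**3)            # the 2.2 GB file from this thread
assert upload < parse_php_size("5G")   # upload_max_filesize / post_max_size suffice
```

Note also that with chunked uploads each individual request only carries a single chunk, so the per-request size limits matter less than the web server and FastCGI timeouts.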
Did you also adjust web server or proxy timeouts?
My proxy is a nginx Ingress in Kubernetes, but the timeouts that can be configured are for establishing connections, not for limiting the duration of those connections.
Again, the upload finishes successfully (HTTP wise) so there seems not to be any timeout or maximum data size problems in the data path...?
What we discovered on k8s with almost unlimited timeout settings:
This is exactly my failure scenario.
Some more tests on this front: I installed a temporary VM with:
With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading.
After doing this, I modified the installation above:
I retried the same 2.2GB upload, and again it worked flawlessly, with no locking errors shown after upload.
And another new test: another temporary VM with:
With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading.
So:
Clues so far point at some specific problem with the Helm/Kubernetes deployment.
One of my teammates suggested it may have something to do with Redis being deployed in cluster mode in Kubernetes. There are several pods for it, while in a VM or AIO install only a single Redis instance is deployed... I'll test and report back.
New test: deployed in our K8s staging cluster, adding the Redis configuration architecture: standalone to the Helm chart config. I verified that now only one Redis master pod is deployed instead of the previous four (1 master and 3 slaves).
The error still happens with single node Redis deployment.
- Rocky Linux 8 latest (RHEL clone)
- PHP 8.1.16 running in FPM mode (this is the same PHP as the one in the Docker image which is used for deployment in K8S)
- Apache
- External MariaDB server
- Vanilla Nextcloud 25.0.3 minimally installed via Webinstaller, with no optimizations (i.e. no opcache, no memcache, stock DB-based locking)
- (No docker, no kubernetes, just plain old "hosted" installation)
With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading.
After doing this, I modified the installation above:
- Added and configured opcache
- Added and configured local redis server
- Configured Nextcloud memcache using Redis
- Configured Nextcloud locking using Redis
- Fixed all security and performance recommendations in the Admin dashboard (now it shows a green tick)
I retried the same 2.2GB upload, and again it worked flawlessly, with no locking errors shown after upload.
So basically it looks like the docker image or helm chart might have an issue, right?
Well, according to my tests (see previous comments), the Docker image is fine, it installs and runs with no errors on a regular instance (Rocky VM). Error seems to be specific to K8S deployment.
All right. Shall I move over to the helm chart repo then?
That's my current line of investigation, yes.
BTW, @Doriangaensslen said that the problem happened also with the Docker AIO image, but I did not reproduce it there.
Ah, you tried the AIO image instead of the "normal" image? That could still lead to the docker image being the culprit here as AIO is using its own image. Moving to the docker repo then for now.
I did not know there were 2 docker images, now I know, thanks :-)
New test: deployed Nextcloud Helm chart with Redis disabled, and retried the 2.2GB upload.
An error about locking appears, though different from the previous one and more terse: "TEST20_3.bin" is locked
As with the previous errors, if you refresh, the file has been correctly uploaded and appears in the web UI.
So it seems we can discard Redis as the source of the problem, it is somewhere else.
It's also what we experienced. We also tried the other caching methods, with no luck there.
We noticed restarts because the readiness probe failed while the file was stitched together, since the pod was busy processing it. But even after resolving that, we still have this problem.
I have the same issue ("file locked: exclusive") using IONOS as host.
I have intermittent issues on a "normal" (non-docker, non-k8s) setup, also using Redis. Didn't occur in 22.x, but since I installed 25 I get that every now and then, without meaningful ways to replicate it. It happens often when doing the auto-backup of photos via the iOS app - which is interesting as the files do not exist in the first place.
I am tempted to disable file-locking, as this is almost a single-user instance (well, the other user is a family member, and it's really only for auto-backups of photos). Still, it'd be better if this would help detecting any sort of issue here.
Happy to help, as always.
I am experiencing essentially the same problem, running Nextcloud 27.0.1 directly on an Apache instance, without any containers.
I have kept the upload and post sizes at their low defaults of 50M, due to resource limitations on the server. Files are stored in S3.
Appearing in the log file...
OCP\Lock\LockedException: "<pathname>" is locked, existing lock on file: 3 shared locks
Similarly, shown as a message popup on the browser page...
"\<pathname>" is locked, existing lock on file: 3 shared locks
Hi there, we've the same problem with 27.0.1-apache image. We're using a K8S Cluster hosted at IONOS. We use an nginx-ingress controller in front of the pod with the following annotations:
nginx.ingress.kubernetes.io/cors-allow-headers: X-Forwarded-For
nginx.ingress.kubernetes.io/enable-cors: 'true'
nginx.ingress.kubernetes.io/proxy-body-size: 0m
nginx.ingress.kubernetes.io/proxy-buffer-size: 225m
nginx.ingress.kubernetes.io/proxy-buffering: 'on'
nginx.ingress.kubernetes.io/proxy-connect-timeout: 60s
nginx.ingress.kubernetes.io/proxy-read-timeout: 900s
nginx.ingress.kubernetes.io/proxy-request-buffering: 'on'
nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
I think this is an issue in the Files client embedded in the server, though I admit I can only speculate about why this is mostly reported with the k8s/Docker image.
It appears that the client reported in this thread as being okay (the desktop client) works because it does treat 423 as temporary and simply retries:
The ones that fail (web, iOS, Android) don't seem to, from a cursory look, but I haven't dug deeply enough to say with certainty yet:
Open questions:
Assuming treating 423 as a soft failure is at least one part of the correct answer, changes/confirmation should be coordinated across all clients (other than Desktop).
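For illustration, treating 423 as a soft failure amounts to a retry loop along these lines (a sketch of the general pattern, not actual code from any Nextcloud client):

```python
import time

RETRYABLE = {423}  # HTTP 423 Locked treated as a temporary condition

def move_with_retry(do_move, attempts=5, delay=1.0):
    """Retry a MOVE-style operation while the server reports 423 Locked.

    `do_move` is any callable returning an HTTP status code.
    Illustrative only; the real clients' logic may differ.
    """
    for attempt in range(attempts):
        status = do_move()
        if status not in RETRYABLE:
            return status
        time.sleep(delay * (2 ** attempt))  # exponential backoff
    return status  # still locked after all attempts

# Usage with a stub server that is locked twice, then succeeds:
responses = iter([423, 423, 201])
assert move_with_retry(lambda: next(responses), delay=0.0) == 201
```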
I am under the impression that iOS will retry up to a point (or at least used to), but recently I found a set of files that were supposed to have been automatically backed up but weren't and I had to force a full photo library upload. Might not be related at all, but since I still see 423 blocking things every now and then I thought I'd chime in.
You may be on the good track wrt storage latency: the (most) problematic installations seem to be on docker, kubernetes and using S3 storage. All of those have higher (or much higher!) latency than local disks.
I hope you can get this sorted out soon. Good luck :-)
I am using lscr.io/linuxserver/nextcloud:latest , I have this error for the first time using the iOS client.
Seeing this problem with the LinuxServer Docker image too. Not sure if that's just coincidence though. Seeing it on the Windows desktop client too.
This is a bug, not a question or a configuration/webserver/proxy issue.
Wrong. This is a proxy, webserver and (most likely) configuration issue, albeit a subtle one. Manually removing file locking will result in messed up data.
tl;dr — increase web server timeouts and/or get rid of slow storage, at least for caching.
There is a lot to unpack.
There's Chunked file upload v2: it's a way for NC to upload large files. It boils down to a simple concept: split big_chungus_16g.bin into chunks, upload them to any location, then reassemble them into a temporary file on the server side, and then move it to the desired location.
Despite the complexity of the process, the above problem doesn't often manifest itself in «simple» installations; by «simple» I mean one server, fast enough storage, no bells and whistles.
By default, NC uses a reasonable chunk size (~10M). Even in the worst case, with buffering enabled in the web server (e.g. nginx) AND buffering for FastCGI, a chunk will first end up in PHP's temporary location.
Then that file moves to a write-only cache for uploads (.../UserName/uploads/1234567/00000, .../UserName/uploads/1234567/00001, and so on) — as it should be.
Once all the pieces are in place, the client issues the MOVE verb, signalling the server to
1) reassemble all the pieces from the staging dir for uploads, say .../UserName/uploads/1234567/*, into a single intermediate file by sequentially copying the contents of the chunks,
2) move that intermediate file to the final destination, and
3) return a successful response to the client.
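The reassemble-then-move phase can be sketched in a few lines, purely to illustrate why it runs inside one long request (a simplified illustration, not Nextcloud's actual implementation, which also handles locking and cache updates):

```python
import os
import shutil
import tempfile

def assemble_and_move(chunk_paths, destination):
    """Concatenate uploaded chunks in order into one temporary file,
    then move the result to its final destination. In Nextcloud all of
    this happens inside a single request, which is why slow storage or
    short web-server timeouts bite here. Illustrative only."""
    fd, tmp_path = tempfile.mkstemp(suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in sorted(chunk_paths):       # 00000, 00001, ...
                with open(chunk, "rb") as part:
                    shutil.copyfileobj(part, out)   # sequential copy
        shutil.move(tmp_path, destination)          # final rename/move
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)                     # clean up on failure
        raise
```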
So where and why can this process be broken?
First and foremost, all that server-side byte shuffling after the MOVE verb happens within the lifetime of a single request. If anything goes wrong — the upload is screwed. If I/O takes too long (longer than the timeout the web server expects from the FastCGI socket) — the upload is screwed. By «screwed» I mean inconsistent state and wasted space — such a PITA for large files — and...
Enter the lock. NC is pretty paranoid about the atomicity of I/O operations; all those intermediate files, chunks and so on are locked by default, and that makes sense: if something messes with intermediate files during upload, then, as you might guess, things get corrupted in unpredictable ways. So if for some reason reassembling the file takes too long, nginx or some other web server will tell the client «504 Gateway Timeout, lol». There will be no one to release the lock on the intermediate file, which may or may not have been moved to the target — the upload is screwed.
But there's more. Clients and retry logic. Clients are pretty ~~stupid~~ naïve, and will happily retry uploads even if there are server-side inconsistencies. Again and again, they will stumble upon a held lock until it expires or someone manually deletes it.
I reiterate: overall, this seemingly fragile process works as expected, and locking works as expected. Modern storage can easily deliver 300 MB/s sequential I/O, so assembling, say, a 4G file will fit nicely even in the default humble 30s timeout. If the NC data folder lives on a medium with scarce IOPS (NFS, Longhorn, whatever), then even trivial operations that normally stay within the boundaries of a disk become network calls, so moving chunks back and forth turns uploads of anything larger than 1G into a PITA.
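To make the arithmetic explicit (the 300 MB/s and 30 s figures come from the paragraph above; the slow-storage throughput of 20 MB/s is an assumed example for a congested network-backed volume):

```python
def assembly_seconds(file_bytes: int, throughput: float) -> float:
    """Time to sequentially copy file_bytes at a sustained throughput (bytes/s)."""
    return file_bytes / throughput

GIB = 1024**3
fast = 300 * 1024**2   # ~300 MB/s sequential I/O on decent local storage
slow = 20 * 1024**2    # assumed ~20 MB/s for slow NFS/Longhorn-style storage

assert assembly_seconds(4 * GIB, fast) < 30   # a 4G file fits in a 30 s timeout
assert assembly_seconds(4 * GIB, slow) > 30   # on slow storage it blows past it
```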
To fix this problem and make efficient use of all available bandwidth, one needs to:
- Tweak the runtime: get rid of PHP upload limits, increase input/execution time to 1 hour (overkill, but still), AND use a fast location for upload_tmp_dir — NVMe or even tmpfs if you can afford it for >1 Gbps upload speed.
- Tweak NC itself: use big chunks up to 4 GB instead of 10 MB, or disable chunking altogether at your discretion.
- Tweak webserver timeouts: reassembling a 16G file can take a few minutes on a consumer-grade HDD, so set fastcgi_send_timeout/fastcgi_read_timeout in nginx to a reasonable value, say 600s. And don't forget to disable buffering entirely.
- In case of an NFS-backed data directory, there is a dirty hack: replacing the upload/ directory with a symlink to fast local storage (ln -s /scratch uploads) in conjunction with localstorage.allowsymlinks will save precious bandwidth and IOPS.

With all these tweaks NC can easily saturate a 1 Gbps link, even on modest hardware. It just needs to be configured properly. Unfortunately, the default configuration isn't the best.
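Expressed as an nginx configuration fragment, the timeout and buffering tweaks might look like this (a sketch; "php-handler" is a placeholder upstream name, and values should be adapted to your workload):

```nginx
# Inside the Nextcloud vhost — illustrative only
location ~ \.php(?:$|/) {
    fastcgi_pass php-handler;        # placeholder upstream
    fastcgi_send_timeout 600s;
    fastcgi_read_timeout 600s;
    fastcgi_request_buffering off;   # stream chunk uploads straight to PHP
    fastcgi_buffering off;           # don't buffer large responses either
}
```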
P.S. The mobile client defaults to small chunks with no way to configure them. That's why video uploads on 300Mbps LTE are painfully slow.
Thanks for the thoughtful and detailed response, really appreciative of your time and effort!
Just wanted to say that I saw it on a single very small file (a few-hundred-kilobyte SFX for a game I'm working on) out of a folder of much bigger multi-GB files. I copy-pasted the file elsewhere, deleted it, then added it again, and it worked first time, so I'm not convinced it's an IOPS problem; I think it's a bug of some description. I've been running Nextcloud for a few years now and never encountered this issue, even when uploading much larger files (like backups/snapshots!).
So whilst I agree with a lot of what you're saying, I don't think it's necessarily applicable to every instance of people seeing file locking. Nothing in the log either that shines any light on it.
Thank you for all the helpful hints. However... I come bearing bad news.
I have NFS-backed storage; yes, it's not the fastest. However, I mounted the uploads/ directory on its own 6 GB tmpfs. It gets used fine, but this doesn't change a thing. For me, this means Nextcloud is fundamentally broken for uploading big files via mobile (e.g. videos). In this case, I managed to clog up my instance with a few Christmas and New Year's videos, 1-2 GB in size. For the short term, I'm not syncing movies any more; for the long term I will migrate back to local storage. This, however, is a very, very sore wound. And it's not the only one, but those are other bug reports entirely. Just my two cents.
No timeouts in the log, only 405 and 423 (Locked), btw.
I have this issue on a fixed set of rather small files (music files, so 20-100MB), while I had no issues uploading several multi-gigabyte files. I'm running the 28.0.2-apache image in k8s behind ingress-nginx (deployed with nextcloud/helm) with user files on ceph-rgw via S3. I've significantly increased the timeouts on the ingress, but that did not change anything.
also: when I copy one of the affected files, the copy is uploaded without issues.
Deleting the /chunking-v2 keys in redis gets rid of this shared lock issue (which seems odd), but the desktop client still tries to upload all affected files (~200). During that sync new /chunking-v2 keys show up which will lead to failed uploads due to a shared lock once again, however not as many.
I am experiencing the same issue ("file.bin" is locked, existing lock on file: 3 shared locks) on an Ionos Managed Nextcloud instance (Nextcloud 27.1.5 Enterprise).
I did some tests and I can reliably reproduce this with any file larger than exactly 10 MB using the web UI or the Linux client. Up to 10 MB it's quite fast, but larger files upload very slowly, show HTTP 500 responses and eventually lead to the lock error. I also observe that my instance is often very slow even when listing the few files I have stored.
I therefore conclude that @Orhideous's summary is correct in my case. I will get in touch with Ionos support.
@henrythasler Any solution from Ionos? I think I have the same issue, but only with the Windows client; Linux works fine.
@DaDirnbocher No, not yet. I talked to a very helpful and technically skilled support guy last week, explained the problem and provided some additional info/logs. I received some feedback yesterday that the "performance of my Nextcloud was increased" and I should try again. Unfortunately not much had changed. UI was a bit less sluggish but the error is still there.
For what it's worth for anyone else, here is a log entry that shows up with the error: lock.json
I should mention that this error only occurs when overwriting an already existing file. Fresh uploads always work fine.
I'm beginning to suspect that this issue here is somehow connected to https://github.com/nextcloud/server/issues/6029 where people also report very bad server performance when incoming federated shares are not reachable (as is the case on my instance).
Btw, a small update from me: I switched from my Ceph S3 to CephFS (everything else is the same) and now all issues are gone. To me it seems like the latency of the S3 backend might be too high for some reason. I don't think nextcloud/server#6029 is related.
It seems the nextcloud-client has some influence on the multipart upload. As a workaround/hack I have set chunkSize=20000000 in nextcloud.cfg, and now my 15 MB files sync without any error.
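For reference, on the desktop client this setting goes into the [General] section of nextcloud.cfg (location and section name to the best of my knowledge, e.g. ~/.config/Nextcloud/nextcloud.cfg on Linux):

```ini
[General]
chunkSize=20000000
```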
⚠️ This issue respects the following points: ⚠️
Bug description
Getting error "zeros_1gb.new.dd" is locked, existing lock on file: exclusive when uploading large files (1 GB+) with the browser.

Setup:
- Nextcloud installation (/var/www/html) stored in an RWO volume; 10 pods use this volume on one node.
- filelocking.debug set to true
Logs viewed through Kibana show the following (format: pod_name - log_message; note the MOVE request; the second lock attempt is done by a different pod, if that matters):
From the browser the process looks like the following:
- "MOVE /remote.php" is done, which completes with a 423 (Locked) response (I suspect this code making the request).
- The lock is held on the file with the .part suffix.

When using the desktop client to sync the same file, server-side logs display the same sequence, but the client (probably) retries this MOVE request, which succeeds.

Steps to reproduce
Expected behavior
Even large files are uploaded successfully.
Solution (?): locking is retried on the server side when executing the MOVE request.

Installation method
Official Docker image
Operating system
Debian/Ubuntu
PHP engine version
PHP 8.0
Web server
Nginx
Database engine version
PostgreSQL
Is this bug present after an update or on a fresh install?
Fresh Nextcloud Server install
Are you using the Nextcloud Server Encryption module?
Encryption is Disabled
What user-backends are you using?
Configuration report
List of activated Apps
Nextcloud Signing status
No response
Nextcloud Logs
No response
Additional info
No response