Closed pascalwengerter closed 2 years ago
~Hint: Seems to fail for Golang version <1.18~ The initial suspicion was that a Go version bump would resolve it, but for me (and others) it still fails on Go v1.18.X.
The problem happens here https://github.com/cs3org/reva/blob/edge/pkg/storage/utils/filelocks/filelocks.go#L144
We need to find a solution to suppress the error.
Do we need to always suppress the error, or only if we tried to delete a file that no longer exists?
From what I see and the way I understand it, the problem occurs when trying to fetch the personal space or upload files into it (also initially when no files are present, but not limited to that).
Hm, I can trigger
2022-05-10T07:08:17Z ERR failed to list storage spaces error="remove /home/vscode/.ocis/storage/users/spaces/so/me-admin-user-id-0000-000000000000/nodes/so/me/-a/dm/in-user-id-0000-000000000000.flock: no such file or directory" filters=[{"Term":{"Id":{"opaque_id":"some-admin-user-id-0000-000000000000!some-admin-user-id-0000-000000000000"}},"type":2}] pkg=rgrpc service=storage-users status={"code":15,"message":"error listing spaces","trace":"00000000000000000000000000000000"} traceid=00000000000000000000000000000000
by sending 50-100 concurrent PROPFINDs to a spaces endpoint:
for f in $(seq 1 100); do curl -o /dev/null -s -w "%{http_code}\n" -k -X PROPFIND 'https://cloud.ocis.test/dav/spaces/1284d238-aa92-42ce-bdc4-0b0000009157$some-admin-user-id-0000-000000000000' -u 'admin:admin' & done; wait
Which is the locking issue mentioned in https://github.com/owncloud/ocis/issues/3749#issuecomment-1121408177
But in your case I was stumbling over http: proxy error: dial tcp 127.0.0.1:38525: connect: connection refused.
@pascalwengerter do you see .flock: no such file or directory in your log when listing fails?
I posted my logs above and I don't remember seeing any .flock logs there. Happy to help with debugging again later today. @dragotin and @lookacat can also reproduce it IIRC.
Yes, I get this error with the latest master and latest build for osx. For me it fails every time sadly :/
First load of the page after login
2022-05-10T10:00:24+02:00 ERR error initializing metadata client error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:9215: socket: too many open files\"" service=ocis
2022-05-10T10:00:24+02:00 ERR failed to list storage spaces error="xattrs: Unable to lock file for read: could not acquire lock after wait" filters=[{"Term":{"Id":{"opaque_id":"bda94c72-625e-4a4b-80ca-d9c046534c50!bda94c72-625e-4a4b-80ca-d9c046534c50"}},"type":2}] pkg=rgrpc service=storage-users status={"code":15,"message":"error listing spaces","trace":"00000000000000000000000000000000"} traceid=00000000000000000000000000000000
2022-05-10T10:00:24+02:00 ERR error sending get quota grpc request service=graph
2022-05-10T10:00:28+02:00 ERR Could not add default role error="{\"id\":\"go.micro.client\",\"code\":503,\"detail\":\"connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:9191: socket: too many open files\\\"\",\"status\":\"Service Unavailable\"}" service=proxy
2022-05-10T10:00:28+02:00 ERR failed to get roles for user error="{\"id\":\"go.micro.client\",\"code\":503,\"detail\":\"connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:9191: socket: too many open files\\\"\",\"status\":\"Service Unavailable\"}" service=ocs userid=bda94c72-625e-4a4b-80ca-d9c046534c50
2022-05-10T10:00:29+02:00 ERR Could not load roles error="{\"id\":\"go.micro.client\",\"code\":503,\"detail\":\"connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:9191: socket: too many open files\\\"\",\"status\":\"Service Unavailable\"}" service=proxy
2022-05-10T10:00:29+02:00 ERR Could not add default role error="{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}" service=proxy
2022-05-10T10:00:29+02:00 ERR Could not load roles error="{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}" service=proxy
This happens after I reload a couple of times.
Reason is the flock issue. Closing this in favor of https://github.com/owncloud/ocis/issues/3757
@lookacat your logs indicate a problem with
Error while dialing dial tcp 127.0.0.1:9215: socket: too many open files
That is a different root cause. Could you open a new issue, as https://github.com/owncloud/ocis/issues/222 is closed?
To check the current number of open file descriptors you can use the command in https://github.com/owncloud/ocis/issues/268#issuecomment-690970585
The solution might be as easy as raising the limit of open file descriptors. The microservice architecture opens quite a lot of tcp connections between different ports.
@mmattel I cannot find anything on file descriptors / ulimits
Describe the bug
A clear and concise description of what the bug is.
Steps to reproduce
Steps to reproduce the behavior:
Expected behavior
Solid loading of personal space
Actual behavior
Flakiness, both when navigating and when doing full-page reloads
Logs
Success case
Failure case
Setup
oCIS on current master (today 13:00)