Open · 2403905 opened 2 days ago
Do we know if the number of goroutines in `natsjsregistry.Watcher` is going up? It should not, as they are only spawned on startup; there should be no new goroutines added there during runtime.
AFAICT we are sometimes creating a new go-micro client on the fly, e.g. in `services/graph/pkg/service/v0/drives.go`:
```go
// ListStorageSpacesWithFilters List Storage Spaces using filters
func (g Graph) ListStorageSpacesWithFilters(ctx context.Context, filters []*storageprovider.ListStorageSpacesRequest_Filter, unrestricted bool) (*storageprovider.ListStorageSpacesResponse, error) {
	gatewayClient, err := g.gatewaySelector.Next()
	if err != nil {
		return nil, err
	}
	grpcClient, err := grpc.NewClient(append(grpc.GetClientOptions(g.config.GRPCClientTLS), grpc.WithTraceProvider(g.traceProvider))...)
	if err != nil {
		return nil, err
	}
	s := settingssvc.NewPermissionService("com.owncloud.api.settings", grpcClient)
	_, err = s.GetPermissionByID(ctx, &settingssvc.GetPermissionByIDRequest{
		PermissionId: settingsServiceExt.ListSpacesPermission(0).Id,
	})
	//...
```
The problem is that the actual RPC call under the hood uses a selector, and all go-micro selectors wrap the registry with a cache registry. That cache registry starts a new watcher, increasing the number of goroutines by one on every request that instantiates a new go-micro client.
So, for calls to reva we need to make sure not to create a reva gateway client on service startup. But for calls made with go-micro clients we need to make sure not to recreate them on every request.
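A minimal sketch of that second point, assuming the settings client is built once at service startup and kept on `Graph` instead of being created inside the request handler. The `permissionService` field and `newPermissionService` helper below are illustrative (not the actual change from the linked PR), and the imports are the same ones `drives.go` already uses:

```go
// Illustrative only: construct the go-micro client (and the settings
// PermissionService on top of it) once, then reuse it for every request.
// Each grpc.NewClient call otherwise wraps the registry in a new cache
// registry and spawns another watcher goroutine.

type Graph struct {
	// ...existing fields (gatewaySelector, config, traceProvider, ...)
	permissionService settingssvc.PermissionService // built once at startup
}

// newPermissionService is a hypothetical startup helper, not existing ocis API.
func newPermissionService(g *Graph) error {
	grpcClient, err := grpc.NewClient(
		append(grpc.GetClientOptions(g.config.GRPCClientTLS), grpc.WithTraceProvider(g.traceProvider))...,
	)
	if err != nil {
		return err
	}
	g.permissionService = settingssvc.NewPermissionService("com.owncloud.api.settings", grpcClient)
	return nil
}

// Per request, the handler then only reuses the prebuilt client:
//
//	_, err = g.permissionService.GetPermissionByID(ctx, &settingssvc.GetPermissionByIDRequest{
//		PermissionId: settingsServiceExt.ListSpacesPermission(0).Id,
//	})
```

Either way, the important property is that the number of go-micro clients, and therefore registry cache watchers, stays constant regardless of the request rate.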
🤪
One occurrence was fixed with https://github.com/owncloud/ocis/pull/10583.
Keeping this open, as it could be an issue in more places.
@butonic That is awesome!
The number of goroutines is stable and memory usage is low.
Describe the bug
A single-binary `ocis` setup has high memory usage, and goroutines leak after running under load for a long time.
Steps to reproduce
```console
make test-acceptance-api \
TEST_SERVER_URL=https://localhost:9200 \
TEST_WITH_GRAPH_API=true \
TEST_OCIS=true \
BEHAT_FEATURE=tests/acceptance/features/apiSpacesShares/shareUploadTUS
```

Then capture the goroutine profiles:

```console
curl 'http://localhost:9114/debug/pprof/goroutine' > goroutine-running.out
curl 'http://localhost:9114/debug/pprof/goroutine?debug=2' > goroutine-running.log
```
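The captured profiles can then be inspected with the Go pprof tool; two example commands (assuming a local Go toolchain, and using the natsjsregistry watcher as the pattern of interest):

```console
# summary of where the goroutines are parked
go tool pprof -top goroutine-running.out
# count the blocked watcher stacks in the full dump
grep -c 'natsjsregistry.(\*Watcher).Next' goroutine-running.log
```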
Expected behavior
The number of goroutines should go down after the load goes down.
Actual behavior
The number of goroutines grows during the test and should go down after the test stops, but we can see that thousands of goroutines are blocked and never go away. Most of them are in `google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run` and `github.com/owncloud/ocis/v2/ocis-pkg/natsjsregistry.(*Watcher).Next`. This leads to a goroutine and memory leak. The time the goroutines have been blocked approaches the total running time of the `ocis` binary. The blocked goroutines are related to the `go-micro/registry` cache and the `ocis-pkg/natsjsregistry` Watcher.

goroutine-out.zip
Setup
Please describe how you started the server and provide a list of relevant environment variables or configuration files.
```console
OCIS_XXX=somevalue
OCIS_YYY=somevalue
PROXY_XXX=somevalue
```
Additional context
Add any other context about the problem here.