owncloud / ocis

ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

Ocis stops working after some minutes #9650

Closed micbar closed 1 month ago

micbar commented 1 month ago

Describe the bug

ocis freezes after running for a few minutes.

This happens on ocis master.

It does not happen with owncloud/ocis-rolling:6.1.0.

Steps to reproduce

  1. Start ocis master with docker compose
  2. Wait a few minutes

Expected behavior

ocis runs normally

Actual behavior

ocis freezes. Service registration fails with NATS timeouts:

Jul 19 06:45:46 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"users","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTVONSXE4ZPGQYTKZTDGIYTMLJYMFSDQLJUME3GCLJZMUZGCLJXMY2GEOLCHFTDMMBQHE======': nats: timeout","time":"2024-07-19T06:45:44Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.users"}
Jul 19 06:45:46 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"storage-users","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTTORXXEYLHMUWXK43FOJZS6ZLBMYYTIY3GGIWTONRUGQWTINZUGIWTSODEGEWTMZBWHEZGEMRZGFSTGZQ=': nats: timeout","time":"2024-07-19T06:45:44Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.storage-users"}
Jul 19 06:45:47 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"ocm","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOO5SWELTPMNWS6YJUMJSWCMBVMEWTIYLBMYWTIMLEHAWWEMRTHEWWKOLCGU4DOZJQHA4WCNQ=': nats: timeout","time":"2024-07-19T06:45:46Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.web.ocm"}
Jul 19 06:45:49 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"ocm","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTPMNWS6MRUMQ4TCYLCGIWTGZTEHAWTIZDFMIWWEZRYGQWTIOJYGQYWKYTDMQ2GCMA=': nats: timeout","time":"2024-07-19T06:45:48Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.ocm"}
Jul 19 06:45:49 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"groups","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTHOJXXK4DTF5RDSNZTGA4DEYZNGI3DCOJNGQ4TEZBNMIZWIZJNHFSGEOJQMVQWIYJTMY3Q====': nats: timeout","time":"2024-07-19T06:45:47Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.groups"}
Jul 19 06:45:53 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"storage-system","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTTORXXEYLHMUWXG6LTORSW2LZTHA3WIMRQGRSC2ODDGYZC2NDBGM4S2OBYGI3C2MZZMEZTEMTGMRQWMOJV': nats: timeout","time":"2024-07-19T06:45:49Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.storage-system"}
Jul 19 06:45:54 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"frontend","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOO5SWELTGOJXW45DFNZSC6ZJRGVSWGNJZGUWWGMJYMUWTIZLCHAWWCMBUHEWTMZTDMUYDEMJQGE3TGNI=': nats: timeout","time":"2024-07-19T06:45:49Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.web.frontend"}
Jul 19 06:45:59 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"auth-service","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTBOV2GQLLTMVZHM2LDMUXWGYRYHA2WGNDDFU3DKZBUFU2GMYJSFU4DIMDBFU2TSNRQGM2WCZBTMZRWK===': nats: timeout","time":"2024-07-19T06:45:51Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.auth-service"}
Jul 19 06:45:59 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"sharing","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTTNBQXE2LOM4XTMMTGHEYTOM3BFU3WINJVFU2DQNDBFU4WCYLDFU4TKMLDGBRGCZBTGU4TK===': nats: timeout","time":"2024-07-19T06:45:51Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.sharing"}
Jul 19 06:46:00 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"storage-system","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOO5SWELTTORXXEYLHMUWXG6LTORSW2LZTGI4GIMLGMZQS2ZJYGZQS2NDBMUZC2OJVGJTC2MZUMVSTENRXMY3TOMDF': nats: timeout","time":"2024-07-19T06:45:57Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.web.storage-system"}
Jul 19 06:46:01 ubuntu-4gb-nbg1-1 docker[1367715]: {"level":"error","service":"storage-publiclink","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTTORXXEYLHMUWXA5LCNRUWG3DJNZVS6ZDEMVTDQNZYGMWTQNJTGAWTIMLCMEWTQZBSHEWWMMTEG5QWCY3FGQ4DSZI=': nats: timeout","time":"2024-07-19T06:45:56Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.storage-publiclink"}
Jul 19 06:45:08 ubuntu-4gb-nbg1-1 docker[1371430]: {"level":"error","service":"collaboration","error":"Failed to store data in bucket 'ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTDN5WGYYLCN5ZGC5DJN5XC43LJMNZG643PMZ2DGNRVF43DENBQMQ3WGNRNMVQWEZRNGQ3DSMZNHFRTKNJNGVRWCMBRMJQWIZLCGE2Q====': nats: timeout","time":"2024-07-19T06:45:08Z","line":"github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:33","message":"registration error for external service com.owncloud.api.collaboration.microsoft365"}
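
The names in the error messages above look like Base32. Decoding them can make the log easier to read, since the decoded string appears to contain the registry prefix, the service name, and an instance ID. The following is a minimal Python sketch, not part of ocis; the helper name `decode_registry_key` is made up for illustration, and the assumption that these names are plain Base32 comes only from the fact that they decode cleanly.

```python
import base64


def decode_registry_key(encoded: str) -> str:
    """Decode a Base32 name as it appears in the 'Failed to store data' errors."""
    # Drop any existing padding, then re-pad to a multiple of 8 characters,
    # which is what base64.b32decode expects.
    body = encoded.strip().rstrip("=")
    padded = body + "=" * (-len(body) % 8)
    return base64.b32decode(padded).decode("utf-8", errors="replace")


# Name copied from the "users" log line above.
KEY = (
    "ONSXE5TJMNSS24TFM5UXG5DSPFPWG33NFZXXO3TDNRXXKZBOMFYGSLTVONSXE4ZP"
    "GQYTKZTDGIYTMLJYMFSDQLJUME3GCLJZMUZGCLJXMY2GEOLCHFTDMMBQHE======"
)

print(decode_registry_key(KEY))
```

For the `users` entry, the decoded string starts with `service-registry_com.owncloud.api.users/`, which matches the service named in the `message` field of the same log line.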


micbar commented 1 month ago

Escalating to P1 because we need to deliver rolling releases which are usable for Load Testing next week. It also blocks bugfixing on the collaboration service which is also needed next week.

jvillafanez commented 1 month ago

goroutine.txt

An abnormal number of goroutines is being created, even on an idle system. Of the 1241 goroutines in the dump, 711 were waiting in a watch function of nats.io (if I read the data correctly), and that number keeps growing over time even when the system is idle. The remaining 530 goroutines are stable (there are a lot of services running, so I'm not sure whether that is a normal number for us).
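
A count like the one above can be reproduced from a text goroutine dump with a small script. This is a rough sketch that assumes the usual Go dump layout (one block per goroutine, separated by blank lines, each starting with a `goroutine N [state]:` header). The sample dump and the frame names in it are invented for illustration and are not taken from the attached goroutine.txt.

```python
from __future__ import annotations

import re

# Header line of one goroutine block, e.g. "goroutine 7 [select, 3 minutes]:"
HEADER = re.compile(r"^goroutine (\d+) \[([^\]]*)\]:$")


def count_goroutines(dump: str, needle: str) -> tuple[int, int]:
    """Return (total goroutines, goroutines whose stack mentions `needle`)."""
    total = 0
    matching = 0
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", dump.strip()):
        lines = block.splitlines()
        if not lines or not HEADER.match(lines[0].strip()):
            continue  # skip anything that is not a goroutine block
        total += 1
        if any(needle in line for line in lines[1:]):
            matching += 1
    return total, matching


# Hypothetical sample dump; package and function names are placeholders.
SAMPLE = """\
goroutine 1 [chan receive, 3 minutes]:
main.main()
\t/app/main.go:10 +0x1d

goroutine 7 [select, 3 minutes]:
example.com/kv.(*watcher).watch(0xc000010000)
\t/app/kv/watch.go:42 +0x85
created by example.com/kv.Watch
\t/app/kv/watch.go:30 +0x4f

goroutine 8 [select, 2 minutes]:
example.com/kv.(*watcher).watch(0xc000010080)
\t/app/kv/watch.go:42 +0x85
"""

total, watching = count_goroutines(SAMPLE, "(*watcher).watch")
print(f"{watching} of {total} goroutines are in the watch function")
```

Running the same count on dumps taken a few minutes apart shows whether the matching group keeps growing while the rest stays flat.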

kobergj commented 1 month ago

Yes. We had problems with the TTL of the documents in the registry. That led to NATS becoming unusable. Fixed with https://github.com/owncloud/ocis/pull/9654