owncloud / ocis

:atom_symbol: ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0
1.37k stars 180 forks source link

jsoncs3 share manager problems #6497

Closed butonic closed 1 year ago

butonic commented 1 year ago

To force sequential execution of changes, the jsoncs3 share manager makes an m.Lock() call in every called function. This forces requests to be executed sequentially, to a point when the number of requests becomes a bottleneck. In essence, even read requests cannot be executed concurrently.

This can be reproduced by increasing the number of requests per second that ListReceivedShares and then create shares with the same group of users. The List requests wil Lock the jsoncs3 share manager very often. The new shares have to update the group received shares list and cause the Lock to remain open for a comparatively long time, decreasing the number of requests that can be handled per second significantly, causing response time to increase until timeouts occur.

Sharing with the same group causes the bottleneck to manifestate earlier, because the request has to write the received shares list to disk. To do that, it optimistically tries to write a file with an If-Match etag header. And we currently only retry that once.

We have a test cas in cdperf that covers this scenario. You will have to create en env var like this:

export SEED_USERS_CREATE=false
export SEED_USERS_DELETE=false
export SEED_GROUPS_CREATE=false
export SEED_GROUPS_DELETE=false

export ADMIN_LOGIN=admin
export ADMIN_PASSWORD=admin

export POOL_GROUPS=/home/jfd/loadtest/cloud.ocis.test/pools/group.pool.json
export POOL_USERS=/home/jfd/loadtest/cloud.ocis.test/pools/user.pool.json

export PLATFORM_BASE_URL=https://cloud.ocis.test
export AUTH_N_PROVIDER=kopano
export AUTH_N_PROVIDER_KOPANO_BASE_URL=https://cloud.ocis.test
export AUTH_N_PROVIDER_KOPANO_REDIRECT_URL=https://cloud.ocis.test/oidc-callback.html

export TEST_KOKO_PLATFORM_090_RAMPING_SLEEP_AFTER_ITERATION=60s
export TEST_KOKO_PLATFORM_090_RAMPING_STAGES_VUS=50
export TEST_KOKO_PLATFORM_090_RAMPING_STAGES_UP_DURATION=2m
export TEST_KOKO_PLATFORM_090_RAMPING_STAGES_PEAK_DURATION=3m
export TEST_KOKO_PLATFORM_090_RAMPING_STAGES_DOWN_DURATION=1m

Disabling the SEED_* env vers makes k6 read users from pool files:

➜  loadtest tree cloud.ocis.test 
cloud.ocis.test
└── pools
    ├── group.pool.json
    ├── user.pool.json
    └── user.pool.json-bak
# cat cloud.ocis.test/pools/user.pool.json
[
    {
      "userLogin": "admin",
      "userPassword": "admin"
    },
    {
      "userLogin": "marie",
      "userPassword": "radioactivity"
    },
    {
      "userLogin": "einstein",
      "userPassword": "relativity"
    },
    {
      "userLogin": "richard",
      "userPassword": "superfluidity"
    }
]
# cat cloud.ocis.test/pools/group.pool.json 
[
  {
    "groupName": "users"
  }
]

remember, having a single group as the recipient causes writes to the same file, whic we currently only retry once

Then you can source .envrc and run the k6 sharing scenario like this:

k6 run cdperf-rsync/packages/k6-tests/artifacts/koko-platform-090-create-remove-group-share-simple-k6.js --vus=4 --iterations=10

This will use 4 users to execute ten iterations of this sharing scenario.

k6 run cdperf-rsync/packages/k6-tests/artifacts/koko-platform-090-create-remove-group-share-simple-k6.js --vus=4 --iterations=10

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: cdperf-rsync/packages/k6-tests/artifacts/koko-platform-090-create-remove-group-share-simple-k6.js
     output: -

  scenarios: (100.00%) 1 scenario, 4 max VUs, 10m30s max duration (incl. graceful stop):
           * default: 10 iterations shared among 4 VUs (maxDuration: 10m0s, gracefulStop: 30s)

INFO[0005] createShare start - user "einstein"           source=console
INFO[0005] createShare start - user "richard"            source=console
INFO[0005] createShare start - user "admin"              source=console
INFO[0005] createShare start - user "marie"              source=console
INFO[0006] createShare execution time - user "marie": 1022 ms  source=console
INFO[0006] createShare execution time - user "admin": 1106 ms  source=console
INFO[0006] createShare execution time - user "richard": 1211 ms  source=console
INFO[0006] deleteShare start - user "marie"              source=console
INFO[0006] createShare execution time - user "einstein": 1300 ms  source=console
INFO[0006] deleteShare start - user "admin"              source=console
INFO[0006] deleteShare start - user "richard"            source=console
INFO[0006] deleteShare start - user "einstein"           source=console
INFO[0007] deleteShare execution time - user "marie": 801 ms  source=console
INFO[0007] deleteShare execution time - user "admin": 866 ms  source=console
INFO[0007] deleteShare execution time - user "richard": 816 ms  source=console
INFO[0007] deleteShare execution time - user "einstein": 748 ms  source=console
INFO[0008] createShare start - user "marie"              source=console
INFO[0009] createShare execution time - user "marie": 172 ms  source=console
INFO[0009] createShare start - user "richard"            source=console
INFO[0009] createShare start - user "admin"              source=console
INFO[0009] createShare start - user "einstein"           source=console
INFO[0009] deleteShare start - user "marie"              source=console
INFO[0009] createShare execution time - user "admin": 283 ms  source=console
INFO[0009] createShare execution time - user "richard": 370 ms  source=console
INFO[0009] createShare execution time - user "einstein": 403 ms  source=console
INFO[0009] deleteShare start - user "admin"              source=console
INFO[0009] deleteShare start - user "richard"            source=console
INFO[0009] deleteShare start - user "einstein"           source=console
INFO[0009] deleteShare execution time - user "marie": 577 ms  source=console
INFO[0010] deleteShare execution time - user "admin": 652 ms  source=console
INFO[0010] deleteShare execution time - user "richard": 677 ms  source=console
INFO[0010] deleteShare execution time - user "einstein": 707 ms  source=console
INFO[0011] createShare start - user "marie"              source=console
INFO[0012] createShare start - user "admin"              source=console
INFO[0012] createShare execution time - user "marie": 286 ms  source=console
INFO[0012] createShare execution time - user "admin": 246 ms  source=console
INFO[0012] deleteShare start - user "marie"              source=console
INFO[0012] deleteShare start - user "admin"              source=console
INFO[0012] deleteShare execution time - user "marie": 461 ms  source=console
INFO[0013] deleteShare execution time - user "admin": 491 ms  source=console

     ✓ authn -> logonResponse - status
     ✓ authn -> authorizeResponse - status
     ✓ authn -> accessTokenResponse - status
     ✓ client -> role.getMyDrives - status
     ✓ client -> resource.createResource - status
     ✓ client -> resource.getResourceProperties - status
     ✓ client -> share.createShare - status
     ✓ client -> share.deleteShare - status
     ✓ client -> resource.deleteResource - status

     checks.........................: 100.00% ✓ 66       ✗ 0  
     data_received..................: 63 kB   5.4 kB/s
     data_sent......................: 19 kB   1.6 kB/s
     http_req_blocked...............: avg=323.93µs min=232ns   med=472ns    max=6.7ms    p(90)=1.35µs   p(95)=2.97ms  
     http_req_connecting............: avg=7.1µs    min=0s      med=0s       max=148.01µs p(90)=0s       p(95)=70.77µs 
     http_req_duration..............: avg=494.52ms min=50.19ms med=411.27ms max=1.29s    p(90)=1.05s    p(95)=1.12s   
       { expected_response:true }...: avg=494.52ms min=50.19ms med=411.27ms max=1.29s    p(90)=1.05s    p(95)=1.12s   
     http_req_failed................: 0.00%   ✓ 0        ✗ 66 
     http_req_receiving.............: avg=90.57µs  min=40.08µs med=68.92µs  max=482.68µs p(90)=123.51µs p(95)=167.52µs
     http_req_sending...............: avg=140.14µs min=47.59µs med=97.04µs  max=945.15µs p(90)=245.46µs p(95)=305.74µs
     http_req_tls_handshaking.......: avg=302.05µs min=0s      med=0s       max=6.22ms   p(90)=0s       p(95)=2.8ms   
     http_req_waiting...............: avg=494.29ms min=49.69ms med=411.07ms max=1.29s    p(90)=1.05s    p(95)=1.12s   
     http_reqs......................: 66      5.658033/s
     iteration_duration.............: avg=4.19s    min=2.4s    med=3.24s    max=6.15s    p(90)=6.1s     p(95)=6.12s   
     iterations.....................: 10      0.857278/s
     vus............................: 2       min=2      max=4
     vus_max........................: 4       min=4      max=4

running (00m11.7s), 0/4 VUs, 10 complete and 0 interrupted iterations
default ✓ [======================================] 4 VUs  00m11.7s/10m0s  10/10 shared iters

Just disabling the locks causes failures:

➜  loadtest k6 run cdperf-rsync/packages/k6-tests/artifacts/koko-platform-090-create-remove-group-share-simple-k6.js --vus=4 --iterations=10

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: cdperf-rsync/packages/k6-tests/artifacts/koko-platform-090-create-remove-group-share-simple-k6.js
     output: -

  scenarios: (100.00%) 1 scenario, 4 max VUs, 10m30s max duration (incl. graceful stop):
           * default: 10 iterations shared among 4 VUs (maxDuration: 10m0s, gracefulStop: 30s)

INFO[0005] createShare start - user "marie"              source=console
INFO[0005] createShare start - user "admin"              source=console
INFO[0005] createShare start - user "einstein"           source=console
INFO[0005] createShare start - user "richard"            source=console
INFO[0005] createShare execution time - user "marie": 649 ms  source=console
INFO[0005] deleteShare start - user "marie"              source=console
INFO[0006] createShare execution time - user "richard": 987 ms  source=console
INFO[0006] createShare execution time - user "admin": 1050 ms  source=console
INFO[0006] createShare execution time - user "einstein": 1142 ms  source=console
INFO[0006] deleteShare start - user "richard"            source=console
INFO[0006] deleteShare start - user "admin"              source=console
INFO[0006] deleteShare start - user "einstein"           source=console
INFO[0006] deleteShare execution time - user "marie": 806 ms  source=console
INFO[0006] deleteShare execution time - user "richard": 661 ms  source=console
INFO[0006] deleteShare execution time - user "admin": 615 ms  source=console
INFO[0006] deleteShare execution time - user "einstein": 530 ms  source=console
INFO[0008] createShare start - user "marie"              source=console
INFO[0008] createShare start - user "richard"            source=console
INFO[0008] createShare start - user "einstein"           source=console
INFO[0008] createShare start - user "admin"              source=console
INFO[0008] createShare execution time - user "marie": 232 ms  source=console
INFO[0008] deleteShare start - user "marie"              source=console
INFO[0008] createShare execution time - user "admin": 357 ms  source=console
INFO[0008] createShare execution time - user "richard": 373 ms  source=console
INFO[0008] createShare execution time - user "einstein": 367 ms  source=console
INFO[0009] deleteShare start - user "admin"              source=console
INFO[0009] deleteShare start - user "richard"            source=console
INFO[0009] deleteShare start - user "einstein"           source=console
INFO[0009] deleteShare execution time - user "marie": 416 ms  source=console
INFO[0009] deleteShare execution time - user "admin": 566 ms  source=console
INFO[0009] deleteShare execution time - user "richard": 592 ms  source=console
INFO[0009] deleteShare execution time - user "einstein": 648 ms  source=console
INFO[0011] createShare start - user "marie"              source=console
INFO[0011] createShare execution time - user "marie": 186 ms  source=console
INFO[0011] createShare start - user "admin"              source=console
INFO[0011] deleteShare start - user "marie"              source=console
INFO[0011] createShare execution time - user "admin": 209 ms  source=console
INFO[0011] deleteShare start - user "admin"              source=console
INFO[0011] deleteShare execution time - user "marie": 385 ms  source=console
INFO[0012] deleteShare execution time - user "admin": 445 ms  source=console

     ✓ authn -> logonResponse - status
     ✓ authn -> authorizeResponse - status
     ✓ authn -> accessTokenResponse - status
     ✓ client -> role.getMyDrives - status
     ✓ client -> resource.createResource - status
     ✓ client -> resource.getResourceProperties - status
     ✗ client -> share.createShare - status
      ↳  80% — ✓ 8 / ✗ 2
     ✗ client -> share.deleteShare - status
      ↳  80% — ✓ 8 / ✗ 2
     ✓ client -> resource.deleteResource - status

     checks.........................: 93.93% ✓ 62       ✗ 4  
     data_received..................: 59 kB  5.3 kB/s
     data_sent......................: 19 kB  1.7 kB/s
     http_req_blocked...............: avg=301.43µs min=225ns   med=373ns    max=5.98ms   p(90)=984ns    p(95)=2.69ms  
     http_req_connecting............: avg=6.75µs   min=0s      med=0s       max=152.07µs p(90)=0s       p(95)=70.79µs 
     http_req_duration..............: avg=451.57ms min=42.57ms med=387.38ms max=1.14s    p(90)=896.01ms p(95)=1.08s   
       { expected_response:true }...: avg=442.4ms  min=42.57ms med=384.43ms max=1.14s    p(90)=961.74ms p(95)=1.08s   
     http_req_failed................: 6.06%  ✓ 4        ✗ 62 
     http_req_receiving.............: avg=90.12µs  min=37.46µs med=69.51µs  max=552.71µs p(90)=131.68µs p(95)=177.54µs
     http_req_sending...............: avg=111.27µs min=52.24µs med=86.59µs  max=424.97µs p(90)=182.98µs p(95)=233.95µs
     http_req_tls_handshaking.......: avg=284.57µs min=0s      med=0s       max=5.77ms   p(90)=0s       p(95)=2.52ms  
     http_req_waiting...............: avg=451.37ms min=42.02ms med=387.24ms max=1.14s    p(90)=895.8ms  p(95)=1.08s   
     http_reqs......................: 66     6.016612/s
     iteration_duration.............: avg=3.9s     min=2.34s   med=3.02s    max=5.69s    p(90)=5.68s    p(95)=5.69s   
     iterations.....................: 10     0.911608/s
     vus............................: 1      min=1      max=4
     vus_max........................: 4      min=4      max=4

running (00m11.0s), 0/4 VUs, 10 complete and 0 interrupted iterations
default ✓ [======================================] 4 VUs  00m11.0s/10m0s  10/10 shared iters
aduffeck commented 1 year ago

I suppose this can be closed with https://github.com/cs3org/reva/pull/3964 having landed in master via https://github.com/owncloud/ocis/pull/6507.