Open butonic opened 9 months ago
nats allowDirect: true
k6 run packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
output: -
scenarios: (100.00%) 8 scenarios, 75 max VUs, 6m30s max duration (incl. graceful stop):
* add_remove_tag_100: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: add_remove_tag_100, gracefulStop: 30s)
* create_remove_group_share_090: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_remove_group_share_090, gracefulStop: 30s)
* create_space_080: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_space_080, gracefulStop: 30s)
* create_upload_rename_delete_folder_and_file_040: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_upload_rename_delete_folder_and_file_040, gracefulStop: 30s)
* download_050: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: download_050, gracefulStop: 30s)
* navigate_file_tree_020: Up to 10 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: navigate_file_tree_020, gracefulStop: 30s)
* sync_client_110: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: sync_client_110, gracefulStop: 30s)
* user_group_search_070: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: user_group_search_070, gracefulStop: 30s)
✓ authn -> logonResponse - status
✓ authn -> authorizeResponse - status
✓ authn -> accessTokenResponse - status
✗ client -> search.searchForSharees - status
↳ 98% — ✓ 396 / ✗ 6
✓ client -> role.getMyDrives - status
✗ client -> resource.getResourceProperties - status
↳ 99% — ✓ 3384 / ✗ 11
✗ client -> application.createDrive - status
↳ 95% — ✓ 84 / ✗ 4
✗ client -> resource.createResource - status
↳ 98% — ✓ 90 / ✗ 1
✗ client -> drive.deactivateDrive - status
↳ 95% — ✓ 84 / ✗ 4
✓ client -> tag.getTags - status -- (SKIPPED)
✓ client -> tag.createTag - status -- (SKIPPED)
✗ client -> drive.deleteDrive - status
↳ 95% — ✓ 84 / ✗ 4
✓ client -> tag.addTagToResource - status
✗ client -> share.createShare - status
↳ 84% — ✓ 38 / ✗ 7
✓ client -> tag.removeTagToResource - status
✗ client -> share.deleteShare - status
↳ 82% — ✓ 37 / ✗ 8
✗ client -> resource.deleteResource - status
↳ 98% — ✓ 90 / ✗ 1
✗ client -> resource.downloadResource - status
↳ 96% — ✓ 89 / ✗ 3
✓ client -> resource.uploadResource - status
✓ client -> resource.moveResource - status
checks.........................: 99.03% ✓ 5030 ✗ 49
data_received..................: 1.5 GB 3.8 MB/s
data_sent......................: 832 MB 2.1 MB/s
http_req_blocked...............: avg=221.9µs min=0s med=0s max=83.35ms p(90)=1µs p(95)=1µs
http_req_connecting............: avg=17.75µs min=0s med=0s max=5.19ms p(90)=0s p(95)=0s
http_req_duration..............: avg=152.33ms min=13.29ms med=46.77ms max=10.02s p(90)=163.51ms p(95)=286.28ms
{ expected_response:true }...: avg=108.78ms min=13.29ms med=46.49ms max=8.95s p(90)=158.11ms p(95)=262.55ms
http_req_failed................: 0.98% ✓ 49 ✗ 4936
http_req_receiving.............: avg=20.74ms min=14µs med=55µs max=8.87s p(90)=96µs p(95)=157µs
http_req_sending...............: avg=5.67ms min=20µs med=67µs max=5.16s p(90)=108µs p(95)=148µs
http_req_tls_handshaking.......: avg=190.59µs min=0s med=0s max=81.33ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=125.92ms min=13.18ms med=46.54ms max=10.02s p(90)=157.26ms p(95)=265.42ms
http_reqs......................: 4985 12.783076/s
iteration_duration.............: avg=5.9s min=1.01s med=1.05s max=44.25s p(90)=15.24s p(95)=17.28s
iterations.....................: 3900 10.000802/s
vus............................: 2 min=0 max=75
vus_max........................: 75 min=33 max=75
running (6m30.0s), 00/75 VUs, 3900 complete and 3 interrupted iterations
add_remove_tag_100 ✓ [======================================] 0/5 VUs 6m0s
create_remove_group_share_090 ✓ [======================================] 1/5 VUs 6m0s
create_space_080 ✓ [======================================] 0/5 VUs 6m0s
create_upload_rename_delete... ✓ [======================================] 0/5 VUs 6m0s
download_050 ✓ [======================================] 0/5 VUs 6m0s
navigate_file_tree_020 ✓ [======================================] 00/10 VUs 6m0s
sync_client_110 ✓ [======================================] 00/20 VUs 6m0s
user_group_search_070 ✓ [======================================] 00/20 VUs 6m0s
There are more details on the direct get api: https://github.com/nats-io/nats-architecture-and-design/blob/main/adr/ADR-31.md
Even on a local k3s cluster, setting allowDirect: false makes navigating in the web UI unresponsive. AFAICT decomposedfs will make a lot of read requests and every read will be forwarded to the leader (internally by NATS). While this gives us read-after-write consistency, we cannot use it as is:
# time PROPFIND with curl
$ time curl -X PROPFIND https://ocis.kube.owncloud.test/dav/spaces/5ce828a7-7f85-4e5d-954f-12ee525f95e8%2438fbe2b0-2420-4e06-b1d0-77871811d13e -H 'Authorization: Bearer ...' -k
0,00s user 0,01s system 10% cpu 0,086 total
# disable allow-direct
$ nats stream edit KV_storage-users --no-allow-direct
# retime PROPFIND request
$ time curl -X PROPFIND https://ocis.kube.owncloud.test/dav/spaces/5ce828a7-7f85-4e5d-954f-12ee525f95e8%2438fbe2b0-2420-4e06-b1d0-77871811d13e -H 'Authorization: Bearer ...' -k
0,00s user 0,01s system 0% cpu 20,071 total
$ nats stream edit KV_storage-users --allow-direct
$ time curl -X PROPFIND https://ocis.kube.owncloud.test/dav/spaces/5ce828a7-7f85-4e5d-954f-12ee525f95e8%2438fbe2b0-2420-4e06-b1d0-77871811d13e -H 'Authorization: Bearer ...' -k
0,01s user 0,00s system 16% cpu 0,057 total
$ nats stream edit KV_storage-users --no-allow-direct
$ time curl -X PROPFIND https://ocis.kube.owncloud.test/dav/spaces/5ce828a7-7f85-4e5d-954f-12ee525f95e8%2438fbe2b0-2420-4e06-b1d0-77871811d13e -H 'Authorization: Bearer ...' -k
0,00s user 0,01s system 0% cpu 20,062 total
0,00s user 0,00s system 0% cpu 20,100 total
0,01s user 0,00s system 0% cpu 20,065 total
why 20 sec???
When using the nats cli from the natsbox pod, getting a key is fast:
# nats stream edit KV_storage-users --no-allow-direct
# time nats --trace kv get storage-users ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGA
ZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY======
12:23:45 >>> $JS.API.STREAM.INFO.KV_storage-users
12:23:45 <<< $JS.API.STREAM.INFO.KV_storage-users: {"type":"io.nats.jetstream.api.v1.stream_info_response","total":0,"offset":0,"limit":0,"config":{"name":"KV_storage-users","description":"oCIS storageusers cache","subjects":["$KV.storage-users.\u003e"],"retention":"limits","max_consumers":-1,"max_msgs":-1,"max_bytes":-1,"max_age":86400000000000,"max_msgs_per_subject":1,"max_msg_size":-1,"discard":"new","storage":"memory","num_replicas":3,"duplicate_window":120000000000,"compression":"none","allow_direct":false,"mirror_direct":false,"sealed":false,"deny_delete":true,"deny_purge":false,"allow_rollup_hdrs":true,"consumer_limits":{}},"created":"2024-02-12T15:39:35.991939358Z","state":{"messages":6,"bytes":4430,"first_seq":300,"first_ts":"2024-02-13T12:10:07.977971137Z","last_seq":305,"last_ts":"2024-02-13T12:10:12.980558955Z","num_subjects":6,"consumer_count":0},"cluster":{"name":"nats","raft_group":"S-R3M-3sO44a5m","leader":"nats-0","replicas":[{"name":"nats-1","current":true,"active":779305533,"peer":"yrzKKRBu"},{"name":"nats-2","current":true,"active":779288518,"peer":"cnrtt3eg"}]},"ts":"2024-02-13T12:23:45.590719247Z"}
12:23:45 >>> $JS.API.STREAM.MSG.GET.KV_storage-users
{"last_by_subj":"$KV.storage-users.ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGAZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY======"}
12:23:45 <<< $JS.API.STREAM.MSG.GET.KV_storage-users: {"type":"io.nats.jetstream.api.v1.stream_msg_get_response","message":{"subject":"$KV.storage-users.ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGAZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY======","seq":304,"data":"eyJrZXkiOiJzcGFjZXMvMzgvZmJlMmIwLTI0MjAtNGUwNi1iMWQwLTc3ODcxODExZDEzZS9ub2Rlcy81Yy80ZC80NS9jMC8tY2RhMS00MDM2LWE4ZDQtOTA0NDZlMTExZjY2IiwiZGF0YSI6ImhiSjFjMlZ5TG05amFYTXVjR0Z5Wlc1MGFXVEVKRE00Wm1KbE1tSXdMVEkwTWpBdE5HVXdOaTFpTVdRd0xUYzNPRGN4T0RFeFpERXpaYTUxYzJWeUxtOWphWE11Ym1GdFpjUUtUbVYzSUdadmJHUmxjckoxYzJWeUxtOWphWE11ZEhKbFpYTnBlbVhFQVRDMWRYTmxjaTV2WTJsekxuQnliM0JoWjJGMGFXOXV4QUV4cm5WelpYSXViMk5wY3k1MGVYQmx4QUV5IiwibWV0YWRhdGEiOm51bGx9","time":"2024-02-13T12:10:12.980555851Z"}}
storage-users > ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGAZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY====== revision: 304 created @ 13 Feb 24 12:10 UTC
Showing first 120 bytes of 327, use --raw for full data
{"key":"spaces/38/fbe2b0-2420-4e06-b1d0-77871811d13e/nodes/5c/4d/45/c0/-cda1-4036-a8d4-90446e111f66","data":"hbJ1c2VyLm9
real 0m 0.02s
user 0m 0.01s
sys 0m 0.00s
# nats stream edit KV_storage-users --allow-direct
# time nats --trace kv get storage-users ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGA
ZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY======
12:24:18 >>> $JS.API.STREAM.INFO.KV_storage-users
12:24:18 <<< $JS.API.STREAM.INFO.KV_storage-users: {"type":"io.nats.jetstream.api.v1.stream_info_response","total":0,"offset":0,"limit":0,"config":{"name":"KV_storage-users","description":"oCIS storageusers cache","subjects":["$KV.storage-users.\u003e"],"retention":"limits","max_consumers":-1,"max_msgs":-1,"max_bytes":-1,"max_age":86400000000000,"max_msgs_per_subject":1,"max_msg_size":-1,"discard":"new","storage":"memory","num_replicas":3,"duplicate_window":120000000000,"compression":"none","allow_direct":true,"mirror_direct":false,"sealed":false,"deny_delete":true,"deny_purge":false,"allow_rollup_hdrs":true,"consumer_limits":{}},"created":"2024-02-12T15:39:35.991939358Z","state":{"messages":6,"bytes":4430,"first_seq":300,"first_ts":"2024-02-13T12:10:07.977971137Z","last_seq":305,"last_ts":"2024-02-13T12:10:12.980558955Z","num_subjects":6,"consumer_count":0},"cluster":{"name":"nats","raft_group":"S-R3M-3sO44a5m","leader":"nats-0","replicas":[{"name":"nats-1","current":true,"active":420380369,"peer":"yrzKKRBu"},{"name":"nats-2","current":true,"active":420401851,"peer":"cnrtt3eg"}]},"ts":"2024-02-13T12:24:18.232370651Z"}
12:24:18 >>> $JS.API.DIRECT.GET.KV_storage-users.$KV.storage-users.ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGAZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY======
12:24:18 <<< $JS.API.DIRECT.GET.KV_storage-users.$KV.storage-users.ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGAZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY======: {"key":"spaces/38/fbe2b0-2420-4e06-b1d0-77871811d13e/nodes/5c/4d/45/c0/-cda1-4036-a8d4-90446e111f66","data":"hbJ1c2VyLm9jaXMucGFyZW50aWTEJDM4ZmJlMmIwLTI0MjAtNGUwNi1iMWQwLTc3ODcxODExZDEzZa51c2VyLm9jaXMubmFtZcQKTmV3IGZvbGRlcrJ1c2VyLm9jaXMudHJlZXNpemXEATC1dXNlci5vY2lzLnByb3BhZ2F0aW9uxAExrnVzZXIub2Npcy50eXBlxAEy","metadata":null}
storage-users > ONYGCY3FOMXTGOBPMZRGKMTCGAWTENBSGAWTIZJQGYWWEMLEGAWTONZYG4YTQMJRMQYTGZJPNZXWIZLTF42WGLZUMQXTINJPMMYC6LLDMRQTCLJUGAZTMLLBHBSDILJZGA2DINTFGEYTCZRWGY====== revision: 304 created @ 13 Feb 24 12:10 UTC
Showing first 120 bytes of 327, use --raw for full data
{"key":"spaces/38/fbe2b0-2420-4e06-b1d0-77871811d13e/nodes/5c/4d/45/c0/-cda1-4036-a8d4-90446e111f66","data":"hbJ1c2VyLm9
real 0m 0.01s
user 0m 0.01s
sys 0m 0.00s
Can be reproduced locally:
# go/bin/nats -s 127.0.0.1:9233 stream edit KV_storage-users --no-allow-direct
# time curl -X PROPFIND https://cloud.ocis.test/remote.php/dav/spaces/storage-users-1%24some-admin-user-id-0000-000000000000 -u admin:admin -k
0,00s user 0,00s system 0% cpu 30,189 total
When running locally you have to restart ocis to reinitialize the nats connection. I think we are running into some timeout.
Looking at the code I can see that the nats-js-kv store does not allow configuring any JetStream options. In fact, it does not even pass the nats.DirectGet() option, which would be needed to enable direct gets when allow_direct is set to true. So ... AFAICT oCIS never made use of direct get. We always talk to the node with the latest data and should have read-after-write consistency, so there should be no difference in the benchmarks with or without allow_direct=true. But allow_direct is enabled by default when creating a new KV store, see below.
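For reference, this is roughly what explicitly requesting a direct get looks like with the nats.go JetStream API. This is a minimal sketch, assuming a NATS server on 127.0.0.1:4222 and the KV_storage-users stream from the traces above; the wildcard subject just asks for the most recent entry in the bucket.

package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://127.0.0.1:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Without nats.DirectGet() the lookup goes through $JS.API.STREAM.MSG.GET
	// and is answered by the stream leader. With the option it goes to
	// $JS.API.DIRECT.GET and any replica may answer, which requires
	// allow_direct: true on the stream.
	msg, err := js.GetLastMsg("KV_storage-users", "$KV.storage-users.>", nats.DirectGet())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("seq=%d subject=%s\n", msg.Sequence, msg.Subject)
}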
nats allowDirect: false
k6 run packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
output: -
scenarios: (100.00%) 8 scenarios, 75 max VUs, 6m30s max duration (incl. graceful stop):
* add_remove_tag_100: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: add_remove_tag_100, gracefulStop: 30s)
* create_remove_group_share_090: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_remove_group_share_090, gracefulStop: 30s)
* create_space_080: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_space_080, gracefulStop: 30s)
* create_upload_rename_delete_folder_and_file_040: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_upload_rename_delete_folder_and_file_040, gracefulStop: 30s)
* download_050: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: download_050, gracefulStop: 30s)
* navigate_file_tree_020: Up to 10 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: navigate_file_tree_020, gracefulStop: 30s)
* sync_client_110: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: sync_client_110, gracefulStop: 30s)
* user_group_search_070: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: user_group_search_070, gracefulStop: 30s)
✓ authn -> logonResponse - status
✓ authn -> authorizeResponse - status
✓ authn -> accessTokenResponse - status
✗ client -> search.searchForSharees - status
↳ 99% — ✓ 389 / ✗ 3
✓ client -> role.getMyDrives - status
✗ client -> resource.getResourceProperties - status
↳ 99% — ✓ 2987 / ✗ 10
✗ client -> application.createDrive - status
↳ 98% — ✓ 85 / ✗ 1
✗ client -> resource.createResource - status
↳ 98% — ✓ 89 / ✗ 1
✗ client -> drive.deactivateDrive - status
↳ 98% — ✓ 85 / ✗ 1
✓ client -> tag.getTags - status -- (SKIPPED)
✓ client -> tag.createTag - status -- (SKIPPED)
✗ client -> drive.deleteDrive - status
↳ 98% — ✓ 85 / ✗ 1
✗ client -> tag.addTagToResource - status
↳ 97% — ✓ 44 / ✗ 1
✗ client -> share.createShare - status
↳ 97% — ✓ 44 / ✗ 1
✗ client -> tag.removeTagToResource - status
↳ 97% — ✓ 44 / ✗ 1
✗ client -> share.deleteShare - status
↳ 97% — ✓ 44 / ✗ 1
✗ client -> resource.deleteResource - status
↳ 98% — ✓ 89 / ✗ 1
✓ client -> resource.downloadResource - status
✓ client -> resource.uploadResource - status
✓ client -> resource.moveResource - status
checks.........................: 99.52% ✓ 4629 ✗ 22
data_received..................: 1.5 GB 4.0 MB/s
data_sent......................: 831 MB 2.2 MB/s
http_req_blocked...............: avg=449.75µs min=0s med=0s max=225.02ms p(90)=1µs p(95)=1µs
http_req_connecting............: avg=23.39µs min=0s med=0s max=11.12ms p(90)=0s p(95)=0s
http_req_duration..............: avg=393.36ms min=13.63ms med=81.01ms max=19.56s p(90)=1.02s p(95)=1.87s
{ expected_response:true }...: avg=363.82ms min=13.63ms med=80.26ms max=19.56s p(90)=979.74ms p(95)=1.8s
http_req_failed................: 0.48% ✓ 22 ✗ 4539
http_req_receiving.............: avg=50.83ms min=10µs med=60µs max=17.88s p(90)=149µs p(95)=412µs
http_req_sending...............: avg=10.5ms min=20µs med=75µs max=8.08s p(90)=197µs p(95)=418µs
http_req_tls_handshaking.......: avg=417.02µs min=0s med=0s max=222.34ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=332.03ms min=13.47ms med=79.4ms max=10.65s p(90)=916.08ms p(95)=1.7s
http_reqs......................: 4561 12.066187/s
iteration_duration.............: avg=6.61s min=1.02s med=1.12s max=47.6s p(90)=16.88s p(95)=19.09s
iterations.....................: 3482 9.211678/s
vus............................: 2 min=0 max=75
vus_max........................: 75 min=34 max=75
running (6m18.0s), 00/75 VUs, 3482 complete and 5 interrupted iterations
add_remove_tag_100 ✓ [======================================] 0/5 VUs 6m0s
create_remove_group_share_090 ✓ [======================================] 0/5 VUs 6m0s
create_space_080 ✓ [======================================] 0/5 VUs 6m0s
create_upload_rename_delete... ✓ [======================================] 0/5 VUs 6m0s
download_050 ✓ [======================================] 0/5 VUs 6m0s
navigate_file_tree_020 ✓ [======================================] 00/10 VUs 6m0s
sync_client_110 ✓ [======================================] 00/20 VUs 6m0s
user_group_search_070 ✓ [======================================] 00/20 VUs 6m0s
tl;dr: While the github.com/nats-io/nats.go client requires explicitly passing the DirectGet() option, the KV store enables it by default. So currently all clients actually use the direct get feature, because all streams are configured with allow_direct: true.
After trying to find a place for a breakpoint in the nats client I have to revise my former comment:
The micro nats-js-kv package uses the nats.io client to initialize a kv store using nats.CreateKeyValue(). It initializes the underlying stream with AllowDirect: true:
scfg := &StreamConfig{
	Name: fmt.Sprintf(kvBucketNameTmpl, cfg.Bucket),
	// [...]
	AllowDirect: true,
	// [...]
}
The stream is then mapped to the KV in mapStreamToKVS:
kv := &kvs{
	// [...]
	useDirect: info.Config.AllowDirect,
}
The initialized kv store is then used when making requests. It will always default to AllowDirect. This also explains why there is a measurable difference between the allowDirect: true run (p(95)=286.28ms) and the allowDirect: false run (p(95)=1.87s) above. But we need to compare in a cluster; the above were measured in a local single node kubernetes deployment.
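To see that default in action, here is a minimal sketch against a scratch NATS server (address assumed): creating a KV bucket through nats.go and then reading the backing stream's config shows allow_direct enabled without us ever asking for it.

package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://127.0.0.1:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// CreateKeyValue creates the backing stream ("KV_<bucket>") itself,
	// with AllowDirect hard-coded to true in the generated StreamConfig.
	if _, err := js.CreateKeyValue(&nats.KeyValueConfig{Bucket: "storage-users"}); err != nil {
		log.Fatal(err)
	}

	info, err := js.StreamInfo("KV_storage-users")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("allow_direct:", info.Config.AllowDirect) // true on a freshly created bucket
}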
So, we do have a consistency problem that we can try to mitigate by setting allow-direct: false and restarting the ocis pods. But there is a performance cost that we need to evaluate.
The nats kv client will read the configuration of the underlying kv stream and disable direct get when allow-direct is false. nats-js-kv caches the store configuration, which is why we have to restart the client before it picks up the new stream configuration. The timeout is caused by the client trying to make a DirectGet request which the server refuses to answer; the client then falls back to a non-DirectGet request. These timeouts add up for every get we send to the kv.
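For completeness, this is the programmatic equivalent of the nats stream edit KV_storage-users --no-allow-direct call used above. A minimal sketch, assuming a server on 127.0.0.1:4222.

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://127.0.0.1:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Fetch the current stream config, flip AllowDirect off and push the update.
	info, err := js.StreamInfo("KV_storage-users")
	if err != nil {
		log.Fatal(err)
	}
	cfg := info.Config
	cfg.AllowDirect = false
	if _, err := js.UpdateStream(&cfg); err != nil {
		log.Fatal(err)
	}

	// Existing KV clients cached useDirect when they bound to the bucket,
	// so they keep issuing $JS.API.DIRECT.GET requests (and run into the
	// fallback timeout described above) until they are restarted.
}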
The results were generated on the new and dedicated kubernetes infrastructure, 75 users for 6 minutes
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: /root/cdperf/packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
output: -
scenarios: (100.00%) 8 scenarios, 75 max VUs, 6m30s max duration (incl. graceful stop):
* add_remove_tag_100: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: add_remove_tag_100, gracefulStop: 30s)
* create_remove_group_share_090: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_remove_group_share_090, gracefulStop: 30s)
* create_space_080: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_space_080, gracefulStop: 30s)
* create_upload_rename_delete_folder_and_file_040: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_upload_rename_delete_folder_and_file_040, gracefulStop: 30s)
* download_050: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: download_050, gracefulStop: 30s)
* navigate_file_tree_020: Up to 10 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: navigate_file_tree_020, gracefulStop: 30s)
* sync_client_110: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: sync_client_110, gracefulStop: 30s)
* user_group_search_070: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: user_group_search_070, gracefulStop: 30s)
✓ authn -> loginPageResponse - status
✓ authn -> authorizationResponse - status
✓ authn -> accessTokenResponse - status
✓ client -> search.searchForSharees - status
✓ client -> role.getMyDrives - status
✓ client -> resource.getResourceProperties - status
✓ client -> application.createDrive - status
✓ client -> resource.createResource - status
✓ client -> drive.deactivateDrive - status
✓ client -> tag.getTags - status -- (SKIPPED)
✓ client -> tag.createTag - status -- (SKIPPED)
✗ client -> drive.deleteDrive - status
↳ 97% — ✓ 85 / ✗ 2
✗ client -> tag.addTagToResource - status
↳ 0% — ✓ 0 / ✗ 47
✗ client -> tag.removeTagToResource - status
↳ 0% — ✓ 0 / ✗ 47
✓ client -> share.createShare - status
✓ client -> share.deleteShare - status
✓ client -> resource.downloadResource - status
✓ client -> resource.deleteResource - status
✓ client -> resource.uploadResource - status
✓ client -> resource.moveResource - status
checks.........................: 97.94% ✓ 4578 ✗ 96
data_received..................: 1.7 GB 4.4 MB/s
data_sent......................: 812 MB 2.1 MB/s
http_req_blocked...............: avg=1.21ms min=190ns med=280ns max=103.38ms p(90)=581ns p(95)=892ns
http_req_connecting............: avg=308.18µs min=0s med=0s max=19.32ms p(90)=0s p(95)=0s
http_req_duration..............: avg=281.5ms min=5.22ms med=209.26ms max=6.99s p(90)=453.73ms p(95)=575.88ms
{ expected_response:true }...: avg=285.41ms min=5.22ms med=210.42ms max=6.99s p(90)=456.33ms p(95)=584.61ms
http_req_failed................: 2.09% ✓ 96 ✗ 4484
http_req_receiving.............: avg=8.3ms min=24.56µs med=80.04µs max=5.83s p(90)=162.77µs p(95)=49.55ms
http_req_sending...............: avg=1.53ms min=39.17µs med=83.2µs max=630.52ms p(90)=124.85µs p(95)=147.85µs
http_req_tls_handshaking.......: avg=881.77µs min=0s med=0s max=85.78ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=271.66ms min=4.92ms med=209.1ms max=6.38s p(90)=435.25ms p(95)=547.58ms
http_reqs......................: 4580 11.743577/s
iteration_duration.............: avg=6.48s min=1.15s med=1.22s max=42.05s p(90)=15.52s p(95)=17.93s
iterations.....................: 3566 9.14358/s
vus............................: 1 min=0 max=75
vus_max........................: 75 min=75 max=75
running (6m30.0s), 00/75 VUs, 3566 complete and 1 interrupted iterations
add_remove_tag_100 ✓ [======================================] 0/5 VUs 6m0s
create_remove_group_share_090 ✓ [======================================] 0/5 VUs 6m0s
create_space_080 ✓ [======================================] 0/5 VUs 6m0s
create_upload_rename_delete... ✓ [======================================] 0/5 VUs 6m0s
download_050 ✓ [======================================] 0/5 VUs 6m0s
navigate_file_tree_020 ✓ [======================================] 00/10 VUs 6m0s
sync_client_110 ✓ [======================================] 00/20 VUs 6m0s
user_group_search_070 ✓ [======================================] 00/20 VUs 6m0s
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: /root/cdperf/packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
output: -
scenarios: (100.00%) 8 scenarios, 75 max VUs, 6m30s max duration (incl. graceful stop):
* add_remove_tag_100: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: add_remove_tag_100, gracefulStop: 30s)
* create_remove_group_share_090: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_remove_group_share_090, gracefulStop: 30s)
* create_space_080: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_space_080, gracefulStop: 30s)
* create_upload_rename_delete_folder_and_file_040: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: create_upload_rename_delete_folder_and_file_040, gracefulStop: 30s)
* download_050: Up to 5 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: download_050, gracefulStop: 30s)
* navigate_file_tree_020: Up to 10 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: navigate_file_tree_020, gracefulStop: 30s)
* sync_client_110: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: sync_client_110, gracefulStop: 30s)
* user_group_search_070: Up to 20 looping VUs for 6m0s over 3 stages (gracefulRampDown: 30s, exec: user_group_search_070, gracefulStop: 30s)
✓ authn -> loginPageResponse - status
✓ authn -> authorizationResponse - status
✓ authn -> accessTokenResponse - status
✓ client -> search.searchForSharees - status
✗ client -> role.getMyDrives - status
↳ 94% — ✓ 47 / ✗ 3
✗ client -> resource.getResourceProperties - status
↳ 90% — ✓ 2657 / ✗ 273
✗ client -> application.createDrive - status
↳ 92% — ✓ 76 / ✗ 6
✗ client -> resource.createResource - status
↳ 86% — ✓ 72 / ✗ 11
✗ client -> drive.deactivateDrive - status
↳ 92% — ✓ 76 / ✗ 6
✓ client -> tag.getTags - status -- (SKIPPED)
✓ client -> tag.createTag - status -- (SKIPPED)
✗ client -> resource.downloadResource - status
↳ 95% — ✓ 91 / ✗ 4
✗ client -> drive.deleteDrive - status
↳ 92% — ✓ 76 / ✗ 6
✗ client -> tag.addTagToResource - status
↳ 0% — ✓ 0 / ✗ 47
✗ client -> tag.removeTagToResource - status
↳ 0% — ✓ 0 / ✗ 47
✗ client -> share.createShare - status
↳ 77% — ✓ 31 / ✗ 9
✗ client -> share.deleteShare - status
↳ 75% — ✓ 30 / ✗ 10
✗ client -> resource.deleteResource - status
↳ 81% — ✓ 68 / ✗ 15
✗ client -> resource.uploadResource - status
↳ 93% — ✓ 40 / ✗ 3
✗ client -> resource.moveResource - status
↳ 90% — ✓ 39 / ✗ 4
checks.........................: 90.17% ✓ 4075 ✗ 444
data_received..................: 1.6 GB 4.1 MB/s
data_sent......................: 812 MB 2.1 MB/s
http_req_blocked...............: avg=1.24ms min=200ns med=281ns max=96.26ms p(90)=591ns p(95)=1µs
http_req_connecting............: avg=315.84µs min=0s med=0s max=20.04ms p(90)=0s p(95)=0s
http_req_duration..............: avg=412.24ms min=4.38ms med=188.25ms max=20.35s p(90)=432.55ms p(95)=562.9ms
{ expected_response:true }...: avg=270.85ms min=4.38ms med=193.82ms max=6.04s p(90)=430.31ms p(95)=545.78ms
http_req_failed................: 10.03% ✓ 444 ✗ 3981
http_req_receiving.............: avg=7.24ms min=24.37µs med=80.16µs max=1.11s p(90)=167.16µs p(95)=49.49ms
http_req_sending...............: avg=1.54ms min=42.36µs med=82.82µs max=608.2ms p(90)=132.09µs p(95)=156.41µs
http_req_tls_handshaking.......: avg=895.75µs min=0s med=0s max=37.7ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=403.46ms min=4.11ms med=188.09ms max=20.35s p(90)=413.79ms p(95)=524.53ms
http_reqs......................: 4425 11.377364/s
iteration_duration.............: avg=6.71s min=1.03s med=1.2s max=58.2s p(90)=15.62s p(95)=18.17s
iterations.....................: 3433 8.826778/s
vus............................: 1 min=0 max=75
vus_max........................: 75 min=75 max=75
running (6m28.9s), 00/75 VUs, 3433 complete and 2 interrupted iterations
add_remove_tag_100 ✓ [======================================] 0/5 VUs 6m0s
create_remove_group_share_090 ✓ [======================================] 0/5 VUs 6m0s
create_space_080 ✓ [======================================] 0/5 VUs 6m0s
create_upload_rename_delete... ✓ [======================================] 0/5 VUs 6m0s
download_050 ✓ [======================================] 0/5 VUs 6m0s
navigate_file_tree_020 ✓ [======================================] 00/10 VUs 6m0s
sync_client_110 ✓ [======================================] 00/20 VUs 6m0s
user_group_search_070 ✓ [======================================] 00/20 VUs 6m0s
next step: 750 users for 60 minutes
750 users for 60 minutes
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: /root/cdperf/packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
output: -
scenarios: (100.00%) 8 scenarios, 750 max VUs, 1h0m30s max duration (incl. graceful stop):
* add_remove_tag_100: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: add_remove_tag_100, gracefulStop: 30s)
* create_remove_group_share_090: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_remove_group_share_090, gracefulStop: 30s)
* create_space_080: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_space_080, gracefulStop: 30s)
* create_upload_rename_delete_folder_and_file_040: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_upload_rename_delete_folder_and_file_040, gracefulStop: 30s)
* download_050: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: download_050, gracefulStop: 30s)
* navigate_file_tree_020: Up to 100 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: navigate_file_tree_020, gracefulStop: 30s)
* sync_client_110: Up to 200 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: sync_client_110, gracefulStop: 30s)
* user_group_search_070: Up to 200 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: user_group_search_070, gracefulStop: 30s)
✓ authn -> loginPageResponse - status
✓ authn -> authorizationResponse - status
✓ authn -> accessTokenResponse - status
✗ client -> search.searchForSharees - status
↳ 99% — ✓ 3683 / ✗ 6
✓ client -> role.getMyDrives - status
✗ client -> resource.getResourceProperties - status
↳ 99% — ✓ 31739 / ✗ 13
✗ client -> application.createDrive - status
↳ 99% — ✓ 902 / ✗ 1
✗ client -> resource.createResource - status
↳ 99% — ✓ 922 / ✗ 3
✗ client -> drive.deactivateDrive - status
↳ 99% — ✓ 902 / ✗ 1
✓ client -> tag.getTags - status -- (SKIPPED)
✓ client -> tag.createTag - status -- (SKIPPED)
✗ client -> resource.downloadResource - status
↳ 99% — ✓ 916 / ✗ 2
✗ client -> drive.deleteDrive - status
↳ 98% — ✓ 891 / ✗ 12
✗ client -> tag.addTagToResource - status
↳ 0% — ✓ 0 / ✗ 468
✗ client -> tag.removeTagToResource - status
↳ 0% — ✓ 0 / ✗ 468
✗ client -> share.createShare - status
↳ 98% — ✓ 453 / ✗ 9
✗ client -> share.deleteShare - status
↳ 95% — ✓ 440 / ✗ 22
✗ client -> resource.deleteResource - status
↳ 99% — ✓ 922 / ✗ 3
✗ client -> resource.uploadResource - status
↳ 98% — ✓ 457 / ✗ 6
✗ client -> resource.moveResource - status
↳ 99% — ✓ 462 / ✗ 1
checks.........................: 98.11% ✓ 52761 ✗ 1015
data_received..................: 16 GB 4.4 MB/s
data_sent......................: 7.9 GB 2.2 MB/s
http_req_blocked...............: avg=12.11ms min=180ns med=271ns max=5.05s p(90)=45.79ms p(95)=47.2ms
http_req_connecting............: avg=3.8ms min=0s med=0s max=1.04s p(90)=16.58ms p(95)=16.69ms
http_req_duration..............: avg=364.48ms min=4.01ms med=309.8ms max=50.03s p(90)=551.32ms p(95)=776.71ms
{ expected_response:true }...: avg=362.81ms min=4.01ms med=311.61ms max=10.39s p(90)=554.72ms p(95)=780.98ms
http_req_failed................: 1.92% ✓ 1015 ✗ 51825
http_req_receiving.............: avg=5.84ms min=0s med=86.66µs max=1.66s p(90)=156.14µs p(95)=49.1ms
http_req_sending...............: avg=1.6ms min=40.23µs med=100.69µs max=879.73ms p(90)=174.24µs p(95)=204.66µs
http_req_tls_handshaking.......: avg=8.24ms min=0s med=0s max=5.03s p(90)=28.95ms p(95)=30.33ms
http_req_waiting...............: avg=357.03ms min=3.85ms med=306.12ms max=50.03s p(90)=532.58ms p(95)=734.47ms
http_reqs......................: 52840 14.556453/s
iteration_duration.............: avg=54.94s min=10.23s med=10.37s max=5m25s p(90)=2m30s p(95)=2m33s
iterations.....................: 36082 9.939931/s
vus............................: 5 min=0 max=750
vus_max........................: 750 min=64 max=750
running (1h00m30.0s), 000/750 VUs, 36082 complete and 536 interrupted iterations
add_remove_tag_100 ✓ [======================================] 00/50 VUs 1h0m0s
create_remove_group_share_090 ✓ [======================================] 00/50 VUs 1h0m0s
create_space_080 ✓ [======================================] 00/50 VUs 1h0m0s
create_upload_rename_delete... ✓ [======================================] 01/50 VUs 1h0m0s
download_050 ✓ [======================================] 00/50 VUs 1h0m0s
navigate_file_tree_020 ✓ [======================================] 000/100 VUs 1h0m0s
sync_client_110 ✓ [======================================] 000/200 VUs 1h0m0s
user_group_search_070 ✓ [======================================] 000/200 VUs 1h0m0s
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: /root/cdperf/packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
output: -
scenarios: (100.00%) 8 scenarios, 750 max VUs, 1h0m30s max duration (incl. graceful stop):
* add_remove_tag_100: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: add_remove_tag_100, gracefulStop: 30s)
* create_remove_group_share_090: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_remove_group_share_090, gracefulStop: 30s)
* create_space_080: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_space_080, gracefulStop: 30s)
* create_upload_rename_delete_folder_and_file_040: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_upload_rename_delete_folder_and_file_040, gracefulStop: 30s)
* download_050: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: download_050, gracefulStop: 30s)
* navigate_file_tree_020: Up to 100 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: navigate_file_tree_020, gracefulStop: 30s)
* sync_client_110: Up to 200 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: sync_client_110, gracefulStop: 30s)
* user_group_search_070: Up to 200 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: user_group_search_070, gracefulStop: 30s)
✓ authn -> loginPageResponse - status
✓ authn -> authorizationResponse - status
✓ authn -> accessTokenResponse - status
✗ client -> search.searchForSharees - status
↳ 99% — ✓ 3685 / ✗ 3
✓ client -> role.getMyDrives - status
✗ client -> resource.getResourceProperties - status
↳ 99% — ✓ 31800 / ✗ 1
✓ client -> application.createDrive - status
✓ client -> resource.createResource - status
✓ client -> drive.deactivateDrive - status
✓ client -> tag.getTags - status -- (SKIPPED)
✓ client -> tag.createTag - status -- (SKIPPED)
✗ client -> drive.deleteDrive - status
↳ 97% — ✓ 882 / ✗ 23
✗ client -> tag.addTagToResource - status
↳ 0% — ✓ 0 / ✗ 468
✗ client -> tag.removeTagToResource - status
↳ 0% — ✓ 0 / ✗ 468
✗ client -> share.createShare - status
↳ 97% — ✓ 452 / ✗ 10
✓ client -> resource.downloadResource - status
✗ client -> share.deleteShare - status
↳ 97% — ✓ 451 / ✗ 11
✓ client -> resource.deleteResource - status
✓ client -> resource.uploadResource - status
✓ client -> resource.moveResource - status
checks.........................: 98.17% ✓ 52848 ✗ 984
data_received..................: 16 GB 4.4 MB/s
data_sent......................: 7.9 GB 2.2 MB/s
http_req_blocked...............: avg=11.64ms min=200ns med=271ns max=1.07s p(90)=45.63ms p(95)=46.93ms
http_req_connecting............: avg=3.7ms min=0s med=0s max=1.03s p(90)=16.58ms p(95)=16.7ms
http_req_duration..............: avg=345.84ms min=3.64ms med=309.29ms max=20.17s p(90)=520.21ms p(95)=680.26ms
{ expected_response:true }...: avg=345.05ms min=3.64ms med=311.17ms max=9.94s p(90)=521.98ms p(95)=682.72ms
http_req_failed................: 1.86% ✓ 984 ✗ 51912
http_req_receiving.............: avg=6.09ms min=21.48µs med=86.44µs max=5.24s p(90)=152.85µs p(95)=49.11ms
http_req_sending...............: avg=1.56ms min=41.92µs med=98.73µs max=810.9ms p(90)=171.1µs p(95)=198.42µs
http_req_tls_handshaking.......: avg=7.88ms min=0s med=0s max=404.37ms p(90)=28.8ms p(95)=30.05ms
http_req_waiting...............: avg=338.18ms min=3.44ms med=305.45ms max=20.17s p(90)=504.31ms p(95)=648.05ms
http_reqs......................: 52896 14.571894/s
iteration_duration.............: avg=54.87s min=10.23s med=10.36s max=5m26s p(90)=2m30s p(95)=2m33s
iterations.....................: 36132 9.953715/s
vus............................: 5 min=0 max=750
vus_max........................: 750 min=64 max=750
running (1h00m30.0s), 000/750 VUs, 36132 complete and 536 interrupted iterations
add_remove_tag_100 ✓ [======================================] 00/50 VUs 1h0m0s
create_remove_group_share_090 ✓ [======================================] 00/50 VUs 1h0m0s
create_space_080 ✓ [======================================] 00/50 VUs 1h0m0s
create_upload_rename_delete... ✓ [======================================] 00/50 VUs 1h0m0s
download_050 ✓ [======================================] 00/50 VUs 1h0m0s
navigate_file_tree_020 ✓ [======================================] 000/100 VUs 1h0m0s
sync_client_110 ✓ [======================================] 000/200 VUs 1h0m0s
user_group_search_070 ✓ [======================================] 000/200 VUs 1h0m0s
@butonic @wkloucek @fschade @dj4oC
We need a meeting to exchange information.
@tschiebel Can you organize?
@fschade @butonic
Could you write down the learnings / questions in a condensed way?
We need to set all KV streams we use to allow-direct: false. It will reduce the number of request errors because data is always read from the NATS node with the most recent state. This also causes fewer timeouts, which explains why the tests with allow-direct: false are faster.
What we don't know, yet is the performance difference between redis and nats.
Are there any downsides to be expected? I'd be happy to change it! :-)
@micbar @butonic Is the oCIS product also adopting this setting?
We need to set all KV streams we use to
allow-direct: false
What we don't know, yet is the performance difference between redis and nats.
I will prepare an oCIS 5.0.0-rc.4 setup with Redis cache store instead of NATS cache store
We need to change it for the helm charts. For the single binary we do not need to change it when running with the built-in NATS because it is single node. We need to document that when running an external NATS the KV streams all have to use allow-direct: false. When creating a KV store the library will always create streams with allow-direct: true. And currently, the creation cannot be configured. So, needs documentation and changing the default via helm, IMO.
We need to change it for the helm charts. For the single binary we do not need to change it when running with the built-in NATS because it is single node. We need to document that when running an external NATS the KV streams all have to use allow-direct: false. When creating a KV store the library will always create streams with allow-direct: true. And currently, the creation cannot be configured. So, needs documentation and changing the default via helm, IMO.
Even though the single binary doesn't need it, I'd prefer to have it on the same value. Otherwise everybody will be confused forever.
@butonic oCIS charts PR: https://github.com/owncloud/ocis-charts/pull/488
same test, same number of users, but redis instead of nats:
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: /root/cdperf/packages/k6-tests/artifacts/koko-platform-000-mixed-ramping-k6.js
output: -
scenarios: (100.00%) 8 scenarios, 750 max VUs, 1h0m30s max duration (incl. graceful stop):
* add_remove_tag_100: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: add_remove_tag_100, gracefulStop: 30s)
* create_remove_group_share_090: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_remove_group_share_090, gracefulStop: 30s)
* create_space_080: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_space_080, gracefulStop: 30s)
* create_upload_rename_delete_folder_and_file_040: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: create_upload_rename_delete_folder_and_file_040, gracefulStop: 30s)
* download_050: Up to 50 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: download_050, gracefulStop: 30s)
* navigate_file_tree_020: Up to 100 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: navigate_file_tree_020, gracefulStop: 30s)
* sync_client_110: Up to 200 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: sync_client_110, gracefulStop: 30s)
* user_group_search_070: Up to 200 looping VUs for 1h0m0s over 3 stages (gracefulRampDown: 30s, exec: user_group_search_070, gracefulStop: 30s)
✓ authn -> loginPageResponse - status
✓ authn -> authorizationResponse - status
✓ authn -> accessTokenResponse - status
✗ client -> search.searchForSharees - status
↳ 98% — ✓ 3641 / ✗ 49
✗ client -> role.getMyDrives - status
↳ 98% — ✓ 490 / ✗ 10
✗ client -> resource.getResourceProperties - status
↳ 96% — ✓ 30894 / ✗ 1157
✗ client -> application.createDrive - status
↳ 99% — ✓ 894 / ✗ 9
✗ client -> resource.createResource - status
↳ 95% — ✓ 881 / ✗ 44
✗ client -> drive.deactivateDrive - status
↳ 98% — ✓ 887 / ✗ 16
✓ client -> tag.getTags - status -- (SKIPPED)
✓ client -> tag.createTag - status -- (SKIPPED)
✗ client -> drive.deleteDrive - status
↳ 96% — ✓ 869 / ✗ 34
✗ client -> tag.addTagToResource - status
↳ 97% — ✓ 456 / ✗ 10
✗ client -> share.createShare - status
↳ 91% — ✓ 423 / ✗ 38
✗ client -> tag.removeTagToResource - status
↳ 97% — ✓ 455 / ✗ 11
✗ client -> share.deleteShare - status
↳ 89% — ✓ 412 / ✗ 49
✗ client -> resource.deleteResource - status
↳ 95% — ✓ 880 / ✗ 45
✗ client -> resource.downloadResource - status
↳ 97% — ✓ 890 / ✗ 27
✗ client -> resource.uploadResource - status
↳ 95% — ✓ 442 / ✗ 22
✗ client -> resource.moveResource - status
↳ 95% — ✓ 445 / ✗ 19
checks.........................: 97.15% ✓ 52527 ✗ 1540
data_received..................: 16 GB 4.3 MB/s
data_sent......................: 7.9 GB 2.2 MB/s
http_req_blocked...............: avg=12.22ms min=190ns med=271ns max=5.05s p(90)=46.22ms p(95)=47.72ms
http_req_connecting............: avg=3.83ms min=0s med=0s max=1.04s p(90)=16.63ms p(95)=16.75ms
http_req_duration..............: avg=307.66ms min=4.22ms med=217.56ms max=50.01s p(90)=562.32ms p(95)=833.28ms
{ expected_response:true }...: avg=302.71ms min=4.22ms med=219.46ms max=11.69s p(90)=566.13ms p(95)=832.39ms
http_req_failed................: 2.89% ✓ 1540 ✗ 51595
http_req_receiving.............: avg=4.25ms min=0s med=85.59µs max=7.05s p(90)=142.44µs p(95)=174.71µs
http_req_sending...............: avg=1.59ms min=35.9µs med=101.65µs max=1.03s p(90)=177.04µs p(95)=209.26µs
http_req_tls_handshaking.......: avg=8.31ms min=0s med=0s max=5.04s p(90)=29.31ms p(95)=30.78ms
http_req_waiting...............: avg=301.81ms min=4.07ms med=217.03ms max=50.01s p(90)=544.91ms p(95)=802.37ms
http_reqs......................: 53135 14.637723/s
iteration_duration.............: avg=54.48s min=10.02s med=10.25s max=5m40s p(90)=2m30s p(95)=2m33s
iterations.....................: 36385 10.023404/s
vus............................: 5 min=0 max=750
vus_max........................: 750 min=64 max=750
running (1h00m30.0s), 000/750 VUs, 36385 complete and 534 interrupted iterations
add_remove_tag_100 ✓ [======================================] 00/50 VUs 1h0m0s
create_remove_group_share_090 ✓ [======================================] 00/50 VUs 1h0m0s
create_space_080 ✓ [======================================] 00/50 VUs 1h0m0s
create_upload_rename_delete... ✓ [======================================] 00/50 VUs 1h0m0s
download_050 ✓ [======================================] 00/50 VUs 1h0m0s
navigate_file_tree_020 ✓ [======================================] 000/100 VUs 1h0m0s
sync_client_110 ✓ [======================================] 000/200 VUs 1h0m0s
user_group_search_070 ✓ [======================================] 000/200 VUs 1h0m0s
So we go with NATS. The question is how we make NATS use allow_direct: false for KV stores by default. We may have to change the stream after creating it ... but then clients need to reconnect 🤔
We may have to change the stream after creating it ... but then clients need to reconnect 🤔
Oh, that's a requirement that's hard to fulfill when oCIS doesn't set this from the beginning
@wkloucek The helm-charts are already setting up the KV streams with nats/nack in advance. So, where is the problem?
@wkloucek The helm-charts are already setting up the KV streams with nats/nack in advance. So, where is the problem?
There is no guaranteed timing and NACK currently does not have a reconcile loop :disappointed:
Ok, so we have the same problem in the charts as well as with the single binary. We need to create the streams with the correct settings. Or we rely on an external mechanism to create the streams in the right way and die if the stream does not exist. The latter would allow the charts to restart ocis until the streams have been created correctly.
We could try to make the KV store creation configurable, but IIRC the code lives upstream. I guess we can also move the creation logic to the nats-js-kv micro registry and make it configurable. That should also include the type of persistence. And maybe all stream properties?
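As a rough illustration of the "external mechanism" idea, a sketch that pre-creates the backing stream with the settings from the KV_storage-users dump earlier in this thread, but with allow_direct switched off. The exact field set oCIS expects may differ, the server address is assumed, and the replica count assumes a 3-node cluster.

package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect("nats://127.0.0.1:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Pre-create the KV backing stream before oCIS binds to the bucket.
	// Values mirror the KV_storage-users config dumped above, except
	// allow_direct, which is switched off here.
	_, err = js.AddStream(&nats.StreamConfig{
		Name:              "KV_storage-users",
		Description:       "oCIS storageusers cache",
		Subjects:          []string{"$KV.storage-users.>"},
		MaxMsgsPerSubject: 1,
		MaxAge:            24 * time.Hour,
		Discard:           nats.DiscardNew,
		Storage:           nats.MemoryStorage,
		Replicas:          3,
		DenyDelete:        true,
		AllowRollup:       true,
		AllowDirect:       false,
	})
	if err != nil {
		log.Fatal(err)
	}
}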
@butonic @wkloucek please clarify and close the ticket.
@butonic @wkloucek please clarify and close the ticket.
I'd also like to know the current state
https://github.com/owncloud/ocis-charts/pull/488 is pending a review.
@butonic can we close here?
Actually this problem is not solved. https://github.com/owncloud/ocis/issues/8432#issuecomment-1959077741 still applies
NATS JetStream does not guarantee read-after-write consistency OOTB: https://github.com/nats-io/nats-architecture-and-design/blob/main/adr/ADR-8.md#consistency-guarantees
When updating file metadata we lock the node, write to disk, write the cache (to NATS) and then unlock the file. Until NATS has propagated the value to all nodes, other requests might read the old metadata.
oCIS web does a PROPFIND after uploading a file. If the request hits a different NATS cluster node, the default consistency guarantees mean it might return old metadata when propagation is slow, e.g. due to load.
In a single node deployment we won't see a problem, but in a cluster we might.
There are two scenarios:
Creating new files
When file metadata is not present in the cache we will go to the storage. No inconsistency problem in this case.
Updating existing files
When updating file metadata of an existing file, reading from a different NATS node might return old data: fileid (ok), mtime/etag (no change, so search won't index), grants?
Next steps
Build a minimal test scenario that
Long term solution
Use groupcache-like sharding of the file metadata key space so that requests for a given key are always handled by the same responsible pod.
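A minimal sketch of that idea, with hypothetical pod names and plain hash-mod placement instead of the consistent hash ring groupcache actually uses:

package main

import (
	"fmt"
	"hash/fnv"
)

// ownerFor maps a file metadata key to the pod responsible for caching it,
// so every request for that key is served by the same pod (groupcache-style
// ownership). groupcache uses a consistent hash ring so that adding or
// removing a pod only remaps a fraction of the key space; plain modulo is
// used here only to keep the sketch short.
func ownerFor(key string, pods []string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return pods[int(h.Sum32())%len(pods)]
}

func main() {
	pods := []string{"storage-users-0", "storage-users-1", "storage-users-2"}
	key := "spaces/38/fbe2b0-2420-4e06-b1d0-77871811d13e/nodes/5c/4d/45/c0/-cda1-4036-a8d4-90446e111f66"
	fmt.Println("owner:", ownerFor(key, pods))
}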
Related issues
Distributed locks
AFAICT NATS does not support distributed locks. kraud cloud wrote a blog post in Nov 2023 describing how to build locks based on NATS streams: https://docs.kraudcloud.com/blog/category/tech/ It links to this comment by the nats devs: https://github.com/nats-io/nats-server/discussions/4803#discussioncomment-7612323:
decomposedfs currently locks msgpack metadata using the filesystem. So the locking part is safe for now.
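For illustration, the general flock-based pattern looks roughly like the sketch below. Lock-file naming and paths are hypothetical, flock is Unix-only, and the actual reva/decomposedfs code differs in detail.

package main

import (
	"log"
	"os"
	"syscall"
)

// withMetadataLock takes an exclusive flock on <path>.lock, runs fn, and
// releases the lock. This only illustrates the general filesystem-locking
// pattern referred to above, not the exact decomposedfs implementation.
func withMetadataLock(path string, fn func() error) error {
	lock, err := os.OpenFile(path+".lock", os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer lock.Close()

	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	return fn()
}

func main() {
	err := withMetadataLock("/tmp/node-1234.mpk", func() error {
		// update the msgpack metadata (and the cache) while holding the lock
		return os.WriteFile("/tmp/node-1234.mpk", []byte("metadata"), 0o644)
	})
	if err != nil {
		log.Fatal(err)
	}
}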
Cache Directory Metadata
Currently, decomposedfs relies on NFS to provide parent / child relationships, which is slow as we have to disable directory caches. Putting directory listings into the cache would be subject to the same read-after-write consistency problem. Topic for a new ADR.