Closed: marchdaisy closed this issue 2 years ago
@marchdaisy - thank you for opening. Will hold off on release until we can resolve this
@marchdaisy can you share what you are using for storage backend?
@marchdaisy hihi, and can you share your chartmuseum startup command?
@jdolitsky Thank you for your response. We have tested with both a PVC (VolumeMode: Filesystem) and a MinIO backend. Both of them exhibit the issue I mentioned above.
@scbizu Thank you for your response. We start chartmuseum in a container built from a Dockerfile, and the related env settings are as follows: env:
It seems weird that your pprof shows requests blocked in fs I/O (chart loading). I ran the 0.13.0-pre loadtesting on my local machine (MBP 2018), and the RPS could reach ~500 req/s.
BTW, thanks for your reply. We will check the new release again and keep this issue as a blocker for the new release until we have done some more optimization.
Do you have any update on this problem? I am using the latest version, v0.13.1, and the issue still exists in my test scenario. Do you need any more information to help debug this issue?
Hi - are you using the loadtesting suite here: https://github.com/helm/chartmuseum/tree/main/loadtesting or are you creating your own locustfile.py? Can you share this code?
@jdolitsky Thanks for your reply!
The locustfile.py I used is similar to the one at the link above. The difference is as follows:
class UserBehavior(TaskSet):
    tasks = {index: 1, post_new_chart: 10}

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 0
    max_wait = 0
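For reference, here is a rough end-to-end sketch of a locustfile in this style, using the pre-1.0 Locust API implied by HttpLocust. The task bodies and the mychart-0.1.0.tgz filename are illustrative assumptions, not the reporter's actual code; /index.yaml and /api/charts are chartmuseum's standard endpoints.

```python
# Sketch only: assumes Locust < 1.0 (HttpLocust API) and a locally packaged
# chart named mychart-0.1.0.tgz; adjust to match the real loadtesting suite.
from locust import HttpLocust, TaskSet


def index(l):
    # fetch the repository index
    l.client.get("/index.yaml")


def post_new_chart(l):
    # upload a packaged chart; the filename here is a placeholder
    with open("mychart-0.1.0.tgz", "rb") as f:
        l.client.post(
            "/api/charts",
            data=f.read(),
            headers={"Content-Type": "application/octet-stream"},
        )


class UserBehavior(TaskSet):
    # uploads weighted 10x heavier than index fetches, as in the snippet above
    tasks = {index: 1, post_new_chart: 10}


class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    min_wait = 0
    max_wait = 0
```

Note that repeatedly uploading the same chart version would normally be rejected unless overwrites are allowed, so a real load test presumably generates a distinct chart version per request.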
With the previous chartmuseum version, v0.12.0, the request rate for uploading charts could reach 300 req/s. But now there is a CPU problem when uploading charts.
Hi @scbizu, could you please update the status of this issue? I saw that there is a related PR under review. Do you plan to merge the fix in the next release, v0.13.2?
Yeah, once we can prove that this PR actually improves the load-testing results; we haven't tested it widely yet, though.
Hi, @scbizu Have you fixed this issue in v0.14.0? I found that this PR https://github.com/helm/chartmuseum/pull/457 is still in the v0.15.0 milestone. Do you still have work planned for the performance issue? By the way, I tested the latest chartmuseum version with the scenario of concurrently uploading charts and getting the index, and the problem above no longer appeared. I think some related commits must have fixed the issue, right?
@marchdaisy By "the latest version", do you mean canary or v0.14.0?
Hi, @scbizu The latest version I mentioned is v0.14.0. I made a mistake while upgrading chartmuseum to v0.14.0 yesterday. So actually, the result is fine with v0.12.0, but with v0.14.0 the issue still exists. Will you fix it in v0.15.0?
@marchdaisy Yep, the latency was introduced by the cache mechanism added in 0.13.0. Perhaps you can set cache_interval to 0 and see if the issue still exists. However, our next major milestone will fix this, stay tuned :P
Hi, @scbizu I found that the default cache_interval is already 0s ("set the interval of delta updating the cache (default: 0s)"). Does that mean that on each request the storage backend is scanned for changes compared to the cache? From my point of view, performance should be worse with a cache_interval of 0s than with 5m or an even longer interval, so I am confused by your suggestion. Could you please explain it a bit more? Thank you!
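To make the question concrete, here is a toy sketch (not chartmuseum's actual implementation) of the two refresh strategies being contrasted: re-scanning storage on every request versus refreshing on a fixed interval. It only illustrates why a 0s setting sounds more expensive under heavy load; the real semantics of cache_interval are defined by chartmuseum itself.

```python
import time


class ToyIndexCache:
    """Toy illustration of per-request vs. interval-based cache refresh.
    NOT chartmuseum's code; 'storage' is assumed to expose list_charts()."""

    def __init__(self, storage, interval_seconds=0):
        self.storage = storage
        self.interval = interval_seconds
        self.index = {}
        self.last_refresh = 0.0

    def get_index(self):
        now = time.time()
        # With interval == 0 this toy cache re-scans storage on every request,
        # which is the expensive behavior the reporter expects; with a longer
        # interval it only re-scans after the interval has elapsed.
        if self.interval == 0 or now - self.last_refresh >= self.interval:
            self.index = {c.name: c for c in self.storage.list_charts()}
            self.last_refresh = now
        return self.index
```

Whether chartmuseum's cache_interval=0 actually behaves like the per-request branch above is exactly what the reporter is asking; the sketch only shows why 0s intuitively looks slower than 5m under heavy write load.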
@marchdaisy 👋 the changes in #556 should help here. If you want to test with the canary or latest image tag (otherwise it will be included in v0.15.0), I’d be interested in the results!
Hi, @cbuto The fix works for my problem! I used the latest code and the request rate for uploading charts is back to the same level as in v0.12.0. Thank you for your fix!
Chartmuseum version:
latest version with last commit: "ci: migrate release from docker hub into github container registry #395"
Problem
When using locust for load testing, the request rate for uploading charts drops significantly.
Description
Previously
With chartmuseum release v0.12.0 and a PVC backend:
locust -f locustfile.py --no-web -L DEBUG -c 500 -r 50 -n 300000 --csv=result
We define posting new charts in locustfile.py. The request rate for uploading charts could reach 300 req/s.
Now
But when we use the latest version, the request rate drops to 15 req/s, and after the chart count reaches 525 the POST requests begin to fail.
From the chartmuseum pod, we found the CPU is very high.

top - 05:22:03 up 31 days, 23:27,  0 users,  load average: 2.01, 2.33, 1.40
Tasks:   3 total,   1 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.0 us,  2.0 sy,  0.2 ni, 75.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem : 32159.41+total, 7854.578 free, 3791.039 used, 20513.79+buff/cache
MiB Swap:    0.000 total,    0.000 free,    0.000 used. 27340.60+avail Mem

  PID USER   PR  NI    VIRT    RES   SHR S  %CPU  %MEM     TIME+ COMMAND
    1 61357  20   0  824296 167388 31080 S 152.8 0.508  13:02.71 chartmuseum
   33 61357  20   0   14200   3428  2980 S 0.000 0.010   0:00.00 sh
   38 61357  20   0   37124   3496  2968 R 0.000 0.011   0:00.00 top
And below is some information shown by go tool pprof.
profile:
(pprof) top
Showing nodes accounting for 14230ms, 37.70% of 37750ms total
Dropped 448 nodes (cum <= 188.75ms)
Showing top 10 nodes out of 194
      flat  flat%   sum%        cum   cum%
    3040ms  8.05%  8.05%     4920ms 13.03%  runtime.scanobject
    2300ms  6.09% 14.15%     2300ms  6.09%  runtime.memclrNoHeapPointers
    1470ms  3.89% 18.04%     3160ms  8.37%  regexp.(*Regexp).tryBacktrack
    1450ms  3.84% 21.88%     1450ms  3.84%  runtime.memmove
    1240ms  3.28% 25.17%     2310ms  6.12%  gopkg.in/yaml%2ev2.yaml_emitter_analyze_scalar
    1110ms  2.94% 28.11%     1340ms  3.55%  runtime.heapBitsSetType
    1010ms  2.68% 30.78%     2640ms  6.99%  gopkg.in/yaml%2ev2.yaml_parser_scan_flow_scalar
    1000ms  2.65% 33.43%     6410ms 16.98%  runtime.mallocgc
     840ms  2.23% 35.66%      840ms  2.23%  gopkg.in/yaml%2ev2.is_blankz
     770ms  2.04% 37.70%      890ms  2.36%  runtime.findObject
(pprof)

heap:
(pprof) top
Showing nodes accounting for 12423.42kB, 82.91% of 14983.58kB total
Showing top 10 nodes out of 99
      flat  flat%   sum%        cum   cum%
 3584.98kB 23.93% 23.93%     4097kB 27.34%  helm.sh/helm/v3/pkg/chart/loader.LoadFiles
 3081.01kB 20.56% 44.49%  3081.01kB 20.56%  bufio.NewWriterSize
 1536.21kB 10.25% 54.74%  5633.21kB 37.60%  helm.sh/chartmuseum/pkg/repo.ChartVersionFromStorageObject
 1121.44kB  7.48% 62.23%  1121.44kB  7.48%  gopkg.in/yaml%2ev2.yaml_string_write_handler
  528.17kB  3.52% 65.75%   528.17kB  3.52%  regexp.(*bitState).reset
  520.04kB  3.47% 69.22%   520.04kB  3.47%  bytes.makeSlice
     514kB  3.43% 72.65%      514kB  3.43%  bufio.NewReaderSize
  513.31kB  3.43% 76.08%   513.31kB  3.43%  regexp/syntax.(*compiler).inst
  512.14kB  3.42% 79.50%   512.14kB  3.42%  github.com/gin-gonic/gin.(*Context).Set
  512.12kB  3.42% 82.91%   512.12kB  3.42%  github.com/gin-gonic/gin.(*Engine).allocateContext

allocs:
(pprof) top
Showing nodes accounting for 168.86GB, 91.89% of 183.76GB total
Dropped 324 nodes (cum <= 0.92GB)
Showing top 10 nodes out of 93
      flat  flat%   sum%        cum   cum%
   92.12GB 50.13% 50.13%   100.12GB 54.48%  gopkg.in/yaml%2ev2.yaml_emitter_emit
   19.59GB 10.66% 60.79%    19.98GB 10.87%  regexp.(*Regexp).backtrack
   19.58GB 10.66% 71.45%    39.36GB 21.42%  regexp.(*Regexp).FindStringSubmatch
    9.66GB  5.25% 76.70%    49.02GB 26.67%  github.com/Masterminds/semver/v3.NewVersion
    9.38GB  5.11% 81.81%     9.38GB  5.11%  gopkg.in/yaml%2ev2.(*parser).node
       8GB  4.35% 86.16%        8GB  4.35%  gopkg.in/yaml%2ev2.yaml_string_write_handler
    4.28GB  2.33% 88.49%     4.28GB  2.33%  bytes.makeSlice
    3.05GB  1.66% 90.15%     3.05GB  1.66%  gopkg.in/yaml%2ev2.read
    1.67GB  0.91% 91.06%     1.67GB  0.91%  reflect.mapassign
    1.52GB  0.83% 91.89%     1.52GB  0.83%  reflect.New