armaaar opened this issue 7 months ago
My experience is that Panel can handle an initial load of 2-3 users per second per process.
It's not been a problem for my use cases so far, but it also means that there are use cases I currently stay away from.
Thanks for the super helpful load testing framework. I have actually been working on moving session creation to a thread and am aiming to have a version of that in Panel 1.5.0.
One quick question after reviewing your document: was the admin page always enabled when you ran the load tests? I'm asking because the admin dashboard itself may be adding non-trivial overhead. After disabling it, here are the timings I get with the latest Panel:
browser_data_received...............................: 1.8 GB 6.4 MB/s
browser_data_sent...................................: 5.7 MB 20 kB/s
browser_http_req_duration...........................: avg=385.23ms min=2.03ms med=245.18ms max=11.74s p(90)=584.01ms p(95)=736.06ms
browser_http_req_failed.............................: 0.00% ✓ 0 ✗ 14336
browser_web_vital_cls...............................: avg=0 min=0 med=0 max=0 p(90)=0 p(95)=0
✗ browser_web_vital_fcp...............................: avg=1.54s min=227.1ms med=920.1ms max=9.37s p(90)=4.33s p(95)=5.2s
✗ { scenario:load }.................................: avg=1.55s min=227.1ms med=925ms max=9.37s p(90)=4.34s p(95)=5.2s
✓ { scenario:single_load }..........................: avg=880.29ms min=827.2ms med=873.45ms max=961.5ms p(90)=915.15ms p(95)=938.32ms
✓ { scenario:single }...............................: avg=334.69ms min=334.69ms med=334.69ms max=334.69ms p(90)=334.69ms p(95)=334.69ms
✗ browser_web_vital_lcp...............................: avg=1.67s min=321.19ms med=1.08s max=9.38s p(90)=4.35s p(95)=5.23s
✗ { scenario:load }.................................: avg=1.69s min=321.19ms med=1.08s max=9.38s p(90)=4.36s p(95)=5.23s
✓ { scenario:single_load }..........................: avg=1s min=957.09ms med=988.3ms max=1.13s p(90)=1.03s p(95)=1.08s
✓ { scenario:single }...............................: avg=453.89ms min=453.89ms med=453.89ms max=453.89ms p(90)=453.89ms p(95)=453.89ms
browser_web_vital_ttfb..............................: avg=724.23ms min=43.19ms med=327.04ms max=7.94s p(90)=2.56s p(95)=3.78s
✓ checks..............................................: 99.83% ✓ 1184 ✗ 2
✓ { check:page_loaded }.............................: 99.83% ✓ 592 ✗ 1
✓ { check:server_responded }........................: 99.83% ✓ 592 ✗ 1
✓ { scenario:load,check:page_loaded }...............: 99.82% ✓ 581 ✗ 1
✓ { scenario:load,check:server_responded }..........: 99.82% ✓ 581 ✗ 1
✓ { scenario:load }.................................: 99.82% ✓ 1162 ✗ 2
✓ { scenario:single_load,check:page_loaded }........: 100.00% ✓ 10 ✗ 0
✓ { scenario:single_load,check:server_responded }...: 100.00% ✓ 10 ✗ 0
✓ { scenario:single_load }..........................: 100.00% ✓ 20 ✗ 0
✓ { scenario:single,check:page_loaded }.............: 100.00% ✓ 1 ✗ 0
✓ { scenario:single,check:server_responded }........: 100.00% ✓ 1 ✗ 0
✓ { scenario:single }...............................: 100.00% ✓ 2 ✗ 0
data_received.......................................: 0 B 0 B/s
data_sent...........................................: 0 B 0 B/s
iteration_duration..................................: avg=3.81s min=542.74ms med=2.54s max=29.5s p(90)=6.52s p(95)=7.09s
iterations..........................................: 593 2.124812/s
vus.................................................: 5 min=0 max=10
vus_max.............................................: 10 min=10 max=10
The load timings in particular are significantly better, though of course still nowhere near where we want them to be:
✗ { scenario:load }.................................: avg=1.55s min=227.1ms med=925ms max=9.37s p(90)=4.34s p(95)=5.2s
Okay, my investigations showed multiple things: serving the application this way (i.e. via serve) bypasses a number of optimizations that we do implement for file-based applications, e.g. around session cleanup, and it also does not perform profiling or track the actual session creation timings.
One more note about threading: the actual bottleneck likely isn't even the session creation itself (particularly for an app as simple as the one we are profiling) but rather that currently all websocket connections are owned by a single thread. This means that when the server has to send data to multiple sessions simultaneously, all the messages queue up to be sent to the frontend. So while sessions may be created in parallel, as long as the main thread is locked up none of the messages will actually be sent out.
The solution will likely be to maintain a completely independent thread pool that grabs events ready to be sent off of a queue and dispatches them.
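To make the idea concrete, here is a minimal sketch of such a dispatcher pool. This is purely illustrative and not Panel's actual implementation; the names dispatch_queue and send_to_socket are hypothetical.

```python
import queue
import threading

# Outgoing websocket messages are placed on a shared queue and a small pool of
# dispatcher threads sends them, so a busy main thread no longer blocks delivery.
dispatch_queue = queue.Queue()

def send_to_socket(socket, message):
    # Placeholder for the real websocket write (e.g. Tornado's write_message).
    socket.write_message(message)

def dispatcher():
    while True:
        socket, message = dispatch_queue.get()
        try:
            send_to_socket(socket, message)
        finally:
            dispatch_queue.task_done()

# Start the independent dispatcher pool.
for _ in range(4):
    threading.Thread(target=dispatcher, daemon=True).start()

# Sessions then only enqueue work instead of writing directly:
# dispatch_queue.put((socket, message))
```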
Try looking at Gradio. They are probably the data app framework that has had to scale the most, due to their usage on Hugging Face.
My understanding is that every event is sent to a queue that is shared across all users to enable scaling.
You can also look at Streamlit; they run every user in a separate thread. It solves some problems but creates others, from what I've heard.
The main problem as I see it is that our "template" is quite heavy and custom.
Streamlit uses the same minimal HTML template just to start the websocket; the rest is sent afterwards. It means they can respond to many more initial HTTP requests.
Their server architectures are very different so I don't think there's much we can steal from them. I have a pretty good idea how we can scale the applications better. Moving session creation and destruction to a thread is one part of it but the bigger part will be to move the websocket writes to a thread.
The main problem as I see it is that our "template" is quite heavy and custom.
This isn't really true, or rather isn't what's causing the problem. The problem is that we create a session (i.e. run the user code) before the initial template is sent. You can already work around this by enabling config.reuse_sessions or setting --reuse-sessions.
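For concreteness, a minimal sketch of that workaround; config.reuse_sessions and --reuse-sessions are the options named above, while the tiny app itself is just a placeholder.

```python
# app.py -- placeholder app; serve it with session reuse enabled via the CLI:
#   panel serve app.py --reuse-sessions
import panel as pn

pn.extension()

# Alternatively (assumption: setting the config option programmatically before
# the app is served has the same effect as passing the CLI flag):
pn.config.reuse_sessions = True

pn.panel("Hello, World!").servable()
```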
I agree the server architectures are very different and you cannot copy code. Still, I believe there is a good chance of finding ideas there and of understanding what the scaling requirements are in 2024.
@armaaar - thanks for putting this together!
@philippjfr or @MarcSkovMadsen - is there a recommended way to define "page loaded" in such load tests for Panel?
I wanted to try with the latest panel==1.4.0rc2, but then it seems the same selector doesn't work any more?
https://github.com/armaaar/panel-performance/blob/405aca725cf83f4d1e9722457d3b60657610670d/k6.js#L83
ERRO[0053] Uncaught (in promise) GoError: waiting for selector ".pn-loading.pn-arc": waiting for selector ".pn-loading.pn-arc" did not result in any nodes
single at github.com/grafana/xk6-browser/browser.mapPage.func13 (native)
single_lat file:///Users/cdeil/code/hc/panel-performance/k6.js:91:29(30) executor=constant-vus scenario=load
If you think k6 is useful for Panel users, it would be great if you could add a mention to the docs of the best practice for defining "page loaded" in a robust way: https://panel.holoviz.org/how_to/test/loadtests.html
One more thought on this:
browser_data_received...............................: 1.8 GB 6.4 MB/s
I think this load test is very atypical, since for normal Panel apps a few MB of JS/CSS are loaded once and then cached by the browser, but here k6 possibly always arrives with a clean browser cache and to a certain extent just puts load on Panel as a static file server?
If I go to https://panel-gallery-dev.holoviz.dsp.anaconda.com/portfolio_analyzer I see page load times vary between sometimes 1 second and sometimes 10 or 20 seconds.
I also see 8 MB of resources (???) and 2.3 MB transferred, but if the browser cache is on, only 7 kB are transferred.
Is that normal behaviour and a correct interpretation of the Chrome network tab load timings and download sizes?
That server is severely under-resourced right now unfortunately; we're working on it, but for now those numbers may be right. I also had to update the k6 script and will make a PR for an improved benchmark on the latest Panel tomorrow or Monday.
Okay, I tracked down the main issue that was plaguing the particular test case above. Specifically, the problem was that during session cleanup Bokeh was calling gc.collect() for each individual session, which quickly adds up when called 20x consecutively. I had already worked around a separate gc.collect() call by scheduling the garbage collection 5 seconds after a session was destroyed and ensuring it would only be called once, but had not realized Bokeh was performing one more, completely separate GC cycle. I have now unified that into a single gc.collect() which is debounced to run 5 seconds after any session was destroyed, i.e. if another session is destroyed in those 5 seconds it is again postponed by 5 seconds.
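A minimal sketch of that debouncing scheme, assuming an asyncio event loop is available; this is illustrative and not the actual change made in Panel or Bokeh, and the hook name on_session_destroyed is hypothetical.

```python
import asyncio
import gc

# Every session destruction (re)schedules a single gc.collect() 5 seconds into
# the future, so tearing down 20 sessions back to back triggers one collection
# instead of 20.
_gc_handle = None

def _schedule_gc(delay: float = 5.0) -> None:
    global _gc_handle
    loop = asyncio.get_event_loop()
    if _gc_handle is not None:
        _gc_handle.cancel()  # postpone any pending collection
    _gc_handle = loop.call_later(delay, gc.collect)

def on_session_destroyed(session_context) -> None:
    # Call this wherever sessions are cleaned up.
    _schedule_gc()
```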
While not perfect this did make a big difference to the timing:
browser_data_received...............................: 2.3 GB 9.0 MB/s
browser_data_sent...................................: 7.3 MB 29 kB/s
browser_http_req_duration...........................: avg=199.74ms min=1.69ms med=105.07ms max=2.74s p(90)=473.84ms p(95)=594.83ms
browser_http_req_failed.............................: 0.00% ✓ 0 ✗ 18289
browser_web_vital_cls...............................: avg=0 min=0 med=0 max=0 p(90)=0 p(95)=0
✗ browser_web_vital_fcp...............................: avg=775.63ms min=505.79ms med=650.7ms max=3.15s p(90)=1.14s p(95)=1.5s
✗ { scenario:load }.................................: avg=773.3ms min=505.79ms med=644.5ms max=3.15s p(90)=1.14s p(95)=1.5s
✓ { scenario:single_load }..........................: avg=965.38ms min=908.5ms med=956.44ms max=1.09s p(90)=994.71ms p(95)=1.04s
✓ { scenario:single }...............................: avg=594ms min=594ms med=594ms max=594ms p(90)=594ms p(95)=594ms
✓ browser_web_vital_lcp...............................: avg=1.18s min=880.5ms med=1.04s max=3.17s p(90)=1.68s p(95)=1.96s
✓ { scenario:load }.................................: avg=1.17s min=880.5ms med=1.03s max=3.17s p(90)=1.68s p(95)=1.96s
✓ { scenario:single_load }..........................: avg=1.57s min=1.45s med=1.55s max=1.68s p(90)=1.67s p(95)=1.67s
✓ { scenario:single }...............................: avg=994.89ms min=994.89ms med=994.89ms max=994.89ms p(90)=994.89ms p(95)=994.89ms
browser_web_vital_ttfb..............................: avg=173.9ms min=39.5ms med=111.6ms max=1.33s p(90)=352.05ms p(95)=506.59ms
✓ checks..............................................: 100.00% ✓ 1498 ✗ 0
✓ { check:page_loaded }.............................: 100.00% ✓ 749 ✗ 0
✓ { check:server_responded }........................: 100.00% ✓ 749 ✗ 0
✓ { scenario:load,check:page_loaded }...............: 100.00% ✓ 738 ✗ 0
✓ { scenario:load,check:server_responded }..........: 100.00% ✓ 738 ✗ 0
✓ { scenario:load }.................................: 100.00% ✓ 1476 ✗ 0
✓ { scenario:single_load,check:page_loaded }........: 100.00% ✓ 10 ✗ 0
✓ { scenario:single_load,check:server_responded }...: 100.00% ✓ 10 ✗ 0
✓ { scenario:single_load }..........................: 100.00% ✓ 20 ✗ 0
✓ { scenario:single,check:page_loaded }.............: 100.00% ✓ 1 ✗ 0
✓ { scenario:single,check:server_responded }........: 100.00% ✓ 1 ✗ 0
✓ { scenario:single }...............................: 100.00% ✓ 2 ✗ 0
data_received.......................................: 0 B 0 B/s
data_sent...........................................: 0 B 0 B/s
iteration_duration..................................: avg=1.91s min=1.1s med=1.5s max=7.05s p(90)=3.14s p(95)=4.19s
iterations..........................................: 749 2.99838/s
vus.................................................: 5 min=0 max=10
vus_max.............................................: 20 min=20 max=20
The average loading time is now well below 1 second and even the p95 is "only" 1.5 seconds.
When we tried to measure the performance of Panel apps using k6, we noticed a huge degradation in performance with only 10 users using the app simultaneously, even after enabling deferred loading and refactoring pages into classes.
We created a "Hello, World!" Panel page in isolation and ran our load tests against it, which shows a huge performance degradation the more the Panel server is hit. We suspect that a large part of it comes from how Panel handles sessions. Apparently, session destruction (and maybe initialization as well) does not use threads and blocks the whole server.
Here is a repository that demonstrates the performance degradation of a Panel app being hit by 10 users simultaneously: https://github.com/armaaar/panel-performance
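For reference, the kind of minimal "Hello, World!" page described above might look like the following (the exact app used in the linked repository may differ).

```python
# hello.py -- run with: panel serve hello.py
import panel as pn

pn.extension()

pn.panel("Hello, World!").servable()
```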
ALL software version info
Cores: 10 (8 performance and 2 efficiency)
Threads: 14
Description of expected behavior and the observed behavior
Complete, minimal, self-contained example code that reproduces the issue
Please refer to our test repository here: https://github.com/armaaar/panel-performance. The README file documents installation, how to run the tests, and the results.
Screenshots or screencasts of the bug in action
Everything can be found in the repo readme: https://github.com/armaaar/panel-performance/blob/master/README.md