Open MarcSkovMadsen opened 2 years ago
I can see the behavior is also there if the test is just clicking "refresh" in the browser as fast as possible.
You can dig into the details of these tests here @jbednar and @philippjfr
Thanks @MarcSkovMadsen! I've observed something similar in production but thought it was down to that specific application. This is probably the highest priority issue on the bug tracker so I'll report back asap.
Okay, figured out what the issue is here. After a session is shut down it gets cleaned up, and if several hundred sessions have been created then cleaning them all up happens in sequence. One session cleanup takes about 80 ms on my machine:
2022-01-17 15:34:27,913 Deleting 1 modules for document <bokeh.document.document.Document object at 0x7fdb3fc55670>
2022-01-17 15:34:27,958 Session 'AY0W4744u58xUmaFhOK6mLfiMxtpGyL8cEyGn73LfPDH' was successfully discarded
2022-01-17 15:34:27,995 Discarding session 'eEoaDv54FqVVTjUg2G45HU0n4jRKkVuUfp5Enit6ZyRs' last in use 30073.157223000002 milliseconds ago
For 500 sessions that takes about 40 seconds during which no new requests are processed. I'm not hugely worried about this since it's outside the general scope of the usage patterns I would expect from a Panel application. Nonetheless it would be nice if we could get Bokeh to prioritize requests to process new sessions over cleaning up old sessions. I will also look into what it will take to speed up creation of a new session.
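As a quick check of the figure above, here is the arithmetic as a small sketch. The function name is made up for illustration, and the model assumes cleanup is strictly sequential at a constant cost per session, which is a simplification.

```python
def blocking_time_seconds(n_sessions: int, per_session_ms: float) -> float:
    """Time the server stays busy if sessions are cleaned up one after another."""
    return n_sessions * per_session_ms / 1000.0


# 500 sessions at roughly 80 ms each, the numbers quoted above
print(blocking_time_seconds(500, 80))  # 40.0
```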
I did similar tests on Streamlit and Dash. They don't exhibit this behavior. Streamlit has a very different approach. It just sends the same small javascript to initiate the web socket - independent of the application.
This is indeed substantially different to Bokeh, which creates a user session immediately on the first request and sends back an HTML page containing all the required JS/CSS for the page and the JS to open the websocket. So when the websocket request arrives the session will already be warmed up. IMO although this means the initial request isn't as fast as it could be this approach is still preferable overall since the page can load resources at the same time as the websocket connection opens ensuring that those two things can happen in parallel.
There's a relatively simple improvement we can make in Bokeh which should speed up session cleanup by a factor of about 5-10x. Currently each time a Document is destroyed Bokeh will call gc.collect(), which is quite expensive. When many sessions are cleaned up in quick succession this is particularly superfluous, so instead we should only gc.collect() after all expired sessions have been discarded. For 500 sessions this will still result in a non-responsive server for 5-10 seconds but that's definitely preferable over the current 40-50 second wait.
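To make the idea concrete, here is a minimal sketch of the two strategies. FakeDocument and both function names are hypothetical stand-ins, not Bokeh's actual classes or cleanup code; the only real call is gc.collect() from the standard library.

```python
import gc


class FakeDocument:
    """Hypothetical stand-in for a document that owns some resources."""

    def __init__(self):
        self.modules = ["module"]

    def destroy(self):
        # Drop references so the garbage collector can reclaim them.
        self.modules.clear()


def discard_each_with_gc(documents):
    """One full collection per destroyed document (the costly pattern)."""
    collections = 0
    for doc in documents:
        doc.destroy()
        gc.collect()
        collections += 1
    return collections


def discard_batch_then_gc(documents):
    """Destroy every expired document first, then collect once at the end."""
    for doc in documents:
        doc.destroy()
    if not documents:
        return 0
    gc.collect()
    return 1


print(discard_each_with_gc([FakeDocument() for _ in range(5)]))   # 5
print(discard_batch_then_gc([FakeDocument() for _ in range(5)]))  # 1
```

The return value only counts how many full collections ran. The actual speedup depends on how much garbage each pass has to walk, so the 5-10x estimate is not something this sketch can confirm.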
What triggers the "clean up"? What I saw was that I could create 100-300 requests/sessions, but then suddenly the "clean up" was triggered and everything was on hold.
Just brainstorming.
Good question. I think N should be some multiple of the number of users that can simultaneously interact with the app. While Streamlit and Dash can serve 10k sessions quickly, they will face the same problem when the clients actually connect. The server will have to provide all the initial state to those sessions and that's where I think they'll hit similar limits. So I'd love to extend the actual test here to try to open the websocket or make the HTTP request Dash makes.
There was some discussion a long time ago about allowing the initial request in Bokeh to be serviced without creating a session. That would close the gap for the number of requests Panel can handle in your current test.
As for what triggers the cleanup, it is triggered by a timeout. Every N seconds a background task collects all sessions without an active websocket connection and if they've reached some expiration time they are destroyed.
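A rough sketch of that selection logic, for illustration only. The Session dataclass, its field names, and both function names are assumptions and do not mirror Bokeh's internal API; the timestamps are passed in explicitly so the logic can be exercised without a running server.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    connected: bool       # True while a websocket is attached
    last_used_ms: float   # time of last activity, in milliseconds


def expired_sessions(sessions, now_ms, lifetime_ms):
    """Sessions with no active websocket that have been idle too long."""
    return [
        s for s in sessions
        if not s.connected and now_ms - s.last_used_ms >= lifetime_ms
    ]


def cleanup(sessions, now_ms, lifetime_ms):
    """One pass of the periodic task: returns (kept, discarded)."""
    discarded = expired_sessions(sessions, now_ms, lifetime_ms)
    discarded_ids = {s.session_id for s in discarded}
    kept = [s for s in sessions if s.session_id not in discarded_ids]
    return kept, discarded


sessions = [
    Session("a", connected=True, last_used_ms=0),        # active, always kept
    Session("b", connected=False, last_used_ms=0),       # idle for 30 s
    Session("c", connected=False, last_used_ms=25_000),  # idle for 5 s
]
kept, discarded = cleanup(sessions, now_ms=30_000, lifetime_ms=15_000)
print([s.session_id for s in kept])       # ['a', 'c']
print([s.session_id for s in discarded])  # ['b']
```

In a real server this pass would be run by a periodic callback. When a burst of sessions all expire in the same pass they are discarded together, which matches the stall described above.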
I'd say the number of users that can use an individual server and have a good experience depends a lot on the app and its usage patterns. For Panel that number probably ranges from 1-50 for apps requiring at least some Python callbacks. The new --num-threads option can hopefully increase that number a bit. Beyond that you'll need --num-procs or horizontal scaling with a reverse proxy.
So I think we should be able to create 100 simultaneous sessions in fairly quick succession and without major slowdowns during cleanup.
Being able to serve the initial request without creating a new session would also be nice and shouldn't be too difficult. It could then still warm up the session in the background so it is available as soon as the websocket request arrives.
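One way this could look, sketched with plain asyncio. SessionStore and its methods are hypothetical names and nothing here is Bokeh or Panel API; the sleep is a placeholder for the real cost of running the app script.

```python
import asyncio


class SessionStore:
    """Hypothetical store that warms sessions up in the background."""

    def __init__(self):
        self._pending = {}

    async def _create_session(self, session_id):
        await asyncio.sleep(0.01)  # placeholder for executing the app
        return {"id": session_id, "ready": True}

    def handle_page_request(self, session_id):
        # Schedule session creation but do not wait for it; the static
        # page is returned immediately.
        self._pending[session_id] = asyncio.create_task(
            self._create_session(session_id)
        )
        return f"<html><!-- bootstrap for session {session_id} --></html>"

    async def handle_websocket(self, session_id):
        # Reuse the warm-up task if one exists, otherwise create on demand.
        task = self._pending.pop(session_id, None)
        if task is None:
            return await self._create_session(session_id)
        return await task


async def demo():
    store = SessionStore()
    html = store.handle_page_request("abc")          # returns without blocking
    session = await store.handle_websocket("abc")    # waits for the warm-up
    return html, session


html, session = asyncio.run(demo())
print(session)  # {'id': 'abc', 'ready': True}
```

The page response no longer pays the session creation cost, while the websocket handler still finds a session that is ready or close to ready. A real implementation would also need to expire warm-up tasks whose websocket never arrives.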
I'm trying to measure the performance of Python's data app frameworks. I'm using Locust to measure the performance of the initial page load. @jbednar pointed out that this would be interesting here: https://twitter.com/JamesABednar/status/1481319080598683648?s=20.
As you can see below, there is a very big difference between the median and the max time it takes to get the initial page response. The server is sometimes not responding for more than a minute.
https://user-images.githubusercontent.com/42288570/149672116-b922d775-3d45-4ac5-83fe-2f4df8febcd0.mp4
(the video has been sped up 4x to be able to upload it)
In the video you can see the first time the server is blocked is at ~13s/ 52s.
At around 58s/ 3mins 52s an exception is raised.
Discussion
My hypothesis is that the process is not performing at its max. I did not see any indication that it's constrained by the single core or the memory available. Instead there is some code problem. Is it trying to push a message through a web socket and waiting for a response that never comes?
I did similar tests on Streamlit and Dash. They don't exhibit this behavior. Streamlit has a very different approach. It just sends the same small JavaScript to initiate the web socket, independent of the application. It's a factor of 100 faster than Panel (1 ms in some cases). But then the page load to the user takes some time. In my preliminary experiments I believe Panel actually renders to the user faster.
The video test was running on my Windows laptop. But I also did similar experiments running Locust and Panel in a Linux Docker container. See my repo https://github.com/MarcSkovMadsen/data-app-performance
Reproducible Example
requirements.txt
slider_plot_panel.py
page_load_panel.py
Stack trace (from video)