temporalio / ui

Temporal UI
https://docs.temporal.io/web-ui
MIT License
164 stars 60 forks source link

Bad performance due to gRPC-connection-per-request #2149

Open varunbpatil opened 2 weeks ago

varunbpatil commented 2 weeks ago

Describe the bug

The UI server creates a gRPC client connection for each request. https://github.com/temporalio/ui/blob/0108a5762a84464e99e6919644c9276f4f264847/server/server/api/handler.go#L77-L101

This is bad for performance. It is recommended to have a single gRPC connection for the lifetime of the application (the same way it is recommended to have a single temporal client for the lifetime of the application).

I ran some benchmarks with 500 requests (100 concurrent requests).

hey -n 500 -c 100 'http://localhost:8080/api/v1/namespaces' > /tmp/one-client.txt

Here are the results.

Connection-per-request (current implementation)
``` Summary: Total: 23.8720 secs Slowest: 18.6736 secs Fastest: 0.7964 secs Average: 3.3815 secs Requests/sec: 20.9451 Response time histogram: 0.796 [1] | 2.584 [261] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 4.372 [114] |■■■■■■■■■■■■■■■■■ 6.160 [76] |■■■■■■■■■■■■ 7.947 [13] |■■ 9.735 [16] |■■ 11.523 [5] |■ 13.310 [4] |■ 15.098 [2] | 16.886 [0] | 18.674 [8] |■ Latency distribution: 10% in 1.0995 secs 25% in 1.4034 secs 50% in 2.3972 secs 75% in 4.2975 secs 90% in 6.0804 secs 95% in 8.8946 secs 99% in 17.9976 secs Details (average, fastest, slowest): DNS+dialup: 0.0052 secs, 0.7964 secs, 18.6736 secs DNS-lookup: 0.0016 secs, 0.0000 secs, 0.0396 secs req write: 0.0010 secs, 0.0000 secs, 0.0380 secs resp wait: 3.3748 secs, 0.7962 secs, 18.6359 secs resp read: 0.0003 secs, 0.0000 secs, 0.0023 secs Status code distribution: [200] 500 responses ```
Single gRPC connection for the lifetime of the application
``` Summary: Total: 8.7794 secs Slowest: 5.6887 secs Fastest: 0.2492 secs Average: 1.5338 secs Requests/sec: 56.9512 Response time histogram: 0.249 [1] | 0.793 [139] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.337 [125] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.881 [96] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 2.425 [59] |■■■■■■■■■■■■■■■■■ 2.969 [25] |■■■■■■■ 3.513 [22] |■■■■■■ 4.057 [8] |■■ 4.601 [17] |■■■■■ 5.145 [5] |■ 5.689 [3] |■ Latency distribution: 10% in 0.4974 secs 25% in 0.7003 secs 50% in 1.3042 secs 75% in 1.9413 secs 90% in 3.1785 secs 95% in 4.0926 secs 99% in 4.8942 secs Details (average, fastest, slowest): DNS+dialup: 0.0067 secs, 0.2492 secs, 5.6887 secs DNS-lookup: 0.0011 secs, 0.0000 secs, 0.0470 secs req write: 0.0008 secs, 0.0000 secs, 0.0358 secs resp wait: 1.5257 secs, 0.2489 secs, 5.6884 secs resp read: 0.0005 secs, 0.0000 secs, 0.0042 secs Status code distribution: [200] 500 responses ```

I am happy to create a PR if the proposed solution is acceptable. I am already running it in production at my company.

varunbpatil commented 2 weeks ago

Tagging the original author of the code for comments @robholland .

robholland commented 2 weeks ago

I didn't write the code, but feel free to submit a PR against https://github.com/temporalio/ui-server

varunbpatil commented 2 weeks ago

Doesn't temporalio/ui-server get overwritten by this job - https://github.com/temporalio/ui/blob/main/.github/workflows/update-ui-server.yml?

robholland commented 2 weeks ago

Sorry, you are correct. While since I touched this stuff and had it backwards.