open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0

Improve performance of Zipkin exporter #295

Closed OlivierAlbertini closed 2 years ago

OlivierAlbertini commented 5 years ago

Current behaviour

The Zipkin exporter uses JSON.stringify when exporting spans: https://github.com/open-telemetry/opentelemetry-js/blob/b58965c97e5be0b556d83d0000eff0e3c7e25209/packages/opentelemetry-exporter-zipkin/src/zipkin.ts#L165

Desired behaviour

Avoid the per-span JSON.stringify cost, for example by using a faster serializer such as fast-json-stringify.

draffensperger commented 5 years ago

Could you make some benchmarks to see how much of a performance issue this actually is in a production app? My understanding is that most spans would not be sampled anyway, so we would only export a smallish number of spans to Zipkin.

I see bringing in a new library as potentially exposing end users to security risks or bugs if the dependency ever has them, versus the safe, built-in (if non-optimal) JSON.stringify.

OlivierAlbertini commented 5 years ago

I would be happy to make a benchmark. Do you have any examples that could inspire me, or tools/packages I could use, so that I can provide something with added value?

I just want to avoid making a clunky benchmark.

Any feedback would be appreciated.

mayurkale22 commented 5 years ago

I don't have experience with benchmarking tools, but I was thinking of using @clevernature/benchmark-regression to perform regression tests.

https://www.npmjs.com/package/@clevernature/benchmark-regression

OlivierAlbertini commented 5 years ago

Thanks Mayur, I will give it a try.

draffensperger commented 5 years ago

I actually think something you just do as a one-off would be fine: say, a simple Node server that handles a hello-world type request, samples at 100%, and exports to Zipkin. Run a local server and a local Zipkin, then hammer it with traffic, say with ApacheBench or similar. See how the server performs overall with the native JSON.stringify vs. fast-json-stringify.
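
Something like the following would do it. This is a sketch only; the package names and APIs are assumptions based on recent opentelemetry-js releases, and the exact setup available at the time of this issue differed:

```js
// Minimal benchmark server: trace every request at 100% sampling and
// export to a local Zipkin instance.
const http = require('http');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { AlwaysOnSampler, BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { ZipkinExporter } = require('@opentelemetry/exporter-zipkin');

const provider = new NodeTracerProvider({
  // Sample 100% so every request exercises the exporter.
  sampler: new AlwaysOnSampler(),
});
// Older SDK versions use addSpanProcessor(); newer ones take a
// `spanProcessors` option in the constructor instead.
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new ZipkinExporter({ url: 'http://localhost:9411/api/v2/spans' })
  )
);
provider.register();

const tracer = provider.getTracer('zipkin-bench');
http
  .createServer((req, res) => {
    const span = tracer.startSpan('helloworld');
    res.end('Hello World!');
    span.end();
  })
  .listen(8080);
```

Then run something like `ab -n 5000 -c 100 http://localhost:8080/helloworld` against it, once per serializer.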

I'm not necessarily suggesting this is the best use of time right now, just that this would be my suggestion for validating that adding the dependency is worth it.

I also do think more general benchmark tests that run on every PR would be good, but I'd expect that to be more for core parts of the library (though maybe the Zipkin exporter as a whole would be worth covering too).

OlivierAlbertini commented 5 years ago

Simpler indeed, and good timing, since I did https://github.com/open-telemetry/opentelemetry-js/pull/318

OlivierAlbertini commented 5 years ago

Using the http example in this repo (without the setTimeout on the server side):

The first ab run is "as is"; the second removes JSON.stringify and applies a raw transformation (in order to avoid a dependency), since a span is not that complex (the Jaeger exporter does this too).
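
For reference, the raw transformation idea is roughly the following. This is an illustrative sketch with a hypothetical rawStringify, not the exact code used in the test; field names follow the Zipkin v2 JSON model:

```js
// Build the JSON payload by hand instead of calling JSON.stringify,
// which has to walk the object and inspect every value's type at runtime.
// Caveat: this assumes string fields never need JSON escaping, which is
// one reason hand-rolled serializers are risky.
function rawStringify(zipkinSpan) {
  return (
    '{"traceId":"' + zipkinSpan.traceId +
    '","id":"' + zipkinSpan.id +
    '","name":"' + zipkinSpan.name +
    '","timestamp":' + zipkinSpan.timestamp +
    ',"duration":' + zipkinSpan.duration + '}'
  );
}
```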

~/apps on 🐳 v19.03.2
➜ ab -n 5000 -c 100 http://localhost:8080/helloworld
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /helloworld
Document Length:        12 bytes

Concurrency Level:      100
Time taken for tests:   9.950 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      435000 bytes
HTML transferred:       60000 bytes
Requests per second:    502.49 [#/sec] (mean)
Time per request:       199.010 [ms] (mean)
Time per request:       1.990 [ms] (mean, across all concurrent requests)
Transfer rate:          42.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3   1.6      3      41
Processing:    32  194  56.1    190     445
Waiting:       14  165  50.9    161     441
Total:         32  197  56.2    192     447

Percentage of the requests served within a certain time (ms)
  50%    192
  66%    208
  75%    225
  80%    229
  90%    261
  95%    286
  98%    380
  99%    445
 100%    447 (longest request)

——————————————————————————————————————

➜ ab -n 5000 -c 100 http://localhost:8080/helloworld
This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /helloworld
Document Length:        12 bytes

Concurrency Level:      100
Time taken for tests:   9.413 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      435000 bytes
HTML transferred:       60000 bytes
Requests per second:    531.19 [#/sec] (mean)
Time per request:       188.258 [ms] (mean)
Time per request:       1.883 [ms] (mean, across all concurrent requests)
Transfer rate:          45.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.8      2      17
Processing:     7  185  47.1    179     347
Waiting:        2  151  42.1    149     278
Total:          7  187  46.9    181     347

Percentage of the requests served within a certain time (ms)
  50%    181
  66%    201
  75%    217
  80%    229
  90%    254
  95%    280
  98%    298
  99%    305
 100%    347 (longest request)

Should we keep JSON.stringify?

mayurkale22 commented 5 years ago

This is great 💯. Could you please use fast-json-stringify and run the same test?
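
For context, fast-json-stringify compiles a serializer from a JSON schema up front, so the per-call work is much smaller than JSON.stringify's runtime type inspection. A minimal sketch, using a simplified span schema for illustration:

```js
const fastJson = require('fast-json-stringify');

// Compile a serializer once from a schema; the real Zipkin span has
// more fields than this simplified example.
const stringifySpan = fastJson({
  type: 'object',
  properties: {
    traceId: { type: 'string' },
    id: { type: 'string' },
    name: { type: 'string' },
    timestamp: { type: 'integer' },
    duration: { type: 'integer' },
  },
});

console.log(stringifySpan({ traceId: 'abc', id: '1', name: 'helloworld', timestamp: 1, duration: 2 }));
// -> {"traceId":"abc","id":"1","name":"helloworld","timestamp":1,"duration":2}
```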

draffensperger commented 5 years ago

Looks like you are saving ~10ms at the 50p latency, and ~140ms at the 99p latency.

I'm interested in what you said about "applying a RAW transformation (in order to avoid a dep)": might that be a way to get better performance without introducing a dependency?

Do you know if fast-json-stringify supports a mode where you only include it as a dev dependency and not an install dependency? If you could have fast-json-stringify generate the function for you and then check in that function, I wonder if that could also be a way to avoid the extra install dependency but still get the performance benefit.
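
If I understand the idea, the build-time step would look something like this. The `{ mode: 'standalone' }` option is an assumption based on newer fast-json-stringify releases and may not have existed at the time:

```js
// Build-time script (dev dependency only): generate the serializer
// source with fast-json-stringify and check the generated file in.
const fs = require('fs');
const fastJson = require('fast-json-stringify');

// Simplified span schema for illustration.
const spanSchema = {
  type: 'object',
  properties: {
    traceId: { type: 'string' },
    name: { type: 'string' },
  },
};

// Assumed API: standalone mode returns the generated source as a string.
const code = fastJson(spanSchema, { mode: 'standalone' });
fs.writeFileSync('./src/generated-span-stringify.js', code);

// The exporter would then require('./generated-span-stringify') at
// runtime, with no install-time dependency on fast-json-stringify.
```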

pauldraper commented 4 years ago

> fast-json-stringify generate the function for you and then check in that function

fast-json-stringify is pretty small. The gain would be minimal.

On the list of things to do, this is at best very far down, and at worst more trouble than it could ever be worth.

draffensperger commented 4 years ago

My thinking with inlining the resulting function from fast-json-stringify, rather than depending on the library, is based on a security consideration: each new dependency you add, no matter how small, is an opportunity for the author of that package to be compromised and push a vulnerable version. In principle, you could audit every single package dependency all the way down the tree when you do upgrades... but that's not super realistic.

So in general, I think we should have a fairly high bar for taking on a new install dependency. Dev dependencies concern me less, because a vulnerability there would only affect developers of the library, not end customers and their production apps.

mhennoch commented 3 years ago

Seems like consensus was reached to keep using JSON.stringify, so this can be closed. @dyladan

rauno56 commented 2 years ago

Using an alternative to JSON.stringify is tricky on multiple levels, especially across different environments.
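
For example, a replacement has to reproduce behaviors like these exactly, some of which have changed between engine versions (illustrative):

```js
// Edge cases a JSON.stringify replacement must reproduce exactly:
JSON.stringify({ a: 'say "hi"' });  // {"a":"say \"hi\""} (quotes escaped)
JSON.stringify({ a: undefined });   // {} (undefined-valued keys are dropped)
JSON.stringify({ t: new Date(0) }); // {"t":"1970-01-01T00:00:00.000Z"} (toJSON is honored)
JSON.stringify('\ud800');           // escaped as "\ud800" on ES2019+ engines,
                                    // a raw lone surrogate on older ones
```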