omgnetwork / elixir-omg

OMG-Network repository of Watcher and Watcher Info
https://omg.network
Apache License 2.0

Collect metrics in the perf project #1759

Closed ayrat555 closed 4 years ago

ayrat555 commented 4 years ago

This PR adds aggregation of HTTP request metrics for Watcher Info and Childchain, using histogrex (it's also used in Chaperon). The following metrics are collected:

I can add additional metrics if they're needed.

Examples:

  1. Running the `transactions` test with 100 tests/s over 10 seconds:

```elixir
%{
  "Childchain.submit_success" => %{
    :max => 296959,
    :mean => 30582.14,
    :min => 1576,
    :total_count => 2000,
    {:percentile, 10.0} => 2223,
    {:percentile, 20.0} => 3055,
    {:percentile, 30.0} => 4479,
    {:percentile, 40.0} => 6495,
    {:percentile, 50.0} => 8767,
    {:percentile, 60.0} => 15231,
    {:percentile, 75.0} => 40959,
    {:percentile, 80.0} => 51711,
    {:percentile, 85.0} => 66559,
    {:percentile, 90.0} => 89599,
    {:percentile, 95.0} => 120319,
    {:percentile, 99.0} => 234495,
    {:percentile, 99.9} => 284671,
    {:percentile, 99.99} => 296959,
    {:percentile, 99.999} => 296959
  },
  "WatcherInfo.get_balances_failure" => %{
    :max => 1036287,
    :mean => 556976.4033613446,
    :min => 144384,
    :total_count => 238,
    {:percentile, 10.0} => 333823,
    {:percentile, 20.0} => 382975,
    {:percentile, 30.0} => 434175,
    {:percentile, 40.0} => 483327,
    {:percentile, 50.0} => 528383,
    {:percentile, 60.0} => 606207,
    {:percentile, 75.0} => 712703,
    {:percentile, 80.0} => 741375,
    {:percentile, 85.0} => 765951,
    {:percentile, 90.0} => 786431,
    {:percentile, 95.0} => 864255,
    {:percentile, 99.0} => 942079,
    {:percentile, 99.9} => 1036287,
    {:percentile, 99.99} => 1036287,
    {:percentile, 99.999} => 1036287
  },
  "WatcherInfo.get_balances_success" => %{
    :max => 1359871,
    :mean => 57645.05105065543,
    :min => 1288,
    :total_count => 15181,
    {:percentile, 10.0} => 2527,
    {:percentile, 20.0} => 4095,
    {:percentile, 30.0} => 6367,
    {:percentile, 40.0} => 9087,
    {:percentile, 50.0} => 13119,
    {:percentile, 60.0} => 20095,
    {:percentile, 75.0} => 51199,
    {:percentile, 80.0} => 81407,
    {:percentile, 85.0} => 115199,
    {:percentile, 90.0} => 177151,
    {:percentile, 95.0} => 294911,
    {:percentile, 99.0} => 462847,
    {:percentile, 99.9} => 987135,
    {:percentile, 99.99} => 1277951,
    {:percentile, 99.999} => 1359871
  },
  "WatcherInfo.get_utxos_failure" => %{
    :max => 1064959,
    :mean => 552673.6842105263,
    :min => 141312,
    :total_count => 304,
    {:percentile, 10.0} => 301055,
    {:percentile, 20.0} => 387071,
    {:percentile, 30.0} => 440319,
    {:percentile, 40.0} => 485375,
    {:percentile, 50.0} => 532479,
    {:percentile, 60.0} => 598015,
    {:percentile, 75.0} => 704511,
    {:percentile, 80.0} => 733183,
    {:percentile, 85.0} => 757759,
    {:percentile, 90.0} => 782335,
    {:percentile, 95.0} => 856063,
    {:percentile, 99.0} => 1032191,
    {:percentile, 99.9} => 1064959,
    {:percentile, 99.99} => 1064959,
    {:percentile, 99.999} => 1064959
  },
  "WatcherInfo.get_utxos_success" => %{
    :max => 1359871,
    :mean => 87400.3442185938,
    :min => 1512,
    :total_count => 12004,
    {:percentile, 10.0} => 4543,
    {:percentile, 20.0} => 8447,
    {:percentile, 30.0} => 13631,
    {:percentile, 40.0} => 20479,
    {:percentile, 50.0} => 33279,
    {:percentile, 60.0} => 55551,
    {:percentile, 75.0} => 107007,
    {:percentile, 80.0} => 134143,
    {:percentile, 85.0} => 183295,
    {:percentile, 90.0} => 272383,
    {:percentile, 95.0} => 362495,
    {:percentile, 99.0} => 514047,
    {:percentile, 99.9} => 950271,
    {:percentile, 99.99} => 1269759,
    {:percentile, 99.999} => 1359871
  },
  "test_success" => %{
    :max => 31064063,
    :mean => 24122097.664,
    :min => 17694720,
    :total_count => 1000,
    {:percentile, 10.0} => 20709375,
    {:percentile, 20.0} => 21495807,
    {:percentile, 30.0} => 22544383,
    {:percentile, 40.0} => 23199743,
    {:percentile, 50.0} => 24117247,
    {:percentile, 60.0} => 24772607,
    {:percentile, 75.0} => 26345471,
    {:percentile, 80.0} => 27000831,
    {:percentile, 85.0} => 27394047,
    {:percentile, 90.0} => 28180479,
    {:percentile, 95.0} => 28966911,
    {:percentile, 99.0} => 30015487,
    {:percentile, 99.9} => 30932991,
    {:percentile, 99.99} => 31064063,
    {:percentile, 99.999} => 31064063
  }
}
```

  2. Running `deposits` with 100 tests/s over 2 seconds:

```elixir
%{
  "WatcherInfo.create_transaction_success" => %{
    :max => 61695,
    :mean => 21940.0,
    :min => 5600,
    :total_count => 20,
    {:percentile, 10.0} => 6463,
    {:percentile, 20.0} => 8383,
    {:percentile, 30.0} => 9151,
    {:percentile, 40.0} => 11647,
    {:percentile, 50.0} => 17535,
    {:percentile, 60.0} => 20095,
    {:percentile, 75.0} => 28031,
    {:percentile, 80.0} => 38911,
    {:percentile, 85.0} => 39423,
    {:percentile, 90.0} => 41215,
    {:percentile, 95.0} => 43519,
    {:percentile, 99.0} => 61695,
    {:percentile, 99.9} => 61695,
    {:percentile, 99.99} => 61695,
    {:percentile, 99.999} => 61695
  },
  "WatcherInfo.get_balances_success" => %{
    :max => 82943,
    :mean => 6706.828985507246,
    :min => 1488,
    :total_count => 345,
    {:percentile, 10.0} => 2431,
    {:percentile, 20.0} => 3423,
    {:percentile, 30.0} => 4895,
    {:percentile, 40.0} => 5823,
    {:percentile, 50.0} => 6271,
    {:percentile, 60.0} => 6687,
    {:percentile, 75.0} => 7583,
    {:percentile, 80.0} => 8447,
    {:percentile, 85.0} => 8895,
    {:percentile, 90.0} => 9727,
    {:percentile, 95.0} => 12927,
    {:percentile, 99.0} => 23423,
    {:percentile, 99.9} => 82943,
    {:percentile, 99.99} => 82943,
    {:percentile, 99.999} => 82943
  },
  "WatcherInfo.submit_typed_failure" => %{
    :max => 29439,
    :mean => 14578.666666666666,
    :min => 4256,
    :total_count => 18,
    {:percentile, 10.0} => 6847,
    {:percentile, 20.0} => 8511,
    {:percentile, 30.0} => 8639,
    {:percentile, 40.0} => 12543,
    {:percentile, 50.0} => 14399,
    {:percentile, 60.0} => 15551,
    {:percentile, 75.0} => 17919,
    {:percentile, 80.0} => 17919,
    {:percentile, 85.0} => 17919,
    {:percentile, 90.0} => 19967,
    {:percentile, 95.0} => 26879,
    {:percentile, 99.0} => 29439,
    {:percentile, 99.9} => 29439,
    {:percentile, 99.99} => 29439,
    {:percentile, 99.999} => 29439
  },
  "WatcherInfo.submit_typed_success" => %{
    :max => 29951,
    :mean => 12528.8,
    :min => 3568,
    :total_count => 20,
    {:percentile, 10.0} => 3967,
    {:percentile, 20.0} => 5759,
    {:percentile, 30.0} => 7999,
    {:percentile, 40.0} => 8447,
    {:percentile, 50.0} => 9407,
    {:percentile, 60.0} => 11519,
    {:percentile, 75.0} => 15615,
    {:percentile, 80.0} => 16383,
    {:percentile, 85.0} => 21631,
    {:percentile, 90.0} => 25727,
    {:percentile, 95.0} => 26879,
    {:percentile, 99.0} => 29951,
    {:percentile, 99.9} => 29951,
    {:percentile, 99.99} => 29951,
    {:percentile, 99.999} => 29951
  },
  "test_success" => %{
    :max => 61603839,
    :mean => 48037888.0,
    :min => 31457280,
    :total_count => 20,
    {:percentile, 10.0} => 31719423,
    {:percentile, 20.0} => 38273023,
    {:percentile, 30.0} => 38535167,
    {:percentile, 40.0} => 48758783,
    {:percentile, 50.0} => 48758783,
    {:percentile, 60.0} => 49020927,
    {:percentile, 75.0} => 60555263,
    {:percentile, 80.0} => 60555263,
    {:percentile, 85.0} => 60555263,
    {:percentile, 90.0} => 61079551,
    {:percentile, 95.0} => 61341695,
    {:percentile, 99.0} => 61603839,
    {:percentile, 99.9} => 61603839,
    {:percentile, 99.99} => 61603839,
    {:percentile, 99.999} => 61603839
  }
}
```

I created a separate module to collect metrics. It's possible to use `Chaperon.Session` for this, but the session would have to be passed to every function that records metrics and returned from it, which would add complexity to the project. It would also require rewriting most of the code in the perf project.
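As a rough illustration of that design choice, here is a minimal sketch of a standalone collector. It is not the module from this PR and does not use histogrex: `PerfMetricsSketch`, `measure/2`, `record/2` and `summary/0` are hypothetical names, and the `{:ok, _}` convention for a successful request is an assumption. The point is only that a named process can hold the measurements, so no session has to be threaded through the calling code.

```elixir
defmodule PerfMetricsSketch do
  # Hypothetical stand-in for a metrics module: a named Agent holds the
  # recorded durations, so callers never pass a session struct around.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Runs `fun`, measures it in microseconds, and files the duration under
  # "<name>_success" or "<name>_failure" depending on the returned value.
  # Assumption: `{:ok, _}` means success, anything else is a failure.
  def measure(name, fun) when is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)

    suffix =
      case result do
        {:ok, _} -> "success"
        _ -> "failure"
      end

    record("#{name}_#{suffix}", micros)
    result
  end

  # Stores one duration (in microseconds) under the given metric key.
  def record(key, micros) do
    Agent.update(__MODULE__, fn state ->
      Map.update(state, key, [micros], &[micros | &1])
    end)
  end

  # Returns %{key => %{total_count: _, min: _, max: _, mean: _}}.
  def summary do
    Agent.get(__MODULE__, fn state ->
      Map.new(state, fn {key, values} ->
        {key,
         %{
           total_count: length(values),
           min: Enum.min(values),
           max: Enum.max(values),
           mean: Enum.sum(values) / length(values)
         }}
      end)
    end)
  end
end
```

A call site would then wrap a request as `PerfMetricsSketch.measure("Childchain.submit", fn -> ... end)` and read the aggregated values with `PerfMetricsSketch.summary()` at the end of a run.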

boolafish commented 4 years ago

A high-level one: average time is usually not as useful as P90 or P99. You can still have a very bad service where half of the users get very fast responses and the other half get very slow ones while the average looks good.

I would recommend implementing P50 (your average), P90 and P99 for easier use.
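To make the point concrete, here is a small sketch with made-up latencies. It computes nearest-rank percentiles on raw samples, which is an assumption chosen for simplicity and not how histogrex derives its values. `LatencySketch` is a hypothetical name.

```elixir
defmodule LatencySketch do
  # Nearest-rank percentile: the smallest sample such that at least p% of
  # all samples are less than or equal to it.
  def percentile(samples, p) when p > 0 and p <= 100 do
    sorted = Enum.sort(samples)
    rank = ceil(p * length(sorted) / 100)
    Enum.at(sorted, rank - 1)
  end

  def mean(samples), do: Enum.sum(samples) / length(samples)
end

# Made-up data: 80 fast requests (10 ms) and 20 slow ones (2_000 ms).
samples = List.duplicate(10, 80) ++ List.duplicate(2_000, 20)

IO.inspect(LatencySketch.mean(samples), label: "mean")          # 408.0
IO.inspect(LatencySketch.percentile(samples, 50), label: "p50") # 10
IO.inspect(LatencySketch.percentile(samples, 90), label: "p90") # 2000
IO.inspect(LatencySketch.percentile(samples, 99), label: "p99") # 2000
```

The mean of 408 ms matches no request that actually happened, while P50 and P90 together show that most requests are fast and a sizeable tail is very slow.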

ayrat555 commented 4 years ago

> A high-level one: average time is usually not as useful as P90 or P99. You can still have a very bad service where half of the users get very fast responses and the other half get very slow ones while the average looks good.
>
> I would recommend implementing P50 (your average), P90 and P99 for easier use.

I started using histogrex (the same library used in Chaperon) to aggregate metrics. You can check the examples in the PR's description.

boolafish commented 4 years ago

> `{:percentile, 90.0} => 89599,`

Curious, what is the unit? I guess ms, but that would still be almost 90 seconds... 🤔 Is it really correct? Every value looks so large 😬

InoMurko commented 4 years ago

yeah, great work on the examples! now we need to know what they mean :D we need a legend :D

ayrat555 commented 4 years ago

@boolafish values are in microseconds. I can change them to milliseconds.

@InoMurko it collects request metrics for the Childchain and Watcher Info APIs. It has separate metrics for successful and failing requests.
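For reading the examples in milliseconds, a conversion along these lines would do. This is only a sketch: `UnitSketch` and `to_milliseconds/1` are hypothetical names, and the map shape is taken from the output in the PR description.

```elixir
defmodule UnitSketch do
  # Converts the values of one metric's stats map from microseconds to
  # milliseconds. :total_count is a count, not a duration, so it is kept.
  def to_milliseconds(stats) do
    Map.new(stats, fn
      {:total_count, count} -> {:total_count, count}
      {key, micros} -> {key, micros / 1000}
    end)
  end
end

# A few entries copied from "Childchain.submit_success" in the first example.
stats = %{
  :max => 296959,
  :min => 1576,
  :total_count => 2000,
  {:percentile, 90.0} => 89599
}

IO.inspect(UnitSketch.to_milliseconds(stats))
```

With this conversion, the P90 of 89599 for `Childchain.submit_success` reads as about 89.6 ms.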

ayrat555 commented 4 years ago

https://github.com/omgnetwork/elixir-omg/pull/1765