lifenjoiner / ewma

An EWMA Variant
MIT License

EWMA Comparisons #1

Open lifenjoiner opened 3 years ago

lifenjoiner commented 3 years ago

Here is an easy-to-use helper that gives you a quick overview of different EWMA strategies: EWMA_cmp.xlsx

Just refresh the random samples or change N :)

lifenjoiner commented 2 years ago

EWMA_cmp_plot.xlsx

ewma-cmp.gif

Explanation:

  1. Imagine a moving window selecting the data samples.
  2. Average in column I is the exact average of the selected samples. It is the ideal result, the baseline the others are compared against.
  3. N in cell A2 is the moving window size.
  4. Alpha in cell B2 is the fixed decay, i.e. the weight (the W in EWMA).
  5. n in column C is the sample index.
  6. V in column D holds the random sample values.
  7. alpha in column E is the just-in-time decay, i.e. the weight (the W in EWMA) applied at that sample.
  8. EWMA Variant is the result when the 1st sample is added using Add.
  9. EWMA Continuing is the result when the 1st sample is added using Set.
  10. EWMA Warmup is the result of the VividCortex implementation with WARMUP_SAMPLES = N - 1, with the 1st sample added using Add.

Compared with the VividCortex implementation: EWMA Continuing should give the same result, since both add the 1st sample using Set. EWMA Warmup should also give the same result, since both use a moving window size equal to WARMUP_SAMPLES.
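The columns described above can be sketched in a few lines of Python. This thread does not give the spreadsheet's formula for the just-in-time alpha, so the sketch assumes alpha_n = 2 / (min(n, N) + 1): 1 for the first sample, settling at the fixed Alpha = 2 / (N + 1). `WarmupEWMA` is only loosely modelled on the warmup idea in VividCortex's `VariableEWMA` (plain mean first, fixed decay afterwards), not a port of it. The names `JitEWMA`, `WarmupEWMA` and `compare` are hypothetical.

```python
import random
from collections import deque


class JitEWMA:
    """EWMA with a just-in-time decay (hypothetical rule, not taken from the repo).

    The n-th added sample uses alpha = 2 / (min(n, N) + 1): 1 for the first
    sample, shrinking until it settles at the fixed decay 2 / (N + 1).
    """

    def __init__(self, window: int):
        self.window = window  # moving window size N
        self.count = 0        # samples seen so far, capped at N
        self.value = 0.0

    def add(self, v: float) -> None:
        self.count = min(self.count + 1, self.window)
        alpha = 2.0 / (self.count + 1)
        self.value += alpha * (v - self.value)

    def set(self, v: float) -> None:
        # Treat v as an already settled average: later samples use 2 / (N + 1).
        self.value = v
        self.count = self.window


class WarmupEWMA:
    """Plain running mean for `warmup` samples, then a fixed decay of 2 / (N + 1).

    Loosely modelled on the warmup idea in VividCortex's VariableEWMA; not a port.
    """

    def __init__(self, window: int, warmup: int):
        self.alpha = 2.0 / (window + 1)
        self.warmup = warmup
        self.count = 0
        self.value = 0.0

    def add(self, v: float) -> None:
        if self.count < self.warmup:
            self.count += 1
            self.value += (v - self.value) / self.count  # incremental mean
        else:
            self.value += self.alpha * (v - self.value)


def compare(samples, window):
    """One row per sample: (exact moving average, Variant, Continuing, Warmup)."""
    recent = deque(maxlen=window)
    variant, continuing = JitEWMA(window), JitEWMA(window)
    warmup = WarmupEWMA(window, warmup=window - 1)
    rows = []
    for i, v in enumerate(samples):
        recent.append(v)
        variant.add(v)           # 1st sample added with Add
        if i == 0:
            continuing.set(v)    # 1st sample added with Set
        else:
            continuing.add(v)
        warmup.add(v)
        rows.append((sum(recent) / len(recent), variant.value,
                     continuing.value, warmup.value))
    return rows


if __name__ == "__main__":
    random.seed(1)  # change the seed to "refresh" the random samples
    N = 5
    samples = [random.uniform(0, 100) for _ in range(20)]
    print(" n      V  Average  Variant  Continuing  Warmup")
    for n, (v, row) in enumerate(zip(samples, compare(samples, N)), start=1):
        print(f"{n:2d} {v:6.1f} " + " ".join(f"{x:8.1f}" for x in row))
```

Changing the seed plays the role of refreshing the random samples in the spreadsheet, and changing `N` plays the role of cell A2.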

More discussion: https://github.com/DNSCrypt/dnscrypt-proxy/pull/2079

lifenjoiner commented 2 years ago

Two strategies for the "warmup" stage, which differ in how they deal with outliers:

  1. Add the 1st sample using Add. The following samples get more weight (decreasing until it becomes stable).
  2. Add the 1st sample using Set. The following samples get less weight.
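To make "more" and "less" weight concrete, this sketch computes how much each of the first few samples contributes to the final value under both strategies. It assumes the same hypothetical just-in-time rule as above (alpha_j = 2 / (min(j, N) + 1) with Add, fixed 2 / (N + 1) after a Set); the function name `sample_weights` is hypothetical. An exact average of 3 samples would give each one 1/3.

```python
def sample_weights(m, window, first_by_set):
    """Share of the final average contributed by each of the first m samples.

    Hypothetical decay rule: with Add, sample j uses alpha_j = 2 / (min(j, N) + 1);
    with Set, the 1st sample is copied in (alpha_1 = 1) and every later sample
    uses the fixed alpha = 2 / (N + 1).
    """
    alphas = []
    for j in range(1, m + 1):
        if first_by_set and j > 1:
            alphas.append(2.0 / (window + 1))
        else:
            alphas.append(2.0 / (min(j, window) + 1))
    weights = []
    for k, alpha in enumerate(alphas):
        w = alpha
        for later in alphas[k + 1:]:
            w *= 1 - later  # each later update scales older contributions down
        weights.append(w)
    return weights


if __name__ == "__main__":
    print([round(w, 3) for w in sample_weights(3, 4, first_by_set=False)])  # [0.167, 0.333, 0.5]
    print([round(w, 3) for w in sample_weights(3, 4, first_by_set=True)])   # [0.36, 0.24, 0.4]
```

With N = 4 and three samples, Add leaves the 1st sample with about 1/6 of the weight and hands more to the following ones, while Set keeps 0.36 on the 1st sample and gives the following ones less than Add does (0.24 and 0.4 versus 0.333 and 0.5).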

Things you may need to consider:

  1. Can you tell which sample is the outlier: the 1st, the 2nd, or the 3rd? If none is, the closer to the exact average the better. If the 1st is, the average should adjust faster/more heavily. If one of the following samples is, the average should adjust more slowly/lightly.
  2. What is the outlier ratio of a server? If it is high, the server is less reliable and adjusting faster is better; the stable server will win in the end. If all servers have low ratios, they are reliable, and the closer to the exact average the better.
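Point 1 can be tried out with a toy sequence whose typical value is 10 and which contains a single outlier of 100, either as the 1st or the 2nd sample. This again assumes the hypothetical just-in-time rule used above; `final_value` is a made-up name and the numbers are illustrative, not measurements.

```python
def final_value(samples, window, first_by_set):
    """Average after all samples, under the hypothetical just-in-time decay rule."""
    value, count = 0.0, 0
    for i, v in enumerate(samples):
        if i == 0 and first_by_set:
            value, count = float(v), window  # Set: start directly at the fixed decay
            continue
        count = min(count + 1, window)
        alpha = 2.0 / (count + 1)  # 1 for the very first Add
        value += alpha * (v - value)
    return value


if __name__ == "__main__":
    N = 4
    cases = {"1st is outlier": [100, 10, 10], "2nd is outlier": [10, 100, 10]}
    for name, samples in cases.items():
        add = final_value(samples, N, first_by_set=False)
        set_ = final_value(samples, N, first_by_set=True)
        print(f"{name}: Add {add:.1f}, Set {set_:.1f}")
    # 1st is outlier: Add 25.0, Set 42.4
    # 2nd is outlier: Add 40.0, Set 31.6
```

With a 1st-sample outlier, Add pulls the average back toward 10 faster (25.0 versus 42.4); with an outlier among the following samples, Set is disturbed less (31.6 versus 40.0).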

Anyway, the above is a rough/lazy approach. If you really care about outliers, you should deal with them at an earlier stage: the validation/cleanup stage, applied to all samples.
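As one possible cleanup step (a swapped-in technique, not something this thread prescribes), samples can be screened with a median/MAD rule before they reach any EWMA: drop those farther than k times the median absolute deviation from the median. The function name `drop_outliers` and the default `k = 3.0` are hypothetical choices for illustration.

```python
from statistics import median


def drop_outliers(samples, k=3.0):
    """Keep samples within k * MAD of the median (MAD = median absolute deviation).

    k = 3.0 is an illustrative default, not a recommendation from this thread.
    If more than half of the samples are identical, MAD is 0 and only the
    samples equal to the median survive.
    """
    if not samples:
        return []
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    return [x for x in samples if abs(x - med) <= k * mad]


if __name__ == "__main__":
    raw = [10, 11, 9, 10, 100]
    clean = drop_outliers(raw)
    print(clean)                    # [10, 11, 9, 10]
    print(sum(clean) / len(clean))  # 10.0
```

Because the median and MAD are themselves barely moved by a few extreme values, this kind of filter can run on every sample without first deciding which warmup strategy to use.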