perfsonar / project

The perfSONAR project's primary wiki and issue tracker.
Apache License 2.0

OWAMP bucket_width parameter with negative effect on the measurements #1388

Closed igarny closed 4 months ago

igarny commented 5 months ago

Hi guys,

I am seeing something odd. When I explicitly set the "bucket_width" parameter to the default value, the results of a powstream measurement are distorted:

psmp-gn-mgmt-lon-uk:~$ pscheduler task latency --dest psmp-gn-owd-par-fr.geant.org --dest-node psmp-gn-mgmt-par-fr.geant.org --source psmp-gn-owd-lon-uk.geant.org --source-node psmp-gn-mgmt-lon-uk.geant.org --packet-count 100 --packet-interval 0.1 --packet-padding 0 --ip-version 4 --bucket-width 0.0001
Submitting task...
Task URL:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/a3089c1d-d3e4-409c-8021-c82491fd52e0
Running with tool 'owping'
Fetching first run...

Next scheduled run:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/a3089c1d-d3e4-409c-8021-c82491fd52e0/runs/54adacb5-aad7-4354-b97e-5bda68c4b38f
Starts 2024-06-12T08:56:59+00:00 (~3 seconds)
Ends   2024-06-12T08:57:21+00:00 (~21 seconds)
Waiting for result...

Packet Statistics
-----------------
Packets Sent ......... 100 packets
Packets Received ..... 100 packets
Packets Lost ......... 0 packets
Packets Duplicated ... 0 packets
Packets Reordered .... 0 packets

One-way Latency Statistics
--------------------------
Delay Median ......... 16.88 ms
Delay Minimum ........ 16.10 ms
Delay Maximum ........ 17.39 ms
Delay Mean ........... 16.88 ms
Delay Mode ........... 16.88 ms 16.89 ms
Delay 25th Percentile ... 16.75 ms
Delay 75th Percentile ... 17.16 ms
Delay 95th Percentile ... 17.32 ms
Max Clock Error ...... 3.04 ms
Common Jitter Measurements:
    P95 - P50 ........ 0.44 ms
    P75 - P25 ........ 0.41 ms
    Variance ......... 0.07 ms
    Std Deviation .... 0.27 ms
Histogram:
    16.10 ms: 1 packets
    16.39 ms: 2 packets
    16.44 ms: 2 packets
    16.45 ms: 1 packets
    16.46 ms: 1 packets
    16.48 ms: 1 packets
    16.49 ms: 1 packets
    16.51 ms: 1 packets
    16.52 ms: 2 packets
    16.53 ms: 4 packets
    16.54 ms: 2 packets
    16.55 ms: 1 packets
    16.56 ms: 2 packets
    16.58 ms: 1 packets
    16.68 ms: 1 packets
    16.72 ms: 1 packets
    16.75 ms: 2 packets
    16.76 ms: 2 packets
    16.77 ms: 4 packets
    16.78 ms: 1 packets
    16.81 ms: 2 packets
    16.82 ms: 1 packets
    16.83 ms: 1 packets
    16.85 ms: 1 packets
    16.86 ms: 4 packets
    16.87 ms: 3 packets
    16.88 ms: 9 packets
    16.89 ms: 9 packets
    16.91 ms: 3 packets
    16.92 ms: 1 packets
    16.93 ms: 1 packets
    16.97 ms: 2 packets
    17.06 ms: 1 packets
    17.09 ms: 1 packets
    17.12 ms: 1 packets
    17.14 ms: 2 packets
    17.17 ms: 1 packets
    17.18 ms: 3 packets
    17.19 ms: 1 packets
    17.20 ms: 5 packets
    17.21 ms: 1 packets
    17.22 ms: 5 packets
    17.24 ms: 3 packets
    17.27 ms: 1 packets
    17.32 ms: 2 packets
    17.35 ms: 1 packets
    17.37 ms: 1 packets
    17.39 ms: 1 packets

TTL Statistics
--------------
TTL Median ........... 252.00
TTL Minimum .......... 252.00
TTL Maximum .......... 252.00
TTL Mean ............. 252.00
TTL Mode ............. 252.00
TTL 25th Percentile ... 252.00
TTL 75th Percentile ... 252.00
TTL 95th Percentile ... 252.00
Histogram:
    252: 100 packets

If I remove the parameter, it works just fine, even though the value I removed is supposed to be the same as the default:


Without bucket-width

psmp-gn-mgmt-lon-uk:~$ pscheduler task latency --dest psmp-gn-owd-par-fr.geant.org --dest-node psmp-gn-mgmt-par-fr.geant.org --source psmp-gn-owd-lon-uk.geant.org --source-node psmp-gn-mgmt-lon-uk.geant.org --packet-count 100 --packet-interval 0.1 --packet-padding 0 --ip-version 4
Submitting task...
Task URL:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/039cf1a6-8bdc-40b1-a728-2058d45ff925
Running with tool 'owping'
Fetching first run...

Next scheduled run:
https://psmp-gn-mgmt-lon-uk.geant.org/pscheduler/tasks/039cf1a6-8bdc-40b1-a728-2058d45ff925/runs/1905a933-85a3-4c80-843c-de505a0bb55c
Starts 2024-06-12T09:17:40+00:00 (~2 seconds)
Ends   2024-06-12T09:18:02+00:00 (~21 seconds)
Waiting for result...

Packet Statistics
-----------------
Packets Sent ......... 100 packets
Packets Received ..... 100 packets
Packets Lost ......... 0 packets
Packets Duplicated ... 0 packets
Packets Reordered .... 0 packets

One-way Latency Statistics
--------------------------
Delay Median ......... 1.66 ms
Delay Minimum ........ 1.60 ms
Delay Maximum ........ 1.69 ms
Delay Mean ........... 1.66 ms
Delay Mode ........... 1.68 ms
Delay 25th Percentile ... 1.65 ms
Delay 75th Percentile ... 1.68 ms
Delay 95th Percentile ... 1.68 ms
Max Clock Error ...... 3.04 ms
Common Jitter Measurements:
    P95 - P50 ........ 0.02 ms
    P75 - P25 ........ 0.03 ms
    Variance ......... 0.00 ms
    Std Deviation .... 0.02 ms
Histogram:
    1.60 ms: 1 packets
    1.61 ms: 2 packets
    1.62 ms: 11 packets
    1.63 ms: 1 packets
    1.64 ms: 9 packets
    1.65 ms: 2 packets
    1.66 ms: 28 packets
    1.67 ms: 11 packets
    1.68 ms: 33 packets
    1.69 ms: 2 packets

TTL Statistics
--------------
TTL Median ........... 252.00
TTL Minimum .......... 252.00
TTL Maximum .......... 252.00
TTL Mean ............. 252.00
TTL Mode ............. 252.00
TTL 25th Percentile ... 252.00
TTL 75th Percentile ... 252.00
TTL 95th Percentile ... 252.00
Histogram:
    252: 100 packets

No further runs scheduled.

Another odd observation: MaDDash somehow compensates, but Grafana doesn't. Despite the skewed results, MaDDash seems to account for whether the parameter is present and shows the correct values. Below you can see that, despite the change in the parameter, MaDDash does not show the issue:

image

laeti-tia commented 4 months ago

Isn't there some calculation happening in MaDDash? Or is there a need to keep the packet-interval and bucket-width parameters aligned somehow? The difference between the first and second results looks like a 10-fold decrease.

igarny commented 4 months ago

Here are the final tests: testBAD.txt, resBAD.json, testOK.txt, resultOK.json

arlake228 commented 4 months ago

As suspected, this is a display issue, and I was able to recreate it in my testbed. The problem was in both the CLI and Grafana: neither accounted for bucket-width when working with the histogram or with values derived from it. The histogram buckets are stored in OpenSearch with the bins sized according to the bucket width (default 0.001), which is the desired behavior. Grafana and the CLI were simply putting an "ms" label on whatever came out, without considering the bucket width. I have updated both to look up the bucket width and scale values accordingly, so they are always normalized to milliseconds.
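
For readers who want to sanity-check stored results by hand, the scaling described above can be sketched roughly as follows. This is not the actual CLI or Grafana fix. It assumes each histogram key is a delay expressed in multiples of the bucket width (given in seconds), which matches the roughly 10-fold difference between the two runs in this thread. The function name `normalize_histogram_ms` is hypothetical.

```python
# Default bucket width in seconds, as quoted in this thread.
DEFAULT_BUCKET_WIDTH = 0.001


def normalize_histogram_ms(histogram, bucket_width=DEFAULT_BUCKET_WIDTH):
    """Return a copy of the histogram with keys converted to milliseconds.

    Assumption (not confirmed by the source): each key is a delay in
    multiples of bucket_width seconds, so delay_ms = key * bucket_width * 1000.
    """
    scale = bucket_width * 1000.0
    normalized = {}
    for key, count in histogram.items():
        # Round to suppress floating-point noise from the multiplication.
        delay_ms = round(float(key) * scale, 4)
        normalized[delay_ms] = normalized.get(delay_ms, 0) + count
    return normalized


# A few buckets copied from the two runs above.
with_width = {"16.10": 1, "16.88": 9, "17.39": 1}    # --bucket-width 0.0001
without_width = {"1.60": 1, "1.68": 33, "1.69": 2}   # no bucket width given

print(normalize_histogram_ms(with_width, 0.0001))
print(normalize_histogram_ms(without_width))
```

Under that assumption, the 16.88 bucket from the first run maps to about 1.688 ms, which is in line with the 1.6 to 1.7 ms range reported when the parameter is omitted.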