vergoh / vnstat

vnStat - a network traffic monitor for Linux and BSD
GNU General Public License v2.0

95th percentile bandwidth calculation #247

Closed g-v-egidy closed 5 months ago

g-v-egidy commented 1 year ago

Internet carriers often calculate the prices they charge based on the 95th percentile bandwidth. This method is explained, for example, here:
https://www.semaphore.com/95th-percentile-bandwidth-metering-explained-and-analyzed/
https://en.wikipedia.org/wiki/Burstable_billing

This method of billing is sometimes also used when renting dedicated servers or colocation.

Usually you make a contract and commit to a certain bandwidth. If you exceed that, you have to pay a premium. So you want to know your actual 95th percentile bandwidth used and also how far you are from exceeding your committed bandwidth. It would be nice if vnstat could provide this information.

The sampling interval is most often 5 minutes and the bandwidth calculated over one month.

vnstat would have to store the top 5% plus one of the 5 minute intervals with the highest traffic in its database; that would be about 447 entries for a 31-day month (5% of 8928 intervals is 446.4). Based on that you could output the actual 95th percentile bandwidth used, which is the bandwidth of the lowest 5 minute interval in that data set.

What I think is also interesting is how far you are away from your committed bw. This value could be either stored in the config file or given on the command line when querying. The query would look at the dataset of the 5%+1 intervals with the highest traffic and check how many of them exceed the given committed bw. You could either output it as a raw number or in percent.
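For reference, the calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the nearest-rank definition of the 95th percentile (providers may round differently), `percentile_95` and `intervals_over` are hypothetical helper names, and the sample rates are made up.

```python
def percentile_95(rates):
    """Nearest-rank 95th percentile of per-interval rates (assumed definition)."""
    if not rates:
        raise ValueError("no samples")
    ordered = sorted(rates)
    # ceil(0.95 * n) using integer arithmetic to avoid float rounding
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def intervals_over(rates, committed):
    """Number of intervals whose rate exceeds the committed bandwidth."""
    return sum(1 for rate in rates if rate > committed)


# 5 minute intervals in a 31-day month: 31 days * 24 hours * 12 per hour
INTERVALS_31_DAYS = 31 * 24 * 12

# Hypothetical data in Mbit/s: 95 quiet intervals and 5 bursts
samples = [10.0] * 95 + [50.0] * 5

print(INTERVALS_31_DAYS)
print(percentile_95(samples))
print(intervals_over(samples, 20.0))
```

With these made-up samples the five bursts fall into the discarded top 5%, so the result is the quiet rate. A sixth burst would push the 95th percentile up to the burst rate.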

vergoh commented 1 year ago

Do you happen to have any actual examples of service providers using a 95th percentile bandwidth pricing model? I'd be interested in reading how they phrase that in their contracts and whether there are more differences in reality compared to that first analysis you've linked. For example, does the definition of "month" vary, like not always starting on the 1st of every month, or is the bandwidth sometimes calculated only from the incoming traffic instead of rx+tx (something that's not even mentioned in that analysis)?

vnStat can already be configured to store the 5 minute resolution data for longer periods. That combined with --json f should allow at least testing with the already possible data collection.

As for output, have you played with the --alert outputs introduced in version 2.9? I suspect the style of those may be what you are looking for apart from those supporting 95th percentile.

g-v-egidy commented 1 year ago

> Do you happen to have any actual examples of service providers using a 95th percentile bandwidth pricing model? I'd be interested in reading how they phrase that in their contracts and whether there are more differences in reality compared to that first analysis you've linked.

I have many examples of providers using this method:
https://23m.com/en/colocation
https://www.fdcservers.net/budget-server-2
https://www.freethought.uk/ip-transit/
https://www.bsimplify.com/ip-transit/
https://blog.servermania.com/what-is-ip-transit/
https://ifog.ch/en/ip/ip-transit
https://www.voxility.com/internet-access/IP+Transit+for+Networks (select "Change" at Network Access, then "Custom 'bandwidth' package")

Unfortunately they don't publish detailed contracts that explain precisely how the billing is done. Even the prices are usually only available from their sales team and not published.

The best I could find was this: https://www.nforce.com/legal (TOS - Annex IP Connectivity). But even this isn't detailed enough to implement it directly.

Even when asking their sales representatives, you just get back the links I posted and an example screenshot from their billing tool. I think they don't consider it worthwhile to explain the exact billing method to their lawyers and have them precisely spell it out in legalese. Either you accept this and live with it, or you are branded as a fussy customer and your contract is canceled...

> For example, does the definition of "month" vary, like not always starting on the 1st of every month, or is the bandwidth sometimes calculated only from the incoming traffic instead of rx+tx (something that's not even mentioned in that analysis)?

From my experience there is no variation in the definition of "month" between providers. It is always the calendar month and starts on the 1st.

But you are correct regarding incoming and outgoing, this varies by provider. Some calculate the 95th percentile for rx and tx separately at the end of the month and only the higher one is charged. Others sum the incoming and outgoing traffic within each 5 minute interval and calculate the 95th percentile out of that.

I guess the rx/tx handling would have to be configurable.
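To illustrate why the two variants can give very different results, here is a small Python sketch. The mode names `"max"` and `"sum"` and the function `billable_rate` are hypothetical and only serve the example, the percentile uses the nearest-rank definition as an assumption, and the sample rates are made up.

```python
def p95(values):
    """Nearest-rank 95th percentile (assumed definition)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def billable_rate(rx, tx, mode):
    """Billable rate for per-interval rx and tx rates under one of two modes."""
    if mode == "max":
        # Percentile of rx and tx separately, the higher one is charged
        return max(p95(rx), p95(tx))
    if mode == "sum":
        # rx and tx summed within each interval, then one percentile
        return p95([r + t for r, t in zip(rx, tx)])
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical rates in Mbit/s for 20 intervals:
# one rx burst in the last interval, one tx burst in the first
rx = [5] * 19 + [90]
tx = [90] + [5] * 19

print(billable_rate(rx, tx, "max"))
print(billable_rate(rx, tx, "sum"))
```

In this made-up case each direction has a single burst that is discarded when rx and tx are evaluated separately. Summed per interval, there are two bursts among 20 samples, so one of them survives the 5% cut.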

> vnStat can already be configured to store the 5 minute resolution data for longer periods. That combined with --json f should allow at least testing with the already possible data collection.

Yes, I'm aware of this and some script based on this will most probably be what I'm going to start with for now.

> As for output, have you played with the --alert outputs introduced in version 2.9? I suspect the style of those may be what you are looking for apart from those supporting 95th percentile.

I have seen it but it doesn't fit my use case. I need vnstat to precisely take and store the measurements. The results are then regularly exported and interpreted by my monitoring system.

g-v-egidy commented 1 year ago

It seems that LibreNMS is used by several providers to calculate and monitor the billing data.

It has a module for 95th percentile calculation: https://docs.librenms.org/Extensions/Billing-Module/

The source could be used as a reference for how to implement this in detail:
https://github.com/librenms/librenms/blob/master/poll-billing.php
https://github.com/librenms/librenms/blob/master/billing-calculate.php

Also you could run LibreNMS alongside vnstat to verify a 95th percentile implementation yields the same results.

vergoh commented 1 year ago

> > As for output, have you played with the --alert outputs introduced in version 2.9? I suspect the style of those may be what you are looking for apart from those supporting 95th percentile.
>
> I have seen it but it doesn't fit my use case. I need vnstat to precisely take and store the measurements. The results are then regularly exported and interpreted by my monitoring system.

Yes, I understand that. However, what I was trying to understand is how you would like vnStat to output the results / status. Are you, for example, using only the --json outputs (which would then need to support this)? Or, from a textual output point of view, is the way those --alert outputs are constructed more suitable?

I'd most likely also need to figure out what the relevant information to output during the time period is and whether some prediction could also be made.

g-v-egidy commented 1 year ago

> Yes, I understand that. However, what I was trying to understand is how you would like vnStat to output the results / status. Are you, for example, using only the --json outputs (which would then need to support this)? Or, from a textual output point of view, is the way those --alert outputs are constructed more suitable?

Oh, sorry, I think I misunderstood you there.

So for my use case with the external monitoring system I would take the output from vnstat --json --95% --alert... and feed that to my monitoring system. Occasionally I might want to get the current data on the shell.

I think mixing the traffic volume based output, as shown currently by --alert and by calling vnstat without parameters, with 95th percentile bandwidth would confuse users. So I suggest that you add a special parameter, for example --95%, to switch the output to show only 95th percentile output.

This is what the output without further parameters could look like:

# vnstat --95%

                  Bandwidth rx / Bandwidth tx  / Bandwidth total     
 br0:
       2023-05      9.57 MBit  /    2.37 MBit  /   11.94 MBit  

With --alert as it currently works in vnstat, you could pass your committed bandwidth as a parameter and get important information that is missing from the output shown above: in how many 5 minute blocks you have already exceeded your committed bandwidth. I consider this info essential and would like to have a way to get it as json as well as on the console.

# vnstat --95% --alert 1 0 m total 100 MBit br0

   br0 at 2023-05-16 13:45:00 for month 2023-05

         monthly |       rx          |      tx         |     total
       ----------+-------------------+-----------------+--------------
  bandwidth used |        9.57 MBit  |      2.37 MBit  |   11.94 MBit
           limit |      100.00 MBit  |    100.00 MBit  |  100.00 MBit
      over limit |           3 x5Min |         2 x5Min |       8 x5Min
 over limit left |        8925 x5Min |      8926 x5Min |    8920 x5Min

In both outputs I have rx and tx measured separately as well as rx+tx summed and the bandwidth calculated based on that. While your provider will usually only use one method of billing, vnstat would need a per-interface setting for which method to use if it were to display only one. I think it is easier for vnstat to always calculate and show both data sets.

> I'd most likely also need to figure out what the relevant information to output during the time period is and whether some prediction could also be made.

See above for what I consider relevant info.

The current bandwidth (the 95th percentile calculated not against the whole month, but from the beginning of the month to the current time) should work quite well as a prediction. What you could also do is compare the number of 5 minute blocks that are over the committed bandwidth with how far into the month you are.
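The second idea can be sketched roughly like this in Python. It is an illustration only: `burst_budget` is a hypothetical helper, the number of allowed intervals assumes the nearest-rank percentile over a full calendar month of 5 minute samples, and the rates are made up.

```python
import calendar

INTERVALS_PER_DAY = 24 * 12  # 5 minute intervals per day


def burst_budget(year, month, rates_so_far, committed):
    """Compare intervals over the committed rate with the elapsed share of the month."""
    days = calendar.monthrange(year, month)[1]
    total = days * INTERVALS_PER_DAY
    # Intervals that may exceed the committed rate without raising the
    # nearest-rank 95th percentile above it: n - ceil(0.95 * n)
    allowed = total - (95 * total + 99) // 100
    used = sum(1 for rate in rates_so_far if rate > committed)
    return {
        "allowed": allowed,
        "used": used,
        "remaining": allowed - used,
        "budget_share_used": used / allowed,
        "month_share_elapsed": len(rates_so_far) / total,
    }


# Hypothetical month-to-date data in Mbit/s: half of May 2023 has passed,
# 100 intervals were above a committed rate of 100 Mbit/s
rates = [150.0] * 100 + [50.0] * 4364
status = burst_budget(2023, 5, rates, 100.0)
print(status)
```

If `budget_share_used` grows faster than `month_share_elapsed`, the committed bandwidth is likely to be exceeded by the end of the month, assuming the usage pattern stays the same.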

vergoh commented 6 months ago

Assuming 5MinuteHours has been at least 744 starting from the beginning of the month, the following output is now available:

 LAN (eth0)  /  95th percentile

 2023-12-01 00:00 - 2023-12-09 23:15 (2584 entries, 100% coverage)

                        rx       |       tx       |     total
       --------------------------+----------------+---------------
       minimum      15.14 kbit/s |    7.77 kbit/s |   23.28 kbit/s
       average       2.11 Mbit/s |  155.85 kbit/s |    2.26 Mbit/s
       maximum      63.75 Mbit/s |   17.29 Mbit/s |   65.22 Mbit/s
       --------------------------+----------------+---------------
        95th %       9.10 Mbit/s |  207.47 kbit/s |    9.37 Mbit/s

Otherwise there will be a warning about limited data being available, and the coverage can't hit 100% (except during the beginning of the month). --95, --95% and --95th are all supported as parameters for ease of use. Several things, like --json and --alert, are still to be done.

g-v-egidy commented 6 months ago

Thank you very much for working on this!

I just tested the current git head (0a37b72) on my server. The data for 95th% matches exactly with what the billing tool of my provider outputs for the current month. So the algorithm you are using seems to be correct.

Since I've seen a commit regarding --json and --95% in the repo I have also tried that. Unfortunately I couldn't see any 95th_percentile or bandwidth entries in the json I got. But maybe I'm using it wrong or it is not finished yet?

vergoh commented 6 months ago

Thanks for confirming that the way the output is calculated matches what you were expecting.

--json isn't an output modifier in vnStat but a separate output mode instead. Whichever output mode is defined last on the command line gets displayed, which may result in some misunderstandings with something like --days compared to --days --json, where only the --json is actually relevant.

That 95th percentile json output got implemented but it's not the default (or included in the default) since it has the additional requirement of 5MinuteHours having a non-default configuration. --json p is what you are looking for. See --json ? for the other options if needed. Note that --json will show all available interfaces in the same output unless some specific interface is requested.

g-v-egidy commented 6 months ago

> --json p is what you are looking for.

Thanks, that did the trick and I got the correct json output.

vergoh commented 6 months ago

Support for 95th percentile in --alert is now also in with https://github.com/vergoh/vnstat/commit/5f09d33e0765f0fdfc7a6c740df7909449648f35.

Let me know if you see any inconsistencies, oddities, things you feel could be improved or anything raising questions in any of the added outputs as these are far more convenient to change now than after the release. Obviously the "still to be written" documentation part will need to try to explain enough details.

Forgot to mention earlier, the --json output intentionally uses _bytes_per_second in the field names to avoid any misunderstandings about what unit is being used. For bits per second, just multiply the value by 8.
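As a quick worked example of that conversion in Python: the helper name is hypothetical, and the decimal prefix (1 Mbit/s = 1,000,000 bit/s) is an assumption that should match how your provider and your vnStat rate unit settings define Mbit/s.

```python
def bytes_per_second_to_mbit(value):
    """Convert a bytes per second value to Mbit/s with a decimal prefix (assumption)."""
    bits_per_second = value * 8
    return bits_per_second / 1_000_000


# 1,250,000 bytes per second is 10,000,000 bit/s
print(bytes_per_second_to_mbit(1_250_000))
```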

g-v-egidy commented 6 months ago

Thanks for continuing to work on this. I have now tested your new code with the --alert option. I have used --alert 1 3 95% total 200 Mbit/s and it properly reported the difference to the alert limit or showed an alert on exceeding it.

What didn't work was using --alert 1 3 95% total_estimate 200 Mbit/s; it resulted in this:

    [...........................................................]
                        0 bit/s of 200 Mbit/s (0.0%)

I guess the estimation function doesn't work yet or I used it in a wrong way.

Also, there is one piece of information missing that I would really like to get from vnstat and use in my monitoring: in how many 5 minute intervals the given limit (= my committed bandwidth) was exceeded this month.

I would like to be able to see this info on the command line as well as extract it via json and feed it into my monitoring system. If this point, or my reasoning why I think this info is essential, isn't clear yet, then please ask and I can explain it in more detail.

vergoh commented 6 months ago

> What didn't work was using --alert 1 3 95% total_estimate 200 Mbit/s; it resulted in this:
>
>     [...........................................................]
>                         0 bit/s of 200 Mbit/s (0.0%)
>
> I guess the estimation function doesn't work yet or I used it in a wrong way.

That parameter combination wasn't actually supposed to work, but I appear to have missed a check to block it from getting executed. I'm also not sure if there even can be separate estimates for the 95th percentile, as I don't think the result changes if the usage pattern stays the same over the month, and changes in the usage pattern are things I can't predict anyway.

If you, for example, think of a constant 10 Mbit/s transfer rate all the time, the 95th percentile would be the same 10 Mbit/s at the beginning and end of the month, so there's nothing to predict when measuring a transfer rate. That differs from tracking the transferred amount, where the average transfer rate makes it possible to estimate how much would have been transferred by the end of the month if the transfer rate stays about the same.

> Also, there is one piece of information missing that I would really like to get from vnstat and use in my monitoring: in how many 5 minute intervals the given limit (= my committed bandwidth) was exceeded this month.
>
> I would like to be able to see this info on the command line as well as extract it via json and feed it into my monitoring system. If this point, or my reasoning why I think this info is essential, isn't clear yet, then please ask and I can explain it in more detail.

With some refactoring, it should be possible to get this information added to the --alert output. However, json is a different story. There's no limit being specified to begin with, and none of the other json outputs would benefit from a limit parameter being added. The function handling the data processing is also rather deep in the function call tree, so everything in between would need to be changed to pass that limit information, as there's no common data structure being passed forward.

Technically that information is already available for post-processing from the --json f output as all the 5 minute entries are listed so you could evaluate the information from there with the limit you are having.
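A rough Python sketch of such post-processing follows. The JSON layout used here (an `interfaces` list whose entries have `name` and `traffic.fiveminute` with `rx` and `tx` byte counts) is an assumption about the `--json f` output and should be checked against your vnStat version. `count_over_limit` is a hypothetical helper and the embedded sample data is made up, in practice the document would come from the output of `vnstat --json f`.

```python
import json

INTERVAL_SECONDS = 300  # one 5 minute entry


def count_over_limit(doc, interface, limit_bits_per_second):
    """Count 5 minute entries whose average rx+tx rate exceeds the limit."""
    for iface in doc["interfaces"]:
        if iface["name"] == interface:
            entries = iface["traffic"]["fiveminute"]  # assumed key name
            return sum(
                1
                for entry in entries
                # bytes in the interval -> average bits per second
                if (entry["rx"] + entry["tx"]) * 8 / INTERVAL_SECONDS
                > limit_bits_per_second
            )
    raise KeyError(interface)


# Made-up sample in the assumed layout: 100 Mbit/s, 20 Mbit/s and an idle entry
sample = json.loads("""
{"interfaces": [{"name": "eth0", "traffic": {"fiveminute": [
    {"rx": 3750000000, "tx": 0},
    {"rx": 375000000, "tx": 375000000},
    {"rx": 1000, "tx": 1000}
]}}]}
""")

print(count_over_limit(sample, "eth0", 50_000_000))
```

Note that the newest entry usually covers an interval that is still in progress, so a careful script would probably skip it or treat it separately.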

vergoh commented 6 months ago

https://github.com/vergoh/vnstat/commit/7c5d6805dba60d8f53dadecdcaf84113ac8f126f now has the information regarding the number of 5 minute intervals exceeding the limit visible, among some other minor adjustments.

I'll have to look if it would be easy enough to change the --json parameter to act as an output modifier for --alert. That would at least then solve passing the limit information forward, although it would then require having a json output version for all the other alert parameter combinations too.

vergoh commented 6 months ago

--alert + --json combination is now supported starting from https://github.com/vergoh/vnstat/commit/9cf4a0c290aae83d3b212015b16aad84313a20ed.

g-v-egidy commented 6 months ago

I just had time today to test your newest version. With the new --alert + --json combination mode I get all the data I need for my monitoring. Thank you very much for implementing this.

I will soon adapt the hacky script I'm currently using to work with the --json + --alert output from vnstat.

vergoh commented 6 months ago

Image output support for 95th percentile is included starting from https://github.com/vergoh/vnstat/commit/0fdb228d492c8e2b74cdab171d2195f7d050a3a5.


g-v-egidy commented 5 months ago

This is really looking fancy now!

I haven't tried the graphics output yet because I use a different solution for that, but the json output is looking good so far and matches with what my provider measured.

vergoh commented 5 months ago

vnStat 2.12 has now been released with this feature included. Note that the documentation indicates --95th as the suggested command line parameter, but both --95% and --95 are also supported.