nmstate / kubernetes-nmstate

Declarative node network configuration driven through Kubernetes API.
GNU General Public License v2.0

[RFE] Collect nmstate usage data for telemetry #1202

Open cathay4t opened 10 months ago

cathay4t commented 10 months ago

We are hoping to get data on

This could help us plan CI coverage and decide which patches to backport.

qinqon commented 10 months ago

@cathay4t regarding the first bullet, could we generalize this with counters per nested interface? For example: vlan(linux(bond)) -> 3, linux(vlan) -> 4, ovs(vlan) -> 5. Something like this?

cathay4t commented 10 months ago

Initially, I would like to collect data for:

  1. Topology: array of free strings, like vlan, linux-bridge, vlan-over-bridge-over-bond, ovs-bridge, etc.
  2. Features: array of free strings, like mac-based-identifier, ovn-mapping. With this we could learn the adoption rate of implemented features.
  3. Use cases: array of free strings, like move ip from eth to bridge, change dns nameservers, switch from dynamic ip to static. We can use the nmstate tier1 test case names for these use cases, so each one has a clear definition.
  4. Optionally, learn the use case behind the customer's desired state. Till suggested this, but I don't know how to do it.

Each cluster counts only once per topology/feature/use case, regardless of how many interfaces or NNCPs it has. That way our data is not skewed by one big cluster with 1000+ VLANs or NNCPs.
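The once-per-cluster rule can be illustrated with a minimal Python sketch. The report layout (a mapping from cluster id to a list of topology strings) and the function name `adoption_counts` are assumptions made for illustration; they are not part of nmstate or kubernetes-nmstate.

```python
from collections import Counter


def adoption_counts(reports):
    """Count how many clusters use each topology.

    `reports` is a hypothetical mapping of cluster id -> list of topology
    strings; the list may repeat a topology many times.
    """
    counts = Counter()
    for topologies in reports.values():
        # set() collapses repeats, so a cluster adds at most 1 per topology.
        counts.update(set(topologies))
    return counts


reports = {
    # A big cluster with 1000 VLANs still counts once for "vlan".
    "cluster-a": ["vlan"] * 1000 + ["linux-bridge"],
    "cluster-b": ["vlan", "ovs-bridge"],
}
counts = adoption_counts(reports)
print(dict(counts))
```

Here "vlan" ends up with a count of 2 (one per cluster) even though cluster-a reports it 1000 times.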

I will create a demo like nmstatectl gen-statistics <desire_state_file> [-c <current_state_file>]:

topologies:
  - vlan
  - linux-bridge-over-bond-over-sriov-vf
features:
  - nm-global-dns
  - mac-identifier
  - sriov-vf-reference
  - ovn-mapping
use-cases:
  - edit_static_ipv4_address_and_prefix
  - disable_static_ipv6

cybertron commented 9 months ago

Just a note that when I've had conversations about telemetry in the past, I was told the amount of data we're allowed to send for it is extremely limited. Like instead of JSON booleans, we have to use bitmasks where each bit is mapped to a given key. I haven't actually confirmed that myself, but before we come up with a complex set of values that we want to return we may want to confirm that we can actually represent that in the amount of data we're allowed to send.
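To make the bitmask idea concrete, here is a rough Python sketch of packing a set of feature names into a single integer. The feature names are taken from the demo output earlier in this thread; the bit positions and the helper names `encode_features` / `decode_features` are assumptions for illustration only, and the actual telemetry size limit is unconfirmed.

```python
# Hypothetical key table: each feature name maps to a fixed bit position.
# Positions must never be reused once published, or old data is misread.
FEATURE_BITS = {
    "nm-global-dns": 0,
    "mac-identifier": 1,
    "sriov-vf-reference": 2,
    "ovn-mapping": 3,
}


def encode_features(features):
    """Pack an iterable of feature names into one integer bitmask."""
    mask = 0
    for name in features:
        mask |= 1 << FEATURE_BITS[name]
    return mask


def decode_features(mask):
    """Recover the sorted list of feature names set in the bitmask."""
    return sorted(
        name for name, bit in FEATURE_BITS.items() if mask & (1 << bit)
    )


mask = encode_features(["mac-identifier", "ovn-mapping"])
print(mask, decode_features(mask))
```

One trade-off of this encoding: free-form strings such as arbitrary nested topologies do not fit, since every reportable value needs a pre-agreed bit.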

qinqon commented 8 months ago

@cybertron @cathay4t this is a PoC to integrate some statistics; from there we can reduce or optimize what we need

https://github.com/nmstate/kubernetes-nmstate/pull/1210

Looks like the top 10 can be filtered in Prometheus using topk(10, sum(nmstate_apply_topology_total) by (name))
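For readers less familiar with PromQL, the query sums the counter across all series that share a `name` label and then keeps the k largest sums. A small Python sketch of the same computation, assuming made-up sample values and a hypothetical `node` label that is not confirmed by the PoC:

```python
import heapq
from collections import defaultdict


def top_k_by_name(samples, k):
    """Mimic topk(k, sum(metric) by (name)) over (labels, value) pairs."""
    totals = defaultdict(float)
    for labels, value in samples:
        # "sum ... by (name)": drop every label except name and add values.
        totals[labels["name"]] += value
    # "topk": keep the k groups with the largest summed value.
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Made-up samples of nmstate_apply_topology_total, one per series.
samples = [
    ({"name": "vlan", "node": "node1"}, 3),
    ({"name": "vlan", "node": "node2"}, 2),
    ({"name": "linux-bridge", "node": "node1"}, 4),
    ({"name": "ovs-bridge", "node": "node2"}, 1),
]
top = top_k_by_name(samples, 2)
print(top)
```

With these samples, "vlan" (3 + 2) ranks ahead of "linux-bridge" (4), and "ovs-bridge" is dropped.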