kentik / ktranslate

System for pulling and pushing network data.
Apache License 2.0

Bug: If_admin and If_Oper status are inconsistent in the data #524

Closed Mesverrum closed 1 year ago

Mesverrum commented 1 year ago

I know that interface polling has some hard-coded aspects that do not quite sync up with the if-mib profile, and I think there is a pain point there. We are running into a behavior where these two fields are decorated onto all interface metrics, but they inconsistently alternate between the current value and nulls. This flopping to nulls causes the cardinality of each interface's metric stream to increase by 4x, which can be a problem for larger environments.
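To make the 4x figure concrete, here is a minimal sketch. It assumes a time series is identified by its full tag set and that each of the two status tags flips independently between its real value and null. The tag names and the `series_keys` helper are illustrative only, not ktranslate identifiers.

```python
from itertools import product


def series_keys(base_tags, flapping_tags):
    """Enumerate the distinct tag sets one interface metric can produce
    when each flapping tag is sometimes present and sometimes null."""
    keys = set()
    names = sorted(flapping_tags)
    # Each flapping tag independently takes its real value or None (null).
    for combo in product(*[(flapping_tags[n], None) for n in names]):
        tags = dict(base_tags)
        tags.update(zip(names, combo))
        keys.add(frozenset(tags.items()))
    return keys


# Hypothetical tag names, values taken from the sample rows in this issue.
base = {"device_name": "AUS-WA01", "if_interface_name": "apr0"}
status = {"if_admin_status": "up", "if_oper_status": "up"}

stable = series_keys(base, {})
flapping = series_keys(base, status)
print(len(stable), len(flapping))  # 1 4
```

Two tags with two states each give 2 x 2 = 4 tag sets per metric, which is where the 4x comes from under these assumptions.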

Also, if we went with the way it sits in the profile, then if_operStatus and if_adminStatus should be their own metrics, not a metric_tag on top of all the other if metrics.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Mesverrum commented 1 year ago

This is still a big issue for us on the NR side, and I think it would be a factor for you guys with any TSDB that has cardinality constraints once you get into the range of tens of thousands of interfaces.

As an example, what I'm seeing in the data across all accounts that use ktranslate is similar to this:

Timestamp | Device Name | If Interface Name | If Admin Status | If Oper Status
March 06, 2023 10:50:09 | AUS-WA01 | apr0 | up
March 06, 2023 10:49:09 | AUS-WA01 | apr0 | up | up
March 06, 2023 10:48:09 | AUS-WA01 | apr0
March 06, 2023 10:47:09 | AUS-WA01 | apr0 | up | up
March 06, 2023 10:46:09 | AUS-WA01 | apr0 | up

Based on how we have the profile defined, admin and oper status are supposed to be their own metrics, but I believe it's hard-coded for them to get decorated onto all if_mib metrics, and it looks like something about that behavior is inconsistent.
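A rough sketch of the shape being asked for, assuming each polled record is a dict with a metric name, a value, and a tag map. The record layout, the snake_case tag names, the sample counter values, and the `split_status` helper are all hypothetical and do not reflect ktranslate's actual output format.

```python
STATUS_FIELDS = ("if_admin_status", "if_oper_status")


def split_status(record, status_fields=STATUS_FIELDS):
    """Split one tag-decorated record into a metric record without status
    tags, plus one standalone record per status value that is present."""
    tags = {k: v for k, v in record["tags"].items() if k not in status_fields}
    out = [{"metric": record["metric"], "value": record["value"], "tags": tags}]
    for field in status_fields:
        value = record["tags"].get(field)
        if value is not None:
            out.append({"metric": field, "value": value, "tags": dict(tags)})
    return out


# Two consecutive polls: statuses present in the first, null in the second.
polls = [
    {"metric": "ifHCInOctets", "value": 100,
     "tags": {"device_name": "AUS-WA01", "if_interface_name": "apr0",
              "if_admin_status": "up", "if_oper_status": "up"}},
    {"metric": "ifHCInOctets", "value": 120,
     "tags": {"device_name": "AUS-WA01", "if_interface_name": "apr0",
              "if_admin_status": None, "if_oper_status": None}},
]

# Count distinct tag sets for the counter metric after splitting.
series = {
    frozenset(r["tags"].items())
    for p in polls
    for r in split_status(p)
    if r["metric"] == "ifHCInOctets"
}
print(len(series))  # 1
```

With the statuses carried as their own metrics, a missing status value only drops a status data point for that poll and leaves the tag set of the other interface metrics unchanged.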

tdanner commented 1 year ago

@kentik-will - could you take a look at this?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.