Closed 1 month ago
if e.values.get(local_system_name_path) and e.values.get(local_port_id_path) and e.values.get(remote_system_name_path) and e.values.get(remote_port_id_path):
Are you expecting all of these values to arrive in the same event? On a Juniper QFX5130 with this set of subscription paths, each item seems to come as its own event. When you say it 'breaks' /metrics, can you provide more information? You can add some tracing to your Starlark script, which will print to the console to help with debugging, e.g.
print(e)
I'm not sure; I haven't tried e.clone() to create a new event, but I have tried something like
new_event = Event("enhanced_interface_counters", e.timestamp, e.tags)
This worked for me. You might need to rework it, but the idea is that you cache the pieces you need until you have all of them (because the neighbor port-id and neighbor system-name arrive in different events) before you create your new event:
logging = True
cache = {}

def apply(*events):
    # events we return
    ret_events = []
    for e in events:
        if e.name == 'lldp':
            if logging:
                print(e)
            # remove port part of target
            hostname = e.tags['source'].replace(':9339', '')
            if e.tags.get('neighbor_id'):
                cache_key = hostname + '_' + e.tags['interface_name'] + '_' + e.tags['neighbor_id']
                if cache_key in cache:
                    if logging:
                        print('cache hit for ' + cache_key)
                else:
                    if logging:
                        print('** new cache slot for ' + cache_key)
                    cache[cache_key] = {}
                if e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/port-id'):
                    cache[cache_key]['remote-port'] = e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/port-id')
                if e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/system-name'):
                    cache[cache_key]['remote-name'] = e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/system-name')
                cache[cache_key]['local-name'] = hostname
                cache[cache_key]['local-port'] = e.tags['interface_name']
                if 'remote-name' in cache[cache_key] and 'remote-port' in cache[cache_key]:
                    new_event = Event("lldp_inventory", e.timestamp)
                    for key, value in cache[cache_key].items():
                        new_event.tags[key] = value
                    new_event.values = {'lldp_item': 1}
                    ret_events.append(new_event)
                    if logging:
                        print(new_event)
        else:
            ret_events.append(e)
    return ret_events
which gives me:
# HELP lldp_item gNMIc generated metric
# TYPE lldp_item untyped
lldp_item{local_name="bl01.laba.lab",local_port="et-0/0/0",remote_name="sp01.01.laba",remote_port="et-0/0/28"} 1
lldp_item{local_name="bl01.laba.lab",local_port="et-0/0/1",remote_name="sp02.01.laba",remote_port="et-0/0/28"} 1
lldp_item{local_name="bl01.laba.lab",local_port="et-0/0/28:0",remote_name="dr03.laba",remote_port="et-2/2/3"} 1
lldp_item{local_name="bl01.laba.lab",local_port="et-0/0/28:2",remote_name="dr04.laba",remote_port="et-2/2/3"} 1
lldp_item{local_name="bl01.laba.lab",local_port="re0:mgmt-0",remote_name="rsw01.r1.05.laba",remote_port="ge-0/0/14"} 1
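The cache-merge pattern the script relies on can be exercised in isolation as plain Python (Starlark shares this syntax). The sample events below are hypothetical stand-ins for what gNMIc delivers; the point is that a record is emitted only once both halves of the LLDP data have arrived:

```python
# Minimal stand-alone sketch of the merge pattern above.
# Events are modeled as plain dicts for illustration only.
cache = {}

def merge(event):
    """Cache partial LLDP facts; return a full record once both arrive."""
    key = event['source'] + '_' + event['interface'] + '_' + event['neighbor']
    slot = cache.setdefault(key, {})
    slot.update(event['values'])      # add whichever field this event carries
    if 'remote-port' in slot and 'remote-name' in slot:
        return dict(slot)             # complete: both halves seen
    return None                       # still waiting for the other event

# system-name and port-id arrive as two separate events
e1 = {'source': 'bl01', 'interface': 'et-0/0/0', 'neighbor': 'n1',
      'values': {'remote-name': 'sp01.01.laba'}}
e2 = {'source': 'bl01', 'interface': 'et-0/0/0', 'neighbor': 'n1',
      'values': {'remote-port': 'et-0/0/28'}}

r1 = merge(e1)  # None: only half the data so far
r2 = merge(e2)  # complete record with both remote-name and remote-port
```

The real script does the same thing, except the key also folds in the target hostname and the completed record becomes a new `lldp_inventory` event rather than a dict.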
Thanks @peejaychilds for looking into it. I got your point about caching the stuff I need until I have all of it for processing.
While troubleshooting (your script doesn't break things, but it doesn't return lldp_item): where does it send the logs? I'm running it in the CLI with ./gnmic --config gnmic-config.yaml subscribe --tls-cert ./1.pem --tls-key ./1-key.pem --skip-verify -d, but the output is mostly gibberish and I don't see any logs from the script itself.
I believe it is the log: true option that makes the output go to stdout.
My gnmic.yaml has

skip-verify: true
encoding: json
log: true

loader:
  type: file
  # path to the file
  path: /opt/gnmi/targets/targets.yaml
  enable-metrics: true

subscriptions:
  lldp:
    paths:
      - /lldp/interfaces/interface/neighbors/neighbor/state/system-name
      - /lldp/interfaces/interface/neighbors/neighbor/state/port-id
    stream-mode: sample
    sample-interval: 60s

outputs:
  prometheus:
    type: prometheus
    listen: :8080
    append-subscription-name: false
    strings-as-labels: true
    timeout: 30s
    event-processors:
      - proc-test

processors:
  proc-test:
    event-starlark:
      script: /opt/gnmi/test.star
I run gnmic --config /opt/gnmi/gnmic.yaml subscribe and see things like:
...
2024/07/30 00:09:13.940595 [event-starlark] print(): {"name":"lldp","timestamp":1722298153765731172,"tags":{"interface_name":"re0:mgmt-0","neighbor_id":"50:c7:09:a7:c3:f1-ge-0/0/17","source":"sp01.01.laba.lab:9339","subscription-name":"lldp"},"values":{"/lldp/interfaces/interface/neighbors/neighbor/state/port-id":"ge-0/0/17"}}
2024/07/30 00:09:13.940620 [event-starlark] print(): cache hit for sp01.01.laba.lab_re0:mgmt-0_50:c7:09:a7:c3:f1-ge-0/0/17
2024/07/30 00:09:13.940664 [event-starlark] print(): {"name":"lldp_inventory","timestamp":1722298153765731172,"tags":{"local-name":"sp01.01.laba.lab","local-port":"re0:mgmt-0","remote-name":"rsw01.r1.05.laba","remote-port":"ge-0/0/17"},"values":{"lldp_item":1}}
...
Thanks, now I have something to work with!
This seems to do the trick:
starlark-add-lldp-info-2:
event-starlark:
source: |
logging = True
cache = {}
def apply(*events):
# events we return
ret_events = []
for e in events:
if e.name == 'lldp':
if logging:
print(e)
# remove port part of target
hostname = e.tags['source'].replace(':6030', '')
if e.tags.get('neighbor_id'):
cache_key = hostname + '_' + e.tags['interface_name'] + '_' + e.tags['neighbor_id']
if cache_key in cache:
if logging:
print('cache hit for ' + cache_key)
else:
if logging:
print('** new cache slot for ' + cache_key)
cache[cache_key] = {}
if e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/port-id'):
cache[cache_key]['remote-port'] = e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/port-id')
if e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/system-name'):
cache[cache_key]['remote-name'] = e.values.get('/lldp/interfaces/interface/neighbors/neighbor/state/system-name')
cache[cache_key]['local-name'] = hostname
cache[cache_key]['local-port'] = e.tags['interface_name']
if 'remote-name' in cache[cache_key] and 'remote-port' in cache[cache_key]:
local_name = cache[cache_key]['local-name']
local_port = cache[cache_key]['local-port']
remote_name = cache[cache_key]['remote-name']
remote_port = cache[cache_key]['remote-port']
# Sort names to ensure consistent lldpName
if local_name < remote_name:
lldp_name = local_name + ":" + local_port + ":" + remote_name + ":" + remote_port
else:
lldp_name = remote_name + ":" + remote_port + ":" + local_name + ":" + local_port
# Create a new event manually
new_event = Event("lldp_inventory", e.timestamp)
new_event.values = {'lldp_local_mapping': 1}
new_event.tags = {}
new_event.tags.update(cache[cache_key])
new_event.tags['lldpName'] = lldp_name
ret_events.append(new_event)
if logging:
print(new_event)
else:
ret_events.append(e)
return ret_events
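The sorted lldpName construction can be checked in isolation: the same link seen from either device should yield one identical key. A plain Python sketch (Starlark shares this syntax) with made-up device names:

```python
def lldp_name(local_name, local_port, remote_name, remote_port):
    # Sort by system name so both ends of a link compute the same key.
    if local_name < remote_name:
        return local_name + ":" + local_port + ":" + remote_name + ":" + remote_port
    return remote_name + ":" + remote_port + ":" + local_name + ":" + local_port

# The same link viewed from either end (hypothetical names):
a = lldp_name("bl01.laba.lab", "et-0/0/0", "sp01.01.laba", "et-0/0/28")
b = lldp_name("sp01.01.laba", "et-0/0/28", "bl01.laba.lab", "et-0/0/0")
print(a == b)  # True: both devices emit the same lldpName label
```

This is exactly the property needed to correlate the two sides of a link in alerting rules.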
On a side note, is there an efficient way in gnmic (Starlark or some other existing processor) to change a metric value based on a label value? For example, with the default interface status there's not much one can do with
interface_state_oper_status{alias="blah.com",ifAlias="SomeName",ifName="Ethernet1/1",oper_status="NOT_PRESENT"} 1
interface_state_oper_status{alias="blah.com",ifAlias="SomeOtherName",ifName="Ethernet2/1",oper_status="UP"} 1
I need to convert it to
interface_state_oper_status{alias="blah.com",ifAlias="SomeName",ifName="Ethernet1/1",oper_status="NOT_PRESENT"} 0
interface_state_oper_status{alias="blah.com",ifAlias="SomeOtherName",ifName="Ethernet2/1",oper_status="UP"} 1
(based on oper_status label value)
You should just map the oper_status value to a numerical value, instead of making it a label.
processors:
  # processor name
  map-oper-status:
    # processor type
    event-strings:
      value-names:
        - "/interface/state/oper-status"
      transforms:
        # strings function name
        - replace:
            apply-on: "value"
            old: "NOT_PRESENT"
            new: "0"
        - replace:
            apply-on: "value"
            old: "UP"
            new: "1"
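If you end up needing more states than the two replace rules above cover, the same mapping can be sketched as a single lookup table in an event-starlark processor instead. This is a hedged sketch in plain Python (Starlark shares this syntax), with events modeled as bare value dicts; the extra state names (DOWN, LOWER_LAYER_DOWN) come from the standard OpenConfig oper-status enum, not from this thread:

```python
# Map every oper-status string to a number via one lookup table
# instead of chained replace rules. Values dicts stand in for events.
STATUS_MAP = {'UP': 1, 'DOWN': 0, 'NOT_PRESENT': 0, 'LOWER_LAYER_DOWN': 0}
PATH = '/interface/state/oper-status'

def map_oper_status(values):
    status = values.get(PATH)
    if status in STATUS_MAP:
        values[PATH] = STATUS_MAP[status]  # leave unknown states untouched
    return values

up = map_oper_status({PATH: 'UP'})            # value becomes 1
gone = map_oper_status({PATH: 'NOT_PRESENT'})  # value becomes 0
```

In a real processor the same table lookup would run inside apply(*events) against e.values, as in the scripts earlier in this thread.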
Pretty neat, thanks!
I'm probably doing something wrong in this script; it's not a bug in the implementation. This is the config I'm trying to use; currently it just breaks Prometheus's /metrics endpoint.
The idea, in a nutshell, is to create a new metric in a format where lldpName is sorted by name on creation, so it's the same on both devices, local and remote. This lldpName label will be used to correlate device/port on both sides for alerts, etc.
Any pointers to what's wrong with the script above?