influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

SNMP trap collection Telegraf #4377

Closed ErantD closed 4 years ago

ErantD commented 6 years ago

Feature Request

Opening a feature request kicks off a discussion.

Proposal:

Have telegraf receive passive snmp traps and send them to influxdb.

Current behavior:

Currently Telegraf can collect SNMP data actively, by querying a host, but SNMP messages sent by the host itself cannot be collected. For example, when a host sends an SNMP trap because of a failed power supply, Telegraf does not receive the trap and therefore cannot forward it to InfluxDB.

Desired behavior:

Telegraf should receive SNMP traps from hosts and forward them to InfluxDB, i.e.:

1. A host sends a trap to Telegraf because of a power supply failure.
2. Telegraf receives the trap and translates it to InfluxDB line protocol using the vendor MIB.
3. Telegraf sends the resulting line protocol to InfluxDB.
4. To make it perfect, Kapacitor shows an alert.

Use case: [Why is this important (helps with prioritizing requests)]

We are monitoring lots of systems, and we want to centralize all monitoring in one monitoring system. To make this complete with InfluxDB, Grafana and Kapacitor, we are missing SNMP traps from hosts, such as PSU failover, disk failure, node reboot, etc.

dmitrysdm commented 6 years ago

For now I use the following pipeline: Logstash -> logfile -> Telegraf logparser. I am also waiting for a native implementation.

endersonmaia commented 6 years ago

the gosnmp that's used by telegraf already has a trap listening implementation.

see: https://github.com/soniah/gosnmp/blob/master/trap.go

maybe it's the way to go

Since I don't know Go, I was thinking of resorting to snmptrapd and snmptt, sending logs to the Telegraf syslog input plugin, but that's a lot of moving parts.

skipper00 commented 6 years ago

+1

candlerb commented 5 years ago

This would nicely complement the syslog receiver which Telegraf 1.7 gained, especially if it could be used with Chronograf's integrated log viewer.

An issue here is how to break up the information contained within SNMP trap records. Maybe these should be dealt with the same as syslog structured data fields, creating dynamic timeseries (columns) as required.

For example, a linkDown trap might arrive like this in tcpdump:

    127.0.0.1.57932 > 127.0.0.1.162:  { SNMPv2c { V2Trap(108) R=97760785  .1.3.6.1.2.1.1.3.0=20587570 .1.3.6.1.6.3.1.1.4.1.0=.1.3.6.1.6.3.1.1.5.3 .1.3.6.1.2.1.2.2.1.2="eth0" .1.3.6.1.2.1.2.2.1.7=1 .1.3.6.1.2.1.2.2.1.8=1 } }

which could be decoded to the following JSON:

{
  "sysUpTimeInstance": 20587570,
  "snmpTrapOID.0": "linkDown",
  "ifDescr": "eth0",
  "ifAdminStatus": "up",
  "ifOperStatus": "up"
}

Not all of this information is necessarily important; but if it contains ifIndex or ifDescr, you'll need it to identify the interface this event relates to.
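A minimal Python sketch of the decoding step described above. The OID-to-name and enum tables are hand-written here purely for illustration (a real decoder would load them from MIB files), and `decode_varbinds` is a hypothetical helper, not part of Telegraf or gosnmp.

```python
# Hypothetical lookup tables; a real decoder would build these from MIB files.
OID_NAMES = {
    ".1.3.6.1.2.1.1.3.0": "sysUpTimeInstance",
    ".1.3.6.1.6.3.1.1.4.1.0": "snmpTrapOID.0",
    ".1.3.6.1.6.3.1.1.5.3": "linkDown",
    ".1.3.6.1.2.1.2.2.1.2": "ifDescr",
    ".1.3.6.1.2.1.2.2.1.7": "ifAdminStatus",
    ".1.3.6.1.2.1.2.2.1.8": "ifOperStatus",
}

# Subset of the IF-MIB status enumerations, enough for this example.
STATUS_ENUM = {1: "up", 2: "down", 3: "testing"}
ENUMS = {"ifAdminStatus": STATUS_ENUM, "ifOperStatus": STATUS_ENUM}

# The varbinds from the tcpdump output above, as (oid, value) pairs.
LINK_DOWN = [
    (".1.3.6.1.2.1.1.3.0", 20587570),
    (".1.3.6.1.6.3.1.1.4.1.0", ".1.3.6.1.6.3.1.1.5.3"),
    (".1.3.6.1.2.1.2.2.1.2", "eth0"),
    (".1.3.6.1.2.1.2.2.1.7", 1),
    (".1.3.6.1.2.1.2.2.1.8", 1),
]


def decode_varbinds(varbinds):
    """Map numeric OIDs and enum integers to their symbolic names."""
    decoded = {}
    for oid, value in varbinds:
        name = OID_NAMES.get(oid, oid)  # fall back to the numeric OID
        if isinstance(value, str) and value in OID_NAMES:
            value = OID_NAMES[value]  # OID-valued varbind, e.g. snmpTrapOID.0
        elif name in ENUMS:
            value = ENUMS[name].get(value, value)  # keep unknown integers as-is
        decoded[name] = value
    return decoded


if __name__ == "__main__":
    print(decode_varbinds(LINK_DOWN))
```

Unknown OIDs are kept under their numeric form rather than dropped, so a trap from a device with a missing MIB would still produce a record.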

discoduck2x commented 5 years ago

Yes, this would be nice. Right now one has to go through a PRTG datasource and Grafana instead of writing natively into InfluxDB.

URZ-HD commented 5 years ago

+1

iiidddaaannn102 commented 5 years ago

Hi, this is such an easy request. Can you develop it? It's so useful and will help a lot!

camden76 commented 5 years ago

Getting our enterprise acceptance of using Telegraf/Influx to replace a 'legacy' solution has this as a pre-requisite. Is there anything that we can do to help this along?

bruceschaller commented 5 years ago

I think I'll try snmptrapd to syslog. Syslog already has a rich input plugin, and in this way, I will get the full power of snmptrapd to manage what the record will look like before I drop it into the db. At first, I was thinking of using snmptrapd to dump into an influx data formatted file, however, I'm worried about escape characters breaking this, or having unintended consequences. It would be nice to have a direct trap input, but I understand why this has not been implemented.
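The escaping concern can be illustrated with a short Python sketch. It covers only the special characters documented for InfluxDB line protocol (commas, spaces and equals signs in names and tags; double quotes and backslashes in string fields). The measurement, tag and field names are made up, and `to_line` is a hypothetical helper, not an InfluxDB or Telegraf API.

```python
def escape_measurement(text):
    """Measurement names: escape commas and spaces."""
    return text.replace(",", r"\,").replace(" ", r"\ ")


def escape_tag(text):
    """Tag keys, tag values and field keys: escape commas, equals signs, spaces."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def escape_string_field(text):
    """String field values: double-quoted, with backslash and quote escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Render one point as a line protocol line (illustrative only)."""
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_items = []
    for key, value in fields.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"
        elif isinstance(value, float):
            rendered = repr(value)
        else:
            rendered = escape_string_field(str(value))
        field_items.append(f"{escape_tag(key)}={rendered}")
    fields_part = ",".join(field_items)
    return f"{escape_measurement(measurement)}{tag_part} {fields_part} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line("snmp_trap", {"source": "10.0.0.1"},
                  {"ifDescr": 'port "A", uplink'}, 1))
```

Writing the escaping once in a helper like this, instead of in an snmptrapd format string, is one way to avoid the unintended consequences mentioned above.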

danielnelson commented 5 years ago

Question for everyone: are you interested only in TRAP support or would INFORM also be useful? If you are interested in INFORM support is it a need or a want?

@bruceschaller Would love to hear more about your setup once you have it working, it sounds like a pretty good idea to me.

VMan228 commented 5 years ago

For what it’s worth, trap support is the need; inform would be useful, but if we could get trap working without the use of syslog and parsing I’d be super interested!

js-mode commented 5 years ago

I personally would love inform support. Right now we have traps that are being sent to a Docker container that is running syslog-ng and snmptrapd, but we have not been able to get it to work with v3. The biggest issue has been the engine ID and configuring it to be a part of our automation system.

After doing a lot of research, we have been able to figure out how to get our traps formatted for Slack and Wavefront but would be nice to have it integrated with our Telegraf SNMP monitoring solution as well.

If there are questions or details I can provide to make this happen please let me know.

endersonmaia commented 5 years ago

@danielnelson TRAP and INFORM are a need

arkhitekton commented 5 years ago

This link ( https://www.influxdata.com/what-is-snmp/ ) indicates that the "Telegraf SNMP Input Plugin" supports receiving traps:

Telegraf can be deployed with SNMP Input Plugin configured to fetch or listen to specific OIDs. The plugin contains a SNMP traps receiver, which fetches traps from the managed device. Then, Telegraf batches the resulting data and streams it to InfluxDB.

Is that because the feature got developed already? Or is it a different component than being discussed here?

Is there any update on this? We also are being asked to get rid of COTS and this feature is the only large gap for a TICK stack versus our current stuff.

Thanks!

danielnelson commented 5 years ago

Thanks for pointing out this page, unfortunately it is not accurate and currently there is no TRAP support in Telegraf. However, it is something we are working on.

I'll have someone correct that link to avoid confusion.

superdave commented 5 years ago

Glad to see this is active! I was actually just looking to see if it had already been done. I'm a professional Go developer who's interested in helping; is there anything public available I could contribute to/collaborate on?

danielnelson commented 5 years ago

@superdave Do you think you could investigate adding support for INFORM to gosnmp? From a cursory look at the library I don't think it is supported currently. If we can receive them and acknowledge in a similar way to TRAP, it would be very helpful.

superdave commented 5 years ago

I could look! I don't know if I have any current devices that use it, but I imagine I can get one of the various SNMP daemons out there to throw one out.

superdave commented 5 years ago

Sorry for the delay on this, but I'm starting to look into this tonight. Looks like net-snmp has INFORM functionality, so it should be easy enough to test from there, and someone's contacted me in re: running tests with real devices that have it (all my SNMP devices are old-ish and I haven't seen much in the way of INFORM functionality in their docs, so).

superdave commented 5 years ago

VERY preliminary, definitely not ready for merge yet, but it looks like INFORM is actually pretty simple; it's just a TRAP that expects a pretty-much identical response. This works so far with snmptrap -v 2c -c public -Ci <host> 23 coldStart.0: https://github.com/soniah/gosnmp/pull/202
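A small Python model of that observation, at the data-structure level only (no BER encoding or network I/O). The PDU tag values are the standard SNMPv2 ones; the `Pdu` class and `acknowledge` function are hypothetical and are not taken from gosnmp or from the linked PR.

```python
from dataclasses import dataclass, replace

# Standard SNMPv2 PDU tags (context-specific, constructed).
RESPONSE = 0xA2
INFORM_REQUEST = 0xA6
SNMPV2_TRAP = 0xA7


@dataclass(frozen=True)
class Pdu:
    """Simplified PDU model, for illustration only."""
    pdu_type: int
    request_id: int
    error_status: int
    error_index: int
    varbinds: tuple


def acknowledge(pdu):
    """Return the Response for an InformRequest, or None for a plain trap.

    The acknowledgement echoes the request-id and the variable bindings;
    only the PDU type changes, and the error fields are set to noError (0).
    """
    if pdu.pdu_type != INFORM_REQUEST:
        return None  # traps are unconfirmed: nothing to send back
    return replace(pdu, pdu_type=RESPONSE, error_status=0, error_index=0)


if __name__ == "__main__":
    cold_start = (("1.3.6.1.6.3.1.1.4.1.0", "1.3.6.1.6.3.1.1.5.1"),)
    print(acknowledge(Pdu(INFORM_REQUEST, 23, 0, 0, cold_start)))
```

Because the sender matches the acknowledgement by request-id, a receiver that already parses traps needs little extra state to confirm an INFORM.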

js-mode commented 5 years ago

Hi @superdave -- just wanted to check on any update regarding the snmptraps with Telegraf. Is there something available that maybe I can test out? would this be a separate input plugin for telegraf? wanted to see if there is anything I can do to help.

superdave commented 5 years ago

I have next to no information on the Telegraf implementation; I've just gotten some initial INFORM support added to the gosnmp package in a PR. I haven't had time to add proper testing or examine exact compliance with the spec (particularly the response I send), though it seems to work with net-snmp. Feel free to test that out with your devices! As soon as I have a chance to get back to it, I'll try and wrap up my PR and see if the maintainer will merge it.

danielnelson commented 4 years ago

We just merged a pull request that adds an initial snmp trap input plugin, we'd love to get some early testing on this plugin and feedback on experience and if it meets your requirements.

Unfortunately, @superdave's INFORM support is not added upstream yet, so for now this can handle unconfirmed traps only.

Plugin documentation: snmp_trap

superdave commented 4 years ago

Sadly, I haven't even heard initial feedback about my PR there yet, though in all honesty it needs tests added. I'll try to get those in soon and see if it gets the attention of the maintainer. I'm very excited to try out the regular trap functionality, though! That's about 95% of my use cases.

danielnelson commented 4 years ago

Everyone: now that the new 1.13.0-rc1 build is out, please switch over to these packages for testing.

@superdave I'll review your PR on gosnmp, I'm no expert on SNMP but maybe that will help it get some motion.

superdave commented 4 years ago

Right, it's obviously not ready for prime time yet, but I figured a WIP PR was better than nothing.

superdave commented 4 years ago

Thanks for the review, of course! Good catches, I'll implement those shortly and hopefully get some better testing in.

AlexTargo commented 4 years ago

I have been testing the plugin since the beginning of the week and it works very well. The only problem I have is very specific to me, and I'm not sure if it's an issue with the SNMP trap plugin or something else.

When the trap is sent, there is a hex part in the trap that is transformed into text when written to the DB. I'm not sure in which part of the process this is done. I say that because on reception of the trap in snmptrapd I can see the raw information that I need, like this:

    Dec 6 20:59:23 SNMPTraps snmptrapd[6123]: 2019-12-06 20:59:23 [UDP: [172.17.1.2]:60493->[10.5.2.70]:162]:#012iso.3.6.1.2.1.1.3.0 = Timeticks: (36834154) 4 days, 6:19:01.54#011iso.3.6.1.6.3.1.1.4.1.0 = OID: iso.3.6.1.4.1.8886.18.3.1.4.1#011iso.3.6.1.4.1.8886.18.3.1.3.1.1.1.293666817 = INTEGER: 293666817#011iso.3.6.1.4.1.8886.18.3.1.3.1.1.2.293666817 = Hex-STRING: 52 43 4D 47 18 A0 16 7D #011iso.3.6.1.4.1.8886.18.3.1.3.1.1.3.293666817 = ""#011iso.3.6.1.4.1.8886.18.3.1.3.1.1.4.293666817 = ""#011iso.3.6.1.4.1.8886.18.3.1.3.1.1.5.293666817 = ""

In InfluxDB it looks like this:

    time                 last_rcGponSystem.3.1.1.2.293666817
    1575650812964027095  RCMG�}

I know that the hex string is not real hex, but we have to deal with this stuff (it's cheaper). We need it to be left unaltered and written to the DB as raw information.
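A short Python sketch of what may be happening to those eight bytes. It assumes the value is decoded as UTF-8 text somewhere along the way, which is a guess and not a confirmed diagnosis of the plugin; `as_text` and `as_hex` are hypothetical helpers.

```python
# The octets shown by snmptrapd as "Hex-STRING: 52 43 4D 47 18 A0 16 7D".
RAW = bytes.fromhex("52 43 4D 47 18 A0 16 7D")


def as_text(data):
    """Lossy: decode as UTF-8, replacing invalid bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


def as_hex(data):
    """Lossless: render every octet as two hex digits (Python 3.8+)."""
    return data.hex(" ").upper()


if __name__ == "__main__":
    print(repr(as_text(RAW)))  # control characters and U+FFFD appear here
    print(as_hex(RAW))
```

The text form cannot be converted back to the original bytes, since 0xA0 is replaced; the hex form round-trips through `bytes.fromhex`.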

Could anybody point me to the right place to post my problem?

Thanks

danielnelson commented 4 years ago

@AlexTargo Thanks for the feedback and testing, let's open this up as a new feature issue.

It could be helpful to add a packet capture to the issue as well. Could you run tcpdump in the background while receiving a trap with a similar command line and include it in the issue:

sudo tcpdump -s 0 -i eth0 -w snmptrap.pcap port 162

Also include the output of Telegraf, (you can ctrl-c it after the trap is received and the output is printed):

telegraf --input-filter=snmp_trap --test --test-wait=600

tesibelda commented 4 years ago

One issue I came across while testing is that some traps do not contain variables in their definition, so Telegraf won't process them. This is the case for the classic coldStart or ucdShutdown. You can reproduce this by sending a coldStart trap: snmptrap -v 1 -c public 127.0.0.1 coldStart 1 0 '' and you will not see it in the output of this input. Maybe a dummy field could be used for these cases.

danielnelson commented 4 years ago

@tesibelda Thanks for the report, it looks like any v1 traps without variables won't be collected since no fields are produced. To fix this, I think we should convert the "time-stamp" from the Trap-PDU into the sysUpTime parameter, along the lines described in RFC 2576. This will guarantee us a field and improve the compatibility between v1 and v2 traps.
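A Python sketch of that v1-to-v2 mapping as RFC 2576 describes it: the time-stamp becomes sysUpTime.0, and the generic or enterprise-specific trap becomes snmpTrapOID.0. The function name and the enterprise OID in the example are hypothetical, the extra varbinds a proxy may append are omitted, and this is not the code merged into Telegraf.

```python
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"          # sysUpTime.0
SNMP_TRAP_OID = "1.3.6.1.6.3.1.1.4.1.0"   # snmpTrapOID.0
SNMP_TRAPS = "1.3.6.1.6.3.1.1.5"          # snmpTraps subtree
ENTERPRISE_SPECIFIC = 6                   # SNMPv1 generic-trap value


def v1_to_v2_varbinds(enterprise, generic_trap, specific_trap, timestamp,
                      varbinds=()):
    """Build the SNMPv2-style varbind list for an SNMPv1 Trap-PDU."""
    if generic_trap == ENTERPRISE_SPECIFIC:
        # enterprise OID + ".0." + specific-trap number
        trap_oid = f"{enterprise}.0.{specific_trap}"
    else:
        # generic traps 0..5 map to snmpTraps.1 .. snmpTraps.6
        trap_oid = f"{SNMP_TRAPS}.{generic_trap + 1}"
    # sysUpTime and snmpTrapOID always come first, so even a trap with
    # no variables of its own yields two varbinds.
    return [(SYS_UPTIME, timestamp), (SNMP_TRAP_OID, trap_oid), *varbinds]


if __name__ == "__main__":
    # coldStart (generic-trap 0) with no variables; enterprise OID is made up.
    print(v1_to_v2_varbinds("1.3.6.1.4.1.99999", 0, 0, 12345))
```

With this mapping a bare coldStart trap still produces an uptime value that can serve as a field.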

danielnelson commented 4 years ago

@tesibelda We added support for v1 traps in #6786, hoping to do a final release of 1.13.0 tomorrow but if you have time and could take a look at 1.13.0-rc3 it would be really appreciated.

tesibelda commented 4 years ago

@danielnelson Sorry for the delay. The idea of using the sysUpTimeInstance field is great, as is including the timeout option, since during my tests it took more than 5s when using several thousand files in the MIBDIR. For my tests the 1.13 release works just as expected. Great job!

danielnelson commented 4 years ago

It would be nice to improve the performance of the caching. I'm curious what the approximate size of your MIBs is, could you run:

wc /usr/share/snmp/mibs/* | grep total

tesibelda commented 4 years ago

I tested it on a Windows laptop, with different numbers of files in the \usr\share\snmp\mibs folder. There were timeouts with 11919 MIB files (from http://mibs.snmplabs.com/asn1/), and also with 4622 MIB files, even with a 30s timeout configured. There was no timeout when using 322 MIB files. After the first time, the cache works fine. I was using the netsnmp 5.7 Windows binaries and the axNetworkTrunkPortsThreshold trap from A10. I guess this is more a performance issue of snmptranslate. I do not have more recent Windows binaries, but I will try it on Linux with 5.8 binaries.

danielnelson commented 4 years ago

We are mostly trying to gauge what the requirements are for load performance at this point, is this your normal loadout of MIBs? Would also be nice to do a simple snmptranslate with the MIBs for comparison. We have an existing issue, #5720, for MIB parsing performance, can you respond on that issue?

neeles83 commented 4 years ago

Hello Bruceshaller, we did indeed run into an issue with INFORM and would really like to have INFORM support and SNMPv2/v3 support for trap/inform messages.

sjwang90 commented 4 years ago

Hi @neeles83, we are currently working on improving our v3 support. If you can comment on #6918 with issues and improvements, that would be really helpful.

volkan05 commented 4 years ago

Hi everyone,

Today, I tried to configure my Telegraf to receive SNMPv3 traps with the following simple configuration:

    [[inputs.snmp_trap]]
      service_address = "udp://:162"

The router which I try to monitor is configured with the following:

    Router(config-if)#snmp-server enable traps snmp linkdown linkup
    Router(config-if)#snmp-server host IpOfMyTelegrafManager version 3 priv myV3Username snmp

As you can see, I am trying to get traps about link up/down on my interfaces, so when I shut down an interface, I can see with Wireshark that the trap is sent to my manager with many OIDs. Of course, before shutting down the interface, I started listening on port 162 in Telegraf like this:

telegraf --input-filter=snmp_trap --output-filter file --test --test-wait=1500

And the screen blocks on the following:

    2020-04-01T16:59:23Z I! Starting Telegraf 1.13.4
    2020-04-01T16:59:23Z I! Using config file: /etc/telegraf/telegraf.conf
    2020-04-01T16:59:23Z I! [inputs.snmp_trap] Listening on udp://:162

But I receive nothing; no traps appear in the terminal. At first I thought that snmptranslate didn't have the correct MIB, so I checked every OID I saw in the trap packet in Wireshark with the snmptranslate command, and that was not the problem. After that, I checked with lsof -i -P -n that UDP port 162 was listening, and it was. Now I'm looking everywhere on the Internet for a solution, but I can't find anything.

Two details: first, I have already entered the command setcap cap_net_bind_service=+ep /usr/bin/telegraf, and second, I can make SNMP requests from the manager to the host without problems.

Does anyone know what the problem is and can help me, please?

I'm sorry if it's not the place for a question like that, I'm new on github ^^

Thanks for your help.

danielnelson commented 4 years ago

@volkan05 Support for v3 traps isn't available yet, so unfortunately it won't work just yet. Keep an eye on #6918 for updates on when it will be supported.

volkan05 commented 4 years ago

@danielnelson All right, thank you for your response.

DataBitz commented 4 years ago

@danielnelson Thank you for adding this new functionality, we have been wanting this also. @ErantD is there additional work needed to have Kapacitor alert on these traps (including details of the trap), or is it a matter of crafting a TICKscript to do this?