jcustenborder / kafka-connect-splunk

Kafka Connect connector for receiving data and writing data to Splunk.
Apache License 2.0

Unable to use Splunk Source - Properties and Flow #11

Closed sunny1978 closed 7 years ago

sunny1978 commented 7 years ago

Hi, I need some help with using this connector as just a Splunk source.

Purpose: Some XYZ is feeding into Splunk --> Splunk raises events --> this connector --> Kafka system --> third-party Kafka consumer --> third-party analytics.

1: Do the third-party folks have to go through HEC to feed data into Splunk? Can they keep feeding it in through their existing ways (SDK/REST, etc.)?

2: How does HEC communicate with the connector? Props: splunk.collector.url, splunk.port

3: I am not able to understand who initiates the data call.

Pull: the connector pulls new data from HEC. If pull: is splunk.collector.url the full URL, with the port already part of it? Then why do we need splunk.port again?

Push: HEC pushes each event as it happens to the connector's EventServlet.post method? If push: kindly give the steps to configure HEC with the connector's IP and port, plus the connector's start config and start steps.

4: Some magic: EventServlet.post is called with the event message in JSON, and this is converted to Kafka's Struct type.

5: Now, who delivers this Struct to the Kafka system, and how? I do not see any parameter to specify the Kafka server IP and port; only the topic is asked for.

jcustenborder commented 7 years ago

Hello there,

Purpose: Some XYZ is feeding into Splunk --> Splunk raises events --> this connector --> Kafka system --> third-party Kafka consumer --> third-party analytics.

The SplunkHttpSourceConnector (https://github.com/jcustenborder/kafka-connect-splunk#splunkhttpsourceconnector) embeds a web server and emulates the functionality of the Splunk HEC. With the HEC on Splunk, you post JSON messages to it and Splunk writes them to the index. The source connector receives these messages instead and writes them to Kafka.
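For example, a client that would normally post events to Splunk's HEC can point at the connector instead; something like this (the host, port, and token here are placeholders for illustration):

curl -k https://connect-host:8088/services/collector/event -H 'Authorization: Splunk YOUR-HEC-TOKEN' -d '{"event":"hello world","host":"myhost","sourcetype":"demo"}'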

The SplunkHttpSinkConnector (https://github.com/jcustenborder/kafka-connect-splunk#splunkhttpsinkconnector) goes the other direction: it reads data from Kafka and writes it to Splunk using a Splunk HEC endpoint on the Splunk server.
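As a rough sketch, a minimal sink config could look something like this (the class name is assumed to follow the source connector's package, and the host is a placeholder; check the README for the exact property names):

name=splunk-http-sink
connector.class=com.github.jcustenborder.kafka.connect.splunk.SplunkHttpSinkConnector
tasks.max=1
topics=fromsplunk
# Splunk server running the real HEC endpoint (placeholder host)
splunk.remote.host=splunk.example.com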

1: Do the third-party folks have to go through HEC to feed data into Splunk?

No, the SplunkHttpSinkConnector will read anything from the topic. I know a few teams that use kafka-connect-syslog (https://github.com/jcustenborder/kafka-connect-syslog) to receive syslog messages and then forward them to Splunk.

Can they keep feeding it in through their existing ways (SDK/REST, etc.)?

The SplunkHttpSourceConnector should be able to receive data from the Splunk REST libraries. It will receive the data and write it to Kafka. Down the line you could do whatever you want with it: forward it on to Splunk, Elasticsearch, etc.

How does HEC communicate with the connector?

The HEC does not initiate any communication with the SplunkHttpSinkConnector. The SplunkHttpSinkConnector works by taking data from the specified Kafka topics, converting it to a Splunk HEC-compatible payload, and sending that data to a Splunk HEC endpoint.
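For reference, an HEC-compatible payload is just a JSON event envelope like the one below (field values are illustrative):

{"time": 1494476035, "host": "myhost", "source": "syslog", "sourcetype": "demo", "index": "default", "event": "hello world"}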

Some magic: EventServlet.post is called with the event message in JSON, and this is converted to Kafka's Struct type.

The SplunkHttpSourceConnector embeds a Jetty web server which converts the posted JSON messages to Kafka Connect Structs in real time. A Struct is just an abstraction; the data stored in the topic could be JSON or Avro, for example.
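Which format actually lands in the topic is decided by the worker's converter settings, not by the connector. For example, in the worker properties (the Schema Registry URL is a placeholder):

# Avro via Confluent Schema Registry
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
# ...or plain JSON instead:
# key.converter=org.apache.kafka.connect.json.JsonConverter
# value.converter=org.apache.kafka.connect.json.JsonConverter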

Now, who delivers this Struct to the Kafka system, and how? I do not see any parameter to specify the Kafka server IP and port; only the topic is asked for.

This is a connector for Kafka Connect (http://docs.confluent.io/current/connect/index.html). Kafka Connect is a distributed framework for running connectors and tasks against a Kafka cluster. The easy way to look at it is that Kafka Connect is the E and L of ETL. You configure the Kafka Connect worker with the parameters to connect to Kafka; the connector handles connecting to target systems like Splunk. Here is an example config I use for standalone mode in development: https://github.com/jcustenborder/kafka-connect-splunk/blob/master/config/connect-avro-docker.properties. This is where your configuration is handled. Here is some documentation for Kafka Connect: http://docs.confluent.io/current/connect/userguide.html#configuring-workers.
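Concretely, the broker address lives in the worker properties, not in the connector config. A minimal sketch (the address is a placeholder):

# Kafka broker(s) the Connect worker talks to
bootstrap.servers=kafka-host:9092
# standalone workers also need somewhere to store source offsets
offset.storage.file.filename=/tmp/connect.offsets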

sunny1978 commented 7 years ago

Hi Jeremy, thank you very much for your reply. Here is my server architecture; kindly help me clarify the points below. It will help me recommend a server architecture to my team and finish my task.

What is working so far (tested end to end):
-- Kafka Connect (with ZK and Schema Registry)
-- Sink: MySQL. Able to successfully write Kafka contents to MySQL
-- Sink: Cassandra. Able to successfully write Kafka contents to Cassandra

To do / in progress:
-- Source: Splunk

You have got a lot of connectors. We are very excited about how easy it is to use Kafka Connect as our integration layer and bring data into our servers. We are exploring sinks: Cassandra, Solr, Splunk, MongoDB.

Sources: RabbitMQ, Splunk, Kinesis, Event Hubs, etc.

Splunk as Source is my immediate priority.

Arch:
Splunk Enterprise: IP1
Kafka Connect: IP2 (with kafka-connect-splunk, JMS, MySQL, Cassandra sink, etc.)
Cassandra/other apps: IP3

1: I will configure HEC on Splunk on IP1.

2: Start Kafka Connect, ZK, and Schema Registry on IP2.

3: Since Kafka Connect and the Splunk source run on the same IP2, does it go by localhost and just need the topic param?

4: I only want to read data from Splunk and write to Kafka. So should I worry about SplunkHttpSinkConnector?

5: Does IP2's Splunk source Jetty server post a JSON request to Splunk (running on IP1) using "splunk.collector.url"? Since Splunk and the source connector are on different hosts, should I give a full URL like "splunk.collector.url=https://IP1:8088/services/collector/event"?

5.1: How about the token?

6: Splunk runs the report/search requested by Jetty and writes the response to its index (IP1).

6.1: So it's a PULL-based architecture? Jetty periodically requests data from Splunk, takes it, and writes it to Kafka? Is this period configurable? I see while loops in the code, so it's almost real time; doesn't it spam Splunk with tons of requests?

7: How does IP2's Splunk source retrieve this message? As a response to the POST request, or using "splunk.port"? If by port, shouldn't it worry about the hostname, i.e. IP1?

8: I got the rest of the part. This JSON message is converted to Kafka's Struct and then written to localhost:6667/2181 and the topic configured in "kafka.topic".

Thanks Sunil.


sunny1978 commented 7 years ago

I got it working, but I still need help on the above questions. This will help me understand better and also support multiple environments: HEC on IP1 and Kafka on IP2. I cannot have Splunk and Kafka on the same server.

Steps

1: Enable HEC on Splunk

2: Create an HEC endpoint

3: Start the Kafka, ZooKeeper, and Schema Registry servers. Script: /home/c/confluent-3.2.1/startall.sh

4:
4.1: Build an uber jar, about 7-8 MB in size, and copy it to the server.
4.2: Start kafka-connect-splunk. Script: /home/c/kafka-connect-splunk/startsplunksource.sh
Props:
name=splunk-kafka
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.splunk.SplunkHttpSourceConnector
splunk.collector.url=/services/collector/event
splunk.ssl.key.store.password=cfx2017
splunk.collector.index.default=default
splunk.ssl.key.store.path=/home/cfx/keystore.jks
kafka.topic=fromsplunk
4.3: Start command: .../bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties /home/c/kafka-connect-splunk/source.properties

5: Send data. This connector emulates HEC, so give 8088 as the port on which it starts and acts as the HEC. This somehow connects to the actual HEC and submits data.

5.1: Post Data to HEC

[cfx@sunil.host ~]$ curl -k https://IP2:8088/services/collector/event -H 'Authorization: Splunk 1596BC2D-9361-40C6-8D13-XxXXXXXXXX' -d '{"event":"HelloWorld-0510-11PM","host":"sunil.host","source":"Sunil","sourcetype":"SunilSrcType"}'

5.2: Verify the data in Splunk.

5.3: Verify that the data is pushed to Kafka:

Check on Kafka:

[cfx@sunil.host kafka-connect-splunk]$ sh ../confluent-3.2.1/bin/kafka-avro-console-consumer --topic fromsplunk --zookeeper localhost:2181 --from-beginning

{"time":{"long":1494476035907},"host":{"string":"sunil.host"},"source":{"string":"Sunil"},"sourcetype":{"string":"SunilSrcType"},"index":{"string":"default"},"event":{"string":"HelloWorld-0510-11PM"}}

jcustenborder commented 7 years ago

The source connector is a way for you to rip out Splunk. It doesn't do anything with Splunk; it just acts like the Splunk HEC endpoint. This allows you to write to Kafka instead of Splunk.
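In other words, for the setup above the data path is:

HEC client (curl, SDK, forwarder, ...) --> SplunkHttpSourceConnector on IP2:8088 (emulated HEC) --> Kafka topic "fromsplunk"

The real Splunk server on IP1 is never contacted by the source connector.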

sunny1978 commented 7 years ago

From jcustenborder: Why don't you set splunk.remote.host in your config? You can point it to the Splunk server on another host. Also, don't build an uber jar; there are already releases with the proper output. Try the tar.gz, rpm, or deb package.
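For the multi-host setup above, that might look like the following line in the connector properties (a sketch; verify the property's exact semantics against the project README):

# point the connector at the Splunk server on another host (IP1 in the architecture above)
splunk.remote.host=IP1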