jjbuchan / docs

How to manually read data from kafka #2

Open jjbuchan opened 3 years ago

jjbuchan commented 3 years ago

Connect to one of the kafka nodes and then run the following:

$ cd /opt/kafka/bin
$ ./kafka-run-class.sh kafka.tools.SimpleConsumerShell --broker-list localhost:9092 --topic observations.json --partition 0 --offset -1

Specifying --offset -1 means it will begin streaming from the most recent offset. You will also want to change --partition to the partition used by the account you're interested in, and then likely pipe the output through grep to filter on a particular detail such as a check id.
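When streaming from the latest offset, filtered output can appear late because grep buffers its output when it writes to a pipe rather than a terminal. A minimal sketch of a filter helper that avoids this, assuming a grep that accepts --line-buffered (GNU and BSD grep both do). The function name filter_obs, the checkId field, and the sample lines are hypothetical stand-ins for real consumer output:

```shell
# Hypothetical helper: keep only lines containing the given fixed string.
# --line-buffered flushes each match immediately, which matters when the
# input is a live stream and the output feeds another pipe stage.
filter_obs() {
  grep --line-buffered -F -- "$1"
}

# Sample lines standing in for SimpleConsumerShell output (not real data).
sample='{"checkId":"ch0001","status":"OK"}
{"checkId":"ch0002","status":"CRITICAL"}'

matches=$(printf '%s\n' "$sample" | filter_obs ch0001)
printf '%s\n' "$matches"
```

In real use the helper would sit after the consumer command, for example by appending `| filter_obs ch0001` to the SimpleConsumerShell invocation.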

Query Kafka in VM when running tests

The port is different, and you will likely want to read from the beginning of the topic, so specify an offset of 0.

./kafka-run-class.sh kafka.tools.SimpleConsumerShell --broker-list localhost:19092 --topic mytopic --partition 51 --offset 0

jjbuchan commented 3 years ago

To review past messages from a specific point in time, you can use a combination of GetOffsetShell and Epoch Converter.
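As a local alternative to a web-based epoch converter, the millisecond timestamp can be computed on the command line. This is a sketch that assumes GNU date (the -d option; BSD/macOS date uses different flags), and the function name to_epoch_ms is hypothetical:

```shell
# Convert a UTC wall-clock time to epoch milliseconds.
# Assumes GNU date; -u makes both parsing and output use UTC.
to_epoch_ms() {
  secs=$(date -u -d "$1" +%s) || return 1
  echo "$((secs * 1000))"
}

ts_ms=$(to_epoch_ms '2016-12-18 22:00:00')
echo "$ts_ms"
```

The resulting value is what gets passed to the --time option of GetOffsetShell.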

First, determine the millisecond timestamp of the desired point in the recent past. With that and the account's partition (see previous comments), you can retrieve an offset for that point in time, such as

cd /opt/kafka/bin
./kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 --topic observations.json \
  --partition 12 --time 1482098400000

That will output the results as topic:partition:offset, where offset will be blank if that point in time is not present within the buffered topic content.
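If this lookup is scripted, the offset can be pulled out of that result line with plain shell parameter expansion. A minimal sketch, assuming one topic:partition:offset line per partition as described above; the function name parse_offset is hypothetical and the sample lines are illustrative rather than captured output:

```shell
# Extract the offset from a result line of the form topic:partition:offset.
# Prints nothing and returns 1 when the offset field is blank.
parse_offset() {
  offset=${1##*:}          # keep everything after the last colon
  [ -n "$offset" ] || return 1
  echo "$offset"
}

# Sample lines standing in for real GetOffsetShell output.
found=$(parse_offset 'observations.json:12:8356901434')
missing=$(parse_offset 'observations.json:12:') || echo "no offset available for that time"
echo "$found"
```

The extracted value can then be supplied as the --offset argument to SimpleConsumerShell.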

Now with a specific offset, you can retrieve and grep the subset of prior content, such as this usage to view a specific activity:

./kafka-run-class.sh kafka.tools.SimpleConsumerShell \
  --broker-list localhost:9092 --topic mytopic \
  --partition 12 --offset 8356901434 \
  | grep "thing"

jjbuchan commented 3 years ago

jq can be used to trim down the fields of the observations to help with visual inspection:

./kafka-run-class.sh kafka.tools.SimpleConsumerShell --broker-list localhost:9092 --topic observations.json --partition 27 --offset -1 \
  | grep ac1234 \
  | grep remote.http \
  | jq '{timestamp:.timestamp, status:.status, target:.target}' -c
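To try the jq filter without a running broker, it can be applied to a single sample line. This is only a sketch: it assumes jq is installed, and the sample observation (including the accountId and checkType field names) is made up for illustration rather than taken from the real observation schema:

```shell
# One sample observation; field names other than timestamp, status, and
# target are hypothetical.
sample='{"timestamp":1482098400000,"status":"OK","target":"example.com","accountId":"ac1234","checkType":"remote.http"}'

# The filter from the command above; -c prints one compact object per line.
trimmed=$(printf '%s\n' "$sample" | jq -c '{timestamp:.timestamp, status:.status, target:.target}')

# jq's shorthand object syntax selects the same three fields.
short=$(printf '%s\n' "$sample" | jq -c '{timestamp, status, target}')

printf '%s\n' "$trimmed" "$short"
```

Both forms should print the same trimmed object, so the shorter `{timestamp, status, target}` spelling can be used in the pipeline if preferred.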