snowplow-archive / kinesis-example-scala-producer

Example Scala/SBT event producer for Amazon Kinesis
http://snowplowanalytics.com

Add ability to send events as Thrift #4

alexanderdean closed this issue 10 years ago

alexanderdean commented 10 years ago

Assigning to Brandon (for now) as a way of getting up to speed with Thrift in an easy way.

bamos commented 10 years ago

@alexanderdean, should I do the following for this?

  1. Integrate Thrift's Java code generator into sbt with sbt-thrift. I also found twitter/sbt-scrooge and bancek/sbt-scrooge, but they target sbt 0.11 and 0.12 and don't seem to have much activity.
  2. Specify a Thrift IDL with a simple struct (a string and timestamp).
  3. Add an option to send events with the IDL as a ByteBuffer.

Do I need to do anything with Thrift services, or are we mostly interested in the cross-language IDL?

alexanderdean commented 10 years ago

Hey @bamos - exactly as you say: 1 (with sbt-thrift), 2 & 3. You're right: we are interested in Thrift's cross-language IDL, rather than its RPC service capabilities.
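
As a rough illustration of steps 2 and 3, here is a minimal sketch of what a Thrift-serialized struct looks like once it is packed into a ByteBuffer. It is not the code from the commits in this thread: it hand-rolls the Thrift binary protocol layout using only the JDK, whereas the real project would rely on classes generated through sbt-thrift and libthrift's serializer. The `StreamData` case class, the `ThriftBinarySketch` object, the field ids (1 for `name`, 2 for `timestamp`) and the choice of i64 for the timestamp are all assumptions made for the example.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for the class a Thrift code generator would emit
// from an IDL struct holding a string and a timestamp.
final case class StreamData(name: String, timestamp: Long)

object ThriftBinarySketch {
  // Wire type ids used by the Thrift binary protocol.
  private val TStop: Byte = 0
  private val TI64: Byte = 10
  private val TString: Byte = 11

  // Assumed field ids: 1 = name (string), 2 = timestamp (i64).
  def serialize(d: StreamData): ByteBuffer = {
    val nameBytes = d.name.getBytes(UTF_8)
    // Each field header is 1 type byte + 2 id bytes; a string carries a
    // 4-byte length prefix; the struct ends with a single stop byte.
    val buf = ByteBuffer.allocate(3 + 4 + nameBytes.length + 3 + 8 + 1)
    buf.put(TString).putShort(1.toShort).putInt(nameBytes.length).put(nameBytes)
    buf.put(TI64).putShort(2.toShort).putLong(d.timestamp)
    buf.put(TStop)
    buf.flip() // switch from writing to reading
    buf
  }

  def deserialize(in: ByteBuffer): StreamData = {
    val buf = in.duplicate() // leave the caller's position untouched
    var name = ""
    var timestamp = 0L
    var fieldType = buf.get()
    while (fieldType != TStop) {
      val fieldId = buf.getShort()
      if (fieldId == 1 && fieldType == TString) {
        val bytes = new Array[Byte](buf.getInt())
        buf.get(bytes)
        name = new String(bytes, UTF_8)
      } else if (fieldId == 2 && fieldType == TI64) {
        timestamp = buf.getLong()
      } else {
        throw new IllegalArgumentException(
          s"unexpected field $fieldId of type $fieldType")
      }
      fieldType = buf.get()
    }
    StreamData(name, timestamp)
  }
}

object ThriftBinarySketchDemo extends App {
  val event = StreamData("example-record", 48184L)
  val payload = ThriftBinarySketch.serialize(event)
  println(s"${payload.remaining()} bytes on the wire")
  println(ThriftBinarySketch.deserialize(payload))
}
```

The resulting ByteBuffer is the kind of payload a Kinesis record carries as its data, and because the layout is defined by the IDL rather than by Scala, a consumer written in another language could decode it. No Thrift service definition is involved, which matches the focus on the IDL rather than RPC.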

bamos commented 10 years ago

In the producer, these commits add an option to serialize a simple Thrift object:

  1. https://github.com/snowplow/kinesis-example-scala-producer/commit/120f038e518022780a3abf9c1dab09e53e870876
  2. https://github.com/snowplow/kinesis-example-scala-producer/commit/b351dd1bf5363447a5db58cfda3cb2a6e235ac4e
  3. https://github.com/snowplow/kinesis-example-scala-producer/commit/b0b39e1a0ae6eb0dc5b3c4dacf462bfc14391d07

In the consumer, these commits add an option to deserialize the same object:

  1. https://github.com/bamos/kinesis-example-scala-consumer/commit/f1bae1e1a5fa59d0ea6c75e96c3a910086a079f5
  2. https://github.com/bamos/kinesis-example-scala-consumer/commit/a2f8dd6ee3cb14c5c4c939afb456d647d2caef9c
  3. https://github.com/bamos/kinesis-example-scala-consumer/commit/95629a4c989b910094fdfab2d2888d320e39ba9f

After adding 10 items with the producer, the consumer reads only 9 of them:

sequenceNumber: 49535040096504505861635224581804745460990125322667032577
data: StreamData(name:example-record, timestamp:48184)
partitionKey: partition-key-48184
sequenceNumber: 49535040096504505861635224587289807100551190750005559297
data: StreamData(name:example-record, timestamp:47988)
partitionKey: partition-key-47988
sequenceNumber: 49535040096504505861635224603103449640600343973715771393
data: StreamData(name:example-record, timestamp:48547)
partitionKey: partition-key-48547
sequenceNumber: 49535040096504505861635224626650710481035047255045832705
data: StreamData(name:example-record, timestamp:47813)
partitionKey: partition-key-47813
sequenceNumber: 49535040096504505861635224664537403167598534315258937345
data: StreamData(name:example-record, timestamp:48377)
partitionKey: partition-key-48377
sequenceNumber: 49535040096504505861635224799428958219520117796918788097
data: StreamData(name:example-record, timestamp:47113)
partitionKey: partition-key-47113
sequenceNumber: 49535040096504505861635225785778852973770066835284164609
data: StreamData(name:example-record, timestamp:48888)
partitionKey: partition-key-48888
sequenceNumber: 49535040096504505861635225805355956113375442245628461057
data: StreamData(name:example-record, timestamp:49143)
partitionKey: partition-key-49143
sequenceNumber: 49535040096504505861635225837528451849114377091435986945
data: StreamData(name:example-record, timestamp:49017)
partitionKey: partition-key-49017

I suspect this is related to the other consumer issue.

bamos commented 10 years ago

https://github.com/snowplow/kinesis-example-scala-consumer/commit/546c63c74ee7d421aad07ef670b0c555082d864b fixes issues with missing data in the consumer.

The producer sends events as Thrift and the consumer reads all of them now.

Let me know if there's anything else on this.

alexanderdean commented 10 years ago

Awesome, nice work @bamos !