kscaldef / summingbird-hybrid-example

A self-contained example of summingbird running in hybrid mode

data never gets ingested #1

Open obh opened 10 years ago

obh commented 10 years ago

Thanks for this awesome example. I think I'm stuck on something really stupid. Basically, I got things working, except that when the program starts Zookeeper throws this error: "no brokers found when trying to rebalance." After that I can see the events being produced and put into the Kafka queue, but nothing gets ingested (Events Ingested is always 0).

kandu009 commented 9 years ago

I am facing the same issue. Was it ever fixed? If there is a workaround, can someone please share their experience?

ghost commented 9 years ago

Are you using Kafka 0.7 or 0.8? I've seen this issue before when using the wrong Kafka consumer version, i.e. 0.7 instead of 0.8 or vice versa.

Try using this PR from Tormenta https://github.com/twitter/tormenta/pull/52 if you are using Kafka 8.

If you are using Kafka 7 then you will probably have to adapt this example to use the original Kafka-Tormenta API instead of https://github.com/kscaldef/summingbird-hybrid-example/blob/master/src/main/scala/com/twitter/tormenta/spout/KafkaSpout.scala

kandu009 commented 9 years ago

Thanks for the reply.

I have tried using Kafka 8 and followed the instructions, but it doesn't help. Are there any clear instructions on what changes need to be made here?

I have been looking at your second option of using Kafka 7. If possible, could you share your modified hybrid example for Kafka 7?

kandu009 commented 9 years ago

@upio I have added more details regarding the issues with Kafka 8 + Summingbird here: https://github.com/kscaldef/summingbird-hybrid-example/issues/2

ghost commented 9 years ago

I've put together a modified example using Docker and my patched Tormenta for you here https://github.com/upio/summingbird-hybrid-example

There are instructions in there, but you'll need Docker, fig, and https://github.com/upio/tormenta in your local Maven repository.

See if this works for you.

kandu009 commented 9 years ago

Hi,

Thanks for sharing the details. I am able to run this, but I see several exceptions, errors, and warnings:

1. 14/11/18 23:20:01 WARN producer.BrokerPartitionInfo: Error while fetching metadata [{TopicMetadata for topic summingbird.proto.productview ->
No partition metadata for topic summingbird.proto.productview due to kafka.common.LeaderNotAvailableException}] for topic [summingbird.proto.productview]: class kafka.common.LeaderNotAvailableException

2. 14/11/18 23:20:01 ERROR async.DefaultEventHandler: Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: summingbird.proto.productview

3. 14/11/18 23:05:00 WARN scalding.Scalding: Store: List() has no commutativity setting. Assuming MonoidIsCommutative(NonCommutative)
14/11/18 23:05:00 INFO scalding.Scalding: Store: List() is non-commutative (less efficient than commutative)

4. 14/11/18 23:05:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/18 23:05:05 WARN snappy.LoadSnappy: Snappy native library not loaded

5. Though it sees data in these partitions and counts them periodically, I see these warnings in every loop.

14/11/18 23:02:59 WARN kafka.KafkaUtils: No data found in Kafka Partition partition_1
14/11/18 23:03:03 WARN kafka.KafkaUtils: No data found in Kafka Partition partition_0

ghost commented 9 years ago

Did you change your IP address in the fig.yml file? I think this is what causes the LeaderNotAvailable issues. Make sure the advertised host in fig.yml is set to your IP.

Also, make sure to clean up what you've already done with Docker:

  1. Change the IP
  2. fig stop
  3. fig rm
  4. fig up -d
  5. rm -rf /tmp/summingbird-proto/

See if this fixes it.
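
One way the advertised host can be wrong is that it points at a loopback or wildcard address, which other machines and containers cannot use to reach the broker. That failure mode is an assumption here, not something the thread confirms. The sketch below checks a value copied by hand from fig.yml; `is_plausible_advertised_host` is a hypothetical helper, not part of the example project.

```python
import ipaddress


def is_plausible_advertised_host(value):
    """Return False for values that clients on other hosts cannot reach.

    Hypothetical helper: it only rejects loopback and wildcard addresses.
    It does not verify that the address actually belongs to this machine.
    """
    text = value.strip()
    if text.lower() == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        # Not an IP literal (for example a hostname); this sketch cannot judge it.
        return True
    return not (addr.is_loopback or addr.is_unspecified)


# Example values only; use the address you put in fig.yml.
for candidate in ["127.0.0.1", "0.0.0.0", "192.168.1.10"]:
    print(candidate, is_plausible_advertised_host(candidate))
```

A value that passes this check can still be wrong, for instance if it belongs to a different network interface than the one the clients use.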

kandu009 commented 9 years ago

Yeah, I did change the IP as mentioned in the README. After doing what you suggested, I still see all of the errors and warnings listed above.

ghost commented 9 years ago

I see those warnings too. I'm not sure whether they are important, but it works, so I don't think so. It's something to do with your Kafka setup. Can you show me the output of:

  ifconfig
  fig ps
  cat fig.yml

Have you tried using the Kafka CLI tools and Zookeeper CLI tools to see if you can connect to Kafka and Zookeeper? I still think it's an issue with Kafka not being able to communicate with the Zookeeper Docker container. I've had this exact issue before, and the problem is always the IP address in fig.yml. What operating system are you using? I haven't tested this with boot2docker on Mac/Windows.
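
Before reaching for the CLI tools, a plain TCP probe can show whether the published ports are reachable at all. The following is a minimal sketch, not part of the example project: `can_connect` is a hypothetical helper, and the host and port you pass in are whatever your own fig.yml and `fig ps` output report.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only tests that something accepts connections
    on the port, not that the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_connect("192.168.1.10", 49155)` would probe a Kafka port published on host port 49155, where both the address and the port are placeholders. A successful probe does not prove that Kafka advertises the right address to its clients, so a `True` result narrows the problem down rather than ruling Kafka out.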

kandu009 commented 9 years ago

fig ps:

  Name                                   Command                          State   Ports
  summingbirdhybridexample_kafka_1       /bin/sh -c start-kafka.sh        Up      0.0.0.0:49155->9092/tcp
  summingbirdhybridexample_memcached_1   memcached                        Up      0.0.0.0:49153->11211/tcp
  summingbirdhybridexample_zookeeper_1   /opt/zookeeper-3.4.5/bin/z ...   Up      0.0.0.0:49154->2181/tcp, 2888/tcp, 3888/tcp

In fig.yml I have used the same IP address that ifconfig reports as the inet address for eth0. I am using Windows.

I haven't tried using the CLI tools yet. Will try that out and see.

kandu009 commented 9 years ago

@upio Is there a way to specify multiple hosts for running this entire setup? I mean, run Storm on host1 and Scalding on host2, and run the hybrid on one of those two hosts. Thanks in advance.

ghost commented 9 years ago

Well, there is no way to specify multiple hosts, but you can just manually run the StormRunner and ScaldingRunner from different machines and then change the Memcached addresses for the Hybrid Store. Ultimately, all these runners do is launch jobs on a Storm/Hadoop cluster and load data into two separate serving layers such as Memcached/Cassandra/HBase. An example of this setup would be awesome.

dkwestbr commented 9 years ago

https://github.com/upio/summingbird-hybrid-example works for me

jak3chase commented 8 years ago

Using upio's forked example, I get a lot of errors that look like:

WARN state.ConnectionStateManager: There are no ConnectionStateListeners registered.

ERROR producer.SyncProducer: Producer connection to localhost:49155 unsuccessful java.net.ConnectException: Connection refused

I think I am using the correct IP, the one from docker0 in ifconfig. I've also tried a bunch of IPs (eth0 etc).

Any ideas?

ghost commented 8 years ago

@jak3chase can you open an issue on the forked version and include fig ps and information about your environment (Linux, OS X, or Windows, for example)? The first things that come to mind are boot2docker, port forwarding, and binding to localhost instead of 0.0.0.0.

jak3chase commented 8 years ago

@upio Thanks a lot for the reply! Unfortunately I wasn't able to open an issue on the forked repository after looking for a bit. Perhaps you haven't enabled Issues?

Anyways, I'm running OS X 10.10.13 and Java 7. My fig ps looks exactly like the one in the README, and I added the modified Tormenta to my Maven repo.

fig ps:

summingbirdhybridexample_kafka_1 /bin/sh -c start-kafka.sh Up 0.0.0.0:49155->9092/tcp
summingbirdhybridexample_memcached_1 memcached Up 0.0.0.0:49153->11211/tcp
summingbirdhybridexample_zookeeper_1 /opt/zookeeper-3.4.5/bin/z ... Up 0.0.0.0:49154->2181/tcp, 2888/tcp, 3888/tcp
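
The `fig ps` output above shows which host port each container port is published on, for example Kafka's 9092 on host port 49155. If Docker runs inside a boot2docker VM, which is one of the possibilities raised earlier, those ports would be published on the VM rather than on the Mac's own localhost, which would be consistent with the "Connection refused" error for localhost:49155. A small sketch for pulling the mappings out of a `fig ps` line, so they can be checked one by one; `published_ports` is a hypothetical helper and the regular expression assumes the column layout shown in this thread:

```python
import re

# Matches one published port such as "0.0.0.0:49155->9092/tcp".
PORT_MAPPING = re.compile(r"(\d+\.\d+\.\d+\.\d+):(\d+)->(\d+)/(?:tcp|udp)")


def published_ports(fig_ps_line):
    """Return (service_name, [(host_ip, host_port, container_port), ...]).

    Hypothetical helper: ports listed without a host mapping
    (for example "2888/tcp") are ignored.
    """
    name = fig_ps_line.split()[0]
    mappings = [
        (ip, int(host_port), int(container_port))
        for ip, host_port, container_port in PORT_MAPPING.findall(fig_ps_line)
    ]
    return name, mappings


line = "summingbirdhybridexample_kafka_1 /bin/sh -c start-kafka.sh Up 0.0.0.0:49155->9092/tcp"
print(published_ports(line))
```

Each resulting host port can then be tested against the address the clients actually use, which on a boot2docker setup may be the VM's IP instead of localhost.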