twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks

How to install summingbird in cluster? #719

Open leesf opened 7 years ago

leesf commented 7 years ago

Hi, everyone. I want to install Summingbird in a cluster environment. What should I do? I already know how to install it stand-alone. Also, in a cluster environment, does the online-layer cache I designed myself need to handle the issues that a cluster introduces? Thanks for replying.

leesf commented 7 years ago

@oscar-stripe @ianoc @sritchie @singhala

sritchie-stripe commented 7 years ago

Hey @leesf, can you give us some more information about what sort of cluster you're running?

Summingbird might be a little different than what you're imagining. Summingbird is a library that generates jobs for Storm or Hadoop clusters. Summingbird isn't a platform that you can "run" - you need to have an existing Storm or Hadoop cluster running. Once you have those set up you can compile your Summingbird job for either of those platforms and deploy it like any other job.
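To make "a library that generates jobs" concrete, here is a minimal sketch of a platform-independent job, modeled on the word-count example in the Summingbird README. The `WordCountJob` object and the `toWords` tokenizer are hypothetical names chosen for illustration, and the sketch assumes `summingbird-core` is on the classpath.

```scala
import com.twitter.summingbird._

object WordCountJob {
  // Hypothetical tokenizer: lower-cases the sentence and splits on non-word characters.
  def toWords(sentence: String): List[String] =
    sentence.toLowerCase.split("\\W+").filter(_.nonEmpty).toList

  // The job is written once against an abstract platform P. The source and
  // store are supplied later by whichever platform (Storm or Scalding/Hadoop)
  // the job is compiled for.
  def wordCount[P <: Platform[P]](
      source: Producer[P, String],
      store: P#Store[String, Long]) =
    source
      .flatMap { sentence => toWords(sentence).map(_ -> 1L) }
      .sumByKey(store)
}
```

Nothing in this definition starts a cluster or a service. It only describes the computation, which is then planned and submitted to a Storm or Hadoop cluster that already exists.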

At this point you're responsible for running

Does that make sense?

leesf commented 7 years ago

@sritchie-stripe Thanks for replying. I do know that Storm and Hadoop clusters must be set up before running a Summingbird job. I use HBase in the batch layer, and it works well. In the online layer, however, I use an online store of my own design instead of Redis or Memcache, and it works well on a single host. My question is: will it also work in a cluster? Do I need to handle communication between the machines in the cluster myself, or does Summingbird take care of that? Thanks a lot.

johnynek commented 7 years ago

You can use any online store as long as you write a small Storehaus wrapper for it:

https://github.com/twitter/storehaus
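As a rough sketch of what such a wrapper might look like, the following implements Storehaus's `Store` trait on top of a home-grown client. `CustomStoreClient` and `CustomOnlineStore` are hypothetical names. The client here is just an in-memory stand-in so the example runs on its own; a real one would presumably talk to the store over the network so that every worker in the cluster sees the same data. The sketch assumes `storehaus-core` and Twitter `util-core` are on the classpath, and it does not cover the mergeable-store wiring that Summingbird's online layer may additionally require.

```scala
import com.twitter.storehaus.Store
import com.twitter.util.{Await, Future}
import scala.collection.concurrent.TrieMap

// Hypothetical client for a custom online store. This stand-in keeps data
// in memory; replace the method bodies with calls to the real store.
class CustomStoreClient {
  private val data = TrieMap.empty[String, Long]
  def read(key: String): Option[Long] = data.get(key)
  def write(key: String, value: Long): Unit = { data.put(key, value); () }
  def delete(key: String): Unit = { data.remove(key); () }
}

// Storehaus wrapper: exposes the custom client through the Store interface.
class CustomOnlineStore(client: CustomStoreClient) extends Store[String, Long] {
  // A missing key is reported as None.
  override def get(k: String): Future[Option[Long]] =
    Future.value(client.read(k))

  // Some(v) writes the value; None deletes the key.
  override def put(kv: (String, Option[Long])): Future[Unit] = kv match {
    case (k, Some(v)) => client.write(k, v); Future.Unit
    case (k, None)    => client.delete(k); Future.Unit
  }
}

object CustomStoreExample extends App {
  val store = new CustomOnlineStore(new CustomStoreClient)
  Await.result(store.put(("clicks", Some(3L))))
  println(Await.result(store.get("clicks")))
}
```

Only `get` and `put` are overridden here; Storehaus derives `multiGet` and `multiPut` from them by default, and a store with a native batch API could override those for efficiency.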