hdinsight / hdinsight-storm-examples

This is a repository for complete and easy-to-use samples that demonstrate the use of Apache Storm on HDInsight.
Apache License 2.0

Sample showing partition-aware EventHubs spout configuration #9

Closed: nblumhardt closed this issue 9 years ago

nblumhardt commented 9 years ago

Hello; I'm currently working on an IoT project that's exploring EventHubs + HDInsight/Storm for processing large numbers of device telemetry events.

I've found the various "develop Storm topologies in C#" examples that refer to EventHubs and have experimented with the C# SDK, but can't find an example of a (custom) EventHub spout in C# that's able to work with multiple EventHub partitions.

It seems like using the id of the spout instance to determine which partition to read from is the way to go, but this information (context.getThisComponentId() in Java) doesn't seem to be present in the C# SDK.

Are there further samples available/planned using C# in this scenario? Adding Java to our current technology stack isn't feasible right now. (Also, is there a better forum for these kinds of discussions?)

Many thanks in advance!

ravitandonrt commented 9 years ago

Hi nblumhardt

There is a template example of writing into EventHub that shows how to get the topology context and task id.

In this example I have shown how to write into each partition based on the task index. You can write your own reading or distribution strategy based on the partition count vs. the number of tasks.

templates/HDInsightStormExamples/Bolts/EventHub/EventHubBolt.cs

TopologyContext topologyContext = Context.TopologyContext;
Context.Logger.Info(this.GetType().Name + " TopologyContext info:");
Context.Logger.Info("TaskId: {0}", topologyContext.GetThisTaskId());
var taskIndex = topologyContext.GetThisTaskIndex();
Context.Logger.Info("TaskIndex: {0}", taskIndex);
string componentId = topologyContext.GetThisComponentId();
Context.Logger.Info("ComponentId: {0}", componentId);
List<int> componentTasks = topologyContext.GetComponentTasks(componentId);
Context.Logger.Info("ComponentTasks: {0}", componentTasks.Count);
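As a rough illustration of the "reading strategy based on partition count vs. number of tasks" idea, here is a minimal sketch of a round-robin assignment. It is not taken from the repository: the class name PartitionAssignment and the method PartitionsForTask are hypothetical, and the code assumes partitions are numbered 0 to partitionCount - 1. In a spout, taskIndex would presumably come from topologyContext.GetThisTaskIndex() and taskCount from topologyContext.GetComponentTasks(componentId).Count, as in the snippet above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of SCP.NET or the samples repository).
public static class PartitionAssignment
{
    // Returns the partition ids that the task at taskIndex should read,
    // spreading partitions round-robin across all tasks of the component.
    // Assumes partition ids run from 0 to partitionCount - 1.
    public static List<int> PartitionsForTask(int taskIndex, int taskCount, int partitionCount)
    {
        if (taskCount <= 0)
            throw new ArgumentOutOfRangeException("taskCount");
        if (taskIndex < 0 || taskIndex >= taskCount)
            throw new ArgumentOutOfRangeException("taskIndex");
        if (partitionCount < 0)
            throw new ArgumentOutOfRangeException("partitionCount");

        var partitions = new List<int>();
        // Task i takes partitions i, i + taskCount, i + 2 * taskCount, ...
        for (int p = taskIndex; p < partitionCount; p += taskCount)
        {
            partitions.Add(p);
        }
        return partitions;
    }
}
```

With 8 partitions and 3 tasks, this gives task 0 the partitions 0, 3, 6; task 1 gets 1, 4, 7; and task 2 gets 2, 5. If there are more tasks than partitions, the surplus tasks receive an empty list and would sit idle, so matching the spout parallelism to the partition count is probably the simplest setup.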

Do you want to write your own C# spout for EventHub? Any particular reason you do not want to use the provided Java version?

nblumhardt commented 9 years ago

Thank you for the pointer, @rtandonmsft. I am not sure how we missed TopologyContext.GetThisTaskId() and so on; we may have been working with an earlier version. This seems like the API we need.

We have been working on a pilot project evaluating HDInsight/Storm as a general platform, so writing our own spout provided an opportunity to assess whether C#-only development was a viable strategy in general (vs. cross-skilling developers in Java). For a production implementation, using the pre-built spout unmodified seems like a good option for us to explore should we stay on this path.

Thanks again for your help (cc @mwinkle, who has also helped us find our way here).