salesforce / storm-dynamic-spout

A framework for building spouts for Apache Storm and a Kafka based spout for dynamically skipping messages to be processed later.
BSD 3-Clause "New" or "Revised" License
40 stars 13 forks source link

Drop VirtualSpout.getMaxLag() and pass the MetricsRecorder to Consumer #45

Closed stanlemon closed 6 years ago

stanlemon commented 6 years ago

The getMaxLag() method has always felt awkward and it's another one of those situations where we are leaking our concerns across layers. I've wanted to switch this maxLag to be the endOffset and in other Consumer implementations we have a need for the ability to report metrics. So this PR introduces the MetricsRecorder into the Consumer interface and drops the getMaxLag() metric in favor of a consumer specific one for the starting offset.

Crim commented 6 years ago

While "maxLag" was kind of a garbage stat before, it looks like we lost it entirely in this PR?

Should we be calculating the difference between the current state and end offset for each partition and reporting that?

stanlemon commented 6 years ago

@Crim Yeah, I see the benefit to that, let me add it.

Crim commented 6 years ago

Where is that reported in the consumer? Or is that in SpoutPartitionProgressMonitor.java? Seems weird to have that logic split across two un-related classes.

Does it make sense to just entirely remove SpoutPartitionProgressMonitor.java and fold that logic elsewhere?

stanlemon commented 6 years ago

@Crim I feel like what we want/need is a set of metrics at the DynamicSpout and VirtualSpout layers that are independent of the Consumer metrics. Right now they are kind of convoluted, but SpoutPartitionProgressMonitor has the sort of thing I suspect we want, just not sure if that's the final resting place. Because that class isn't public and we're not forcing an implementing method I'm OK with it right now, but we should think about a better structure long term.

Crim commented 6 years ago

RE: SpoutPartitionProgressMonitor. The original purpose was to abstract out the common metrics across various consumers so it wouldn't have to be re-implemented for every one, but perhaps thats an early/not needed optimization?

stanlemon commented 6 years ago

@Crim Yeah, I think what I learned from our other consumer implementation is that the metrics we thought were generic on the kafka one weren't necessarily what I wanted to add. I think standardizing metrics in VirtualSpout about tuples passing through is ok, and I guess I'm saying I feel like that's more what SpoutPartitionProgressMonitor is doing - so it's ok.

stanlemon commented 6 years ago

@Crim Are we good to go on this one now?

Crim commented 6 years ago

Looks good. I made a slight tweak in this PR: https://github.com/salesforce/storm-dynamic-spout/pull/50 which wants to merge into your PR here.

Feel free to merge or discard.

stanlemon commented 6 years ago

@Crim I think we should switch to String.format() in #50, but then I'm fine with you merging it into this branch.

Crim commented 6 years ago

For readability I'd agree with you, tho the interwebs seems to believe that String.format() was considerably slower than just concat'ing

Crim commented 6 years ago

TODO: Update README with new / removed metrics.

Crim commented 6 years ago

I updated the README. I think the Changelog could use some work, may tackle that next.

Crim commented 6 years ago

Ok, seems solid to me @stanlemon 👍