pulsarIO / realtime-analytics

Realtime analytics, this includes the core components of Pulsar pipeline.
Other
652 stars 125 forks source link

Moving traffic generator out of collector #7

Open mariusrugan opened 9 years ago

mariusrugan commented 9 years ago

Hi,

i've started cleaning up the collector & moved traffic generator to a (sub) demo/ project. I also improved a bit traffic generator to generate single events rather than by default batch.

would be helpful if someone takes a look and gives some feedback (and builds also :) ) for: https://github.com/MariusRugan/realtime-analytics/tree/develop

work is not 100% done yet & i'd like to understand if this would be helpful to the master project?

thanks

shmurthy62 commented 9 years ago

Hi Marius,

We will review and provide feedback to you.

Sharad

shmurthy62 commented 9 years ago

Hi Marius,

Looks OK! A few suggestions - why not make RandomTrafficGenerator an InboundChannel instead of a Processor. A processor expects to receive events, processes and and sends event to its sinks. The traffic generator is a mock channel impl.

Can you please complete the pipeline flow control impl by implementing resume() method by setting m_runFlag to true. This will make the impl complete.

Can you please make sure that you have built the the docker images and tested the pipeline locally. It will take some more time for others to also review and provide comments. If we all agree to bring it in you will have to submit a pull request. Have you signed the contributor agreement?

mariusrugan commented 9 years ago

Hi Sharad, thanks for replying, here are my comments:

1 - i'm really keen on delivering http requests - just as an external system would do via a pixel call - to the collector app. Although collector app right now is imho a bit too 'hardcore' to touch, as it's based on a servlet implementation (that's a subject for another discussion) i would suggest going ahead and improving even more the traffic generation (as you've seen, i've started adding dynamic referer and such) so real-life-like traffic can be aggregated/counted via EPL from metricUI. One of the aims would be 'events in a user's session'. (think at the shopping funnel etc.).

Having motivated my http affinity :) I would happily participate in a parallel implementation in the demo module for traffic as 'InboundChannel' as you suggest. I must admit i didn't dig into 'replay' module a lot, but it seems what you referring to is PoC there. Please correct me if i'm wrong.

2. resume(): Yes, will wrap it up.

3A. Docker: Currently running via IDE everything but i will adjust all buildsrc & docker scripts for this.

3B. Don't know of any "contributor agreement", please share a link where i can read it.

Thanks M.

shmurthy62 commented 9 years ago

I completely agree with where you are going with http. We use it in our own pipeline in production and it can ingest large volumes with our implementation. We require the connection to be persistent and we recommend batching with compression for high throughput. Another option is to create a websocket channel which we have discussed internally.

As I reflect more about your impl pipeline flow control might not make sense as you are writing directly to a HTTP connection. However If your RandomTrafficGenerator focused mostly with the generation of traffic and not couple yourself with HttpClient then one option you might consider is wiring your traffic generator to any outboundChannel. In that case flow control makes sense.

BTW we already have outboundRESTChannel which can be provisioned to send the event to a set of HTTP endpoints. We have not documented this in our Jetstream wiki so its not visible to you. Check out "com.ebay.jetstream.event.channel.messaging.http.outbound.OutboundRESTChannel".

Its wiring looks as follows:

http://127.0.0.1:15999/RTBDEventConsumerServlet

The contributor agreement can be found here - https://github.com/pulsarIO/realtime-analytics/wiki/How-to-contribute

shmurthy62 commented 9 years ago

Sorry, forgot to specify wiring as XML - here it is.


<bean id="outboundRESTChanneBinder" class="com.ebay.jetstream.event.support.channel.ChannelBinding">
    <property name="channel" ref="outboundRESTChannel" />
</bean>

<bean id="outboundRESTChannel" class="com.ebay.jetstream.event.channel.messaging.http.outbound.OutboundRESTChannel">
    <property name="address" ref="outboundRESTChannelAddress" />
    <property name="config" ref="HttpClientConfig" />
</bean>

<bean id="outboundRESTChannelAddress" class="com.ebay.jetstream.event.channel.messaging.http.outbound.RESTChannelAddress">
    <property name="urlList">
        <list>
          <value>http://127.0.0.1:15999/RTBDEventConsumerServlet</value>                 </list>
    </property>
</bean>

<bean id = "HttpClientConfig" class="com.ebay.jetstream.http.netty.client.HttpClientConfig">
</bean>
xinglang commented 9 years ago

The idea is good. Move out traffic generator will make collector more clean. Previously the simulator use commons-httpclient to send the http traffic, it is ok for the simulator. If change to use the Outbound Rest Channel, it have to deserialize the String and convert it to JetstreamEvent. I think it is ok to keep using commons-httpclient.

One reason why we put the simulator inside the collector is it can use localhost as the server ip and easy to control for demo purpose. Move it out should make the URL can be configurable and also can be override via system property.

And once it move out, we need create a new docker for this. Will trafficsimulator a better module name?

I saw you add a new dependency:

org.projectlombok lombok ${lombok.version} compile

What did it use for?

mariusrugan commented 9 years ago

Hi Xinglang,

thanks for replying, your suggestions are valid, will add (runtime) config capability via properties for receiving endpoint.

as i commented above docker config will be cleaned and tested also.

related to the package name i suggest then: *.processor.httpTrafficSimulator

lombok: i took out boilerplate code , eg getters, setters in the class. i also used @scheduled from spring btw. (with dependency via jetstream dependencies)

On Tuesday, March 17, 2015, Xinglang Wang notifications@github.com wrote:

The idea is good. Move out traffic generator will make collector more clean. Previously the simulator use commons-httpclient to send the http traffic, it is ok for the simulator. If change to use the Outbound Rest Channel, it have to deserialize the String and convert it to JetstreamEvent. I think it is ok to keep using commons-httpclient.

One reason why we put the simulator inside the collector is it can use localhost as the server ip and easy to control for demo purpose. Move it out should make the URL can be configurable and also can be override via system property.

And once it move out, we need create a new docker for this. Will trafficsimulator a better module name?

I saw you add a new dependency:

org.projectlombok lombok ${lombok.version} compile

What did it use for?

— Reply to this email directly or view it on GitHub https://github.com/pulsarIO/realtime-analytics/issues/7#issuecomment-82091360 .