hmsonline / storm-cassandra-cql

Storm Cassandra Bridge built on CQL
Apache License 2.0

Questions on cassandra querying scheme #59

Open gireeshramji opened 8 years ago

gireeshramji commented 8 years ago

1.) Am I correct in saying that if I define a CassandraCqlMapState with a parallelism of N, then N different instances of the Session object will be created? If this is correct, does this not go against what is recommended by DataStax: http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra where they say that a single Session object should be used across the application? Or does that not make sense in the storm context where the state could be distributed across different physical nodes?

2.) For all queries, CassandraCqlMapState prepares a BatchStatement to submit to Cassandra. Is this guaranteed to have better performance than executing the individual statements asynchronously? Ref: https://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/. Basically, what I am trying to get at is: does the micro-batching in Trident really offer any amortisation of costs, given how Cassandra behaves with batch queries?

Thanks Gireesh

boneill42 commented 8 years ago

Gireesh,

(1) I believe sessions are reused, because the CassandraCqlMapState is created from the Factory, which uses a single instance of CqlClientFactory to generate sessions and will reuse a session if it already has one. Have a look at this line: https://github.com/hmsonline/storm-cassandra-cql/blob/master/src/main/java/com/hmsonline/trident/cql/CqlClientFactory.java#L45
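To make the reuse idea in (1) concrete, here is a minimal sketch of a factory that opens a session on first use and hands back the same one afterwards. It is not the actual CqlClientFactory code: `FakeSession` and `CachingSessionFactory` are hypothetical stand-ins so the example runs without the DataStax driver on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the driver's Session; counts how many were created.
final class FakeSession {
    static final AtomicInteger CREATED = new AtomicInteger();
    final int id = CREATED.incrementAndGet();
}

// Hypothetical factory: creates the session lazily, then reuses it.
final class CachingSessionFactory {
    private FakeSession session; // guarded by "this"

    synchronized FakeSession getSession() {
        if (session == null) {
            // Real code would open a driver session here, e.g. cluster.connect().
            session = new FakeSession();
        }
        return session;
    }
}

final class SessionReuseDemo {
    public static void main(String[] args) {
        CachingSessionFactory factory = new CachingSessionFactory();
        FakeSession first = factory.getSession();
        FakeSession second = factory.getSession();
        // Both calls return the same object, so only one session was opened.
        System.out.println("same session: " + (first == second));
    }
}
```

Note that this kind of reuse only holds per factory instance inside one JVM. Storm workers are separate JVM processes, so a topology spread over several workers will still end up with at least one session per worker.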

(2) No, batch statements don't necessarily have better performance than single statements. In fact, I think they may even be slightly slower because they are coordinated, but I would have to do some digging to refresh my memory on that.
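For comparison with (2), the alternative raised in the question (issuing the individual statements asynchronously and then waiting for all of them) could be sketched roughly as below. This is an assumption-laden illustration, not code from this project: `TABLE` and `writeAsync` are hypothetical stand-ins for a Cassandra table and for an asynchronous driver call such as `Session.executeAsync`, and the sketch says nothing about which approach is actually faster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncWriteSketch {
    // Hypothetical stand-in for a Cassandra table.
    static final Map<String, Integer> TABLE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for one asynchronous single-row write.
    static CompletableFuture<Void> writeAsync(ExecutorService pool, String key, int value) {
        return CompletableFuture.runAsync(() -> TABLE.put(key, value), pool);
    }

    // Fire one async write per row, then block until every write has completed.
    static void writeAll(Map<String, Integer> rows) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (Map.Entry<String, Integer> row : rows.entrySet()) {
                pending.add(writeAsync(pool, row.getKey(), row.getValue()));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

In a Trident state, something like `writeAll` would be called once per micro-batch, so the micro-batch still bounds how many writes are in flight even though no BatchStatement is used.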

hope this helps,

brian


On Jan 29, 2016, at 2:43 AM, gireeshramji wrote: (quoted copy of the original question above)