jsevellec / cassandra-unit

Utility tool to load data into Cassandra, to help you write well-isolated JUnit tests for your application
GNU Lesser General Public License v3.0

Speed Issues When Using Driver 2.1.6+ #164

Closed jwcarman closed 7 years ago

jwcarman commented 8 years ago

First of all, thanks for all your hard work on this library! This really saves us a ton of time!

I'm working on a library which uses cassandra-unit for testing and we're trying to upgrade the Cassandra driver version we're using. From what I can tell, it seems that there is a very noticeable speed degradation when you go from driver version 2.1.5 to 2.1.6 and above. I even start to see test failures ("All host(s) tried for query failed") near the end of the test run. Did you run into this? I'm using these versions:

<cassandra.version>2.1.9</cassandra.version>
<cassandra.driver.version>2.1.6</cassandra.driver.version>
<cassandra.unit.version>2.1.9.2</cassandra.unit.version>
ilinas commented 8 years ago

In my experience, driver 2.1.7 performs much better than 2.1.8 and 2.1.9. CQL data loading seems much slower with the newer drivers, schema update queries especially. I am not seeing any slowness in production, though, only in unit tests.

In numbers, my unit tests run in:

40 sec with 2.1.7
2 min 15 sec with 2.1.9

OrangeDog commented 8 years ago

Some of my results for timing cleanEmbeddedCassandra.

Other versions are not compatible with my spring-data-cassandra version, so I cannot test easily.

Profiling with VisualVM suggests the time is spent in Cassandra itself rather than the driver.

org.cassandraunit.utils.EmbeddedCassandraServerHelper.cleanEmbeddedCassandra() | 0.0 | 0.000 ms (0%) | 0.000 ms | 4,696 ms | 0.000 ms
io.netty.util.HashedWheelTimer$Worker.waitForNextTick() | 3.7867706 | 4,696 ms (3.8%) | 0.000 ms | 4,696 ms | 0.000 ms
org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() | 0.0 | 0.000 ms (0%) | 0.000 ms | 4,408 ms | 4,408 ms
olim7t commented 8 years ago

@lzvtrifork slow schema updates could be caused by event debouncing (introduced by JAVA-657): before refreshing the driver's schema metadata, we wait for a while (1s by default) in case concurrent updates happen, so that we can coalesce them into a single refresh query.

The downside is that each DDL query now takes at least 1s, which hurts in a unit test where a single client issues many of them. You can try to disable debouncing this way:

Cluster.builder().withQueryOptions(
    new QueryOptions().setRefreshSchemaIntervalMillis(0))
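
To make the cost of debouncing concrete, here is a minimal sketch of the idea on a virtual clock. It is a toy model, not the driver's actual implementation: the `Debouncer` class, the `sequentialDdlMillis` helper, and the 50 ms per-statement execution time are all hypothetical, and it assumes the client blocks until the refresh fires after each DDL statement.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of event debouncing on a virtual clock; not the driver's actual code. */
final class DebounceSketch {

    /** Collects events and flushes them as one refresh after a quiet period. */
    static final class Debouncer {
        private final long intervalMillis;
        private long lastEventAt;
        private int pending;
        /** Number of events coalesced into each refresh that has run. */
        final List<Integer> refreshes = new ArrayList<>();

        Debouncer(long intervalMillis) {
            this.intervalMillis = intervalMillis;
        }

        /** Records a schema-change event at virtual time {@code now}. */
        void onEvent(long now) {
            pending++;
            lastEventAt = now;
            advanceTo(now); // flushes immediately only when the interval is 0
        }

        /** Moves the virtual clock; flushes if the quiet period has elapsed. */
        void advanceTo(long now) {
            if (pending > 0 && now - lastEventAt >= intervalMillis) {
                refreshes.add(pending);
                pending = 0;
            }
        }
    }

    /** Virtual wall time for one client that waits for the refresh after each DDL statement. */
    static long sequentialDdlMillis(int statements, long execMillis, long intervalMillis) {
        Debouncer debouncer = new Debouncer(intervalMillis);
        long now = 0;
        for (int i = 0; i < statements; i++) {
            now += execMillis;        // the statement itself (hypothetical cost)
            debouncer.onEvent(now);
            now += intervalMillis;    // the client blocks until the refresh fires
            debouncer.advanceTo(now);
        }
        return now;
    }

    public static void main(String[] args) {
        // Three concurrent updates inside the window coalesce into one refresh.
        Debouncer burst = new Debouncer(1000);
        burst.onEvent(0);
        burst.onEvent(100);
        burst.onEvent(200);
        burst.advanceTo(1200);
        System.out.println("refreshes for burst: " + burst.refreshes);

        // A single test client issuing 20 DDL statements one after another.
        System.out.println("1s debounce: " + sequentialDdlMillis(20, 50, 1000) + " ms");
        System.out.println("no debounce: " + sequentialDdlMillis(20, 50, 0) + " ms");
    }
}
```

Under these assumed numbers, the burst collapses into a single refresh, which is the case debouncing is designed for, while the sequential client pays the full interval on every statement: 21000 ms with a 1s interval versus 1000 ms with the interval set to 0.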
OrangeDog commented 8 years ago

@olim7t that seems to have improved things.

Cassandra-unit should therefore make sure to set that for all DDL operations, such as dropKeyspaces.

Setting the node and nodelist intervals also speeds up connections.

loucasa commented 8 years ago

I'm running into this too. I tried setting the refresh intervals mentioned above, and it makes a small difference when running a single test class, but it is still much slower than with 2.1.6.

OrangeDog commented 7 years ago

"Setting the node and nodelist intervals also speeds up connections."

@jsevellec any reason you didn't?

jsevellec commented 7 years ago

I missed it and didn't understand what you meant. Could you be more specific?

OrangeDog commented 7 years ago

@jsevellec these also make it go faster:

.setRefreshNodeListIntervalMillis(0)
.setRefreshNodeIntervalMillis(0)
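
Putting the three settings from this thread together, a test-only cluster configuration might look like the sketch below. It assumes driver 2.1.6 or later on the classpath; the `FastTestCluster` class name, the `demo` keyspace, and the `127.0.0.1:9142` address are illustrative assumptions, so adjust them to match your embedded Cassandra configuration.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;

/** Hypothetical helper for tests; not part of cassandra-unit or the driver. */
final class FastTestCluster {

    /** Builds a Cluster with all three refresh debounce intervals disabled. */
    static Cluster build(String host, int port) {
        QueryOptions queryOptions = new QueryOptions()
                .setRefreshSchemaIntervalMillis(0)    // schema metadata refresh
                .setRefreshNodeListIntervalMillis(0)  // node list refresh
                .setRefreshNodeIntervalMillis(0);     // single-node refresh
        return Cluster.builder()
                .addContactPoint(host)
                .withPort(port)
                .withQueryOptions(queryOptions)
                .build();
    }

    public static void main(String[] args) {
        // Assumed address of the embedded test node; adjust to your setup.
        try (Cluster cluster = build("127.0.0.1", 9142);
             Session session = cluster.connect()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
        }
    }
}
```

Disabling debouncing like this is probably only appropriate in tests, where a single client issues DDL statements sequentially; in production the default intervals exist to coalesce bursts of events.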
jwcarman commented 7 years ago

Thanks @jsevellec, I'll give it a spin!