wlloyd / eiger

A research fork of Cassandra that provides causal+ consistency, read-only transactions, and write-only transactions across all the servers in each datacenter.
Apache License 2.0

Stress tool not working when a dc has multiple nodes #6

Closed by chathurilanchana 8 years ago

chathurilanchana commented 9 years ago

Hi, I am trying to deploy Eiger in a cluster, using kodiak_dc_launcher.bash with the necessary modifications. With 2 data centers and only 1 machine per data center, I was able to run the stress tool. I ran it through the test.sh script, passing the list of nodes in the cluster as --nodes. But with more than one machine per data center, the stress tool's operations fail and I get output similar to the following:

0,0,0,0,0,NaN,10
0,0,0,0,0,NaN,20

Do I need any configuration beyond the necessary changes in kodiak_dc_launcher.bash (run.sh attached), cassandra-topology.properties, and the config file that holds the server details? With the same configuration generated by these scripts, and only commenting out sliced_buffer_size_in_kb: 64, I was able to run the stress tool with Cassandra 1.1, even with multiple nodes per DC. So I think I'm missing something specific to Eiger. Can you please help me figure this out?

Thank you very much!

chathurilanchana commented 9 years ago

Furthermore, I found the error below in my log file. The assertion that triggers it was added by Eiger and is not present in the Cassandra-only version.

ERROR [MutationStage:14] 2015-11-03 11:27:31,964 AbstractCassandraDaemon.java (line 137) Fatal exception in thread Thread[MutationStage:14,5,main]
java.lang.AssertionError: Do not expect replication mutations from the localDC (yet)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:55)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

wlloyd commented 9 years ago

That error suggests that you may be replicating data within a datacenter. The prototype only supports replicating data once inside each datacenter (that is, one copy per datacenter), and I believe this assertion is firing because a node is receiving a replicated write from another node in the same datacenter.

If that's not your intention, I suspect something is off in the configuration. Maybe the clients are configured to access both datacenters? (They should only access one at a time.)

The experiments directory has the scripts I used to run the experiments; that might be a helpful place to look too.
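To illustrate the condition described above, here is a minimal Python sketch. It is not Eiger's actual code: the `Node` class and `accept_replication_mutation` function are hypothetical names made up for this example, and only the assertion message is taken from the log. It just shows the idea that a replicated write is expected to arrive from a different datacenter than the one the receiving node belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a server; not an Eiger class."""
    name: str
    datacenter: str


def accept_replication_mutation(receiver: Node, sender: Node) -> None:
    """Assumed check: a replicated write must originate in a remote datacenter."""
    assert sender.datacenter != receiver.datacenter, (
        "Do not expect replication mutations from the localDC (yet)"
    )


# Hypothetical three-node layout: two nodes in DC1, one in DC2.
dc1_a = Node("dc1-a", "DC1")
dc1_b = Node("dc1-b", "DC1")
dc2_a = Node("dc2-a", "DC2")

# Cross-datacenter replication passes the check.
accept_replication_mutation(dc1_a, dc2_a)

# A replicated write between two nodes of the same datacenter trips it.
try:
    accept_replication_mutation(dc1_a, dc1_b)
    same_dc_rejected = False
except AssertionError:
    same_dc_rejected = True
```

With only one node per datacenter the second case cannot occur, which would be consistent with the error showing up only once a datacenter has multiple nodes.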

chathurilanchana commented 9 years ago

Thank you for the reply. But I'm using a replication factor of 1 and still getting the same error. Even with 1 data center and 2 clients I still get the error, so it seems there is nothing wrong with the client configuration either.

wlloyd commented 9 years ago

Hmm, maybe the strategy-properties for the clients aren't set in the way Eiger expects them to be?

I'd recommend looking at https://github.com/wlloyd/eiger/blob/eiger-release/experiments/kodiak_scale.bash and https://github.com/wlloyd/eiger/blob/eiger-release/experiments/kodiak_common to see the list of options I used for the stress tool when running experiments, and how I configured things like the strategy-properties.

chathurilanchana commented 8 years ago

Thank you! I was able to fix it. The problem was that, when using the stress tool, data needs to be inserted through the InsertCL method. I was trying to insert with the default insert method.