jalkanen / cassandracalculator

A simple Cassandra Consistency Calculator
Apache License 2.0

Incorrectly calculated number of nodes without data loss #5

Closed by vagifzeynalov 7 years ago

vagifzeynalov commented 7 years ago

Hi!

It seems that the "nodes without data loss" calculation is based on the write level only. For example, I have a cluster size of 4 and a replication factor of 4 as well, but the write level is ONE. So although the calculator says "Each node holds 100% of your data.", it also shows me "You can survive the loss of no nodes without data loss.", which is not true. Even if the write level is ONE, Cassandra will automatically replicate the data across all nodes.

AFAIK the write level affects only the client: how quickly Cassandra acknowledges that the data was saved.

Regards, Vagif

vagifzeynalov commented 7 years ago

Yeah, the calculation is var dataloss = w - 1; which is not related to the ratio at all: https://github.com/jalkanen/cassandracalculator/blob/master/index.html#L138

preli commented 7 years ago

I think this might be on purpose. When the write level is ONE, Cassandra will report "success" as soon as one node has acknowledged the write. If this node fails before it can "send" the new data to the other nodes, you've just lost data.

jalkanen commented 7 years ago

@preli is correct. Data loss can occur before all replicas are updated if all the nodes that acknowledged the write go down (or are cut off by a netsplit) simultaneously. It's not just about losing data that has already been replicated.
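
For anyone reading this later, here is a minimal sketch of the reasoning above. It is not the calculator's actual code. Only the w - 1 expression comes from the line quoted earlier in this thread. The function names writeAcks and survivableLossesWithoutDataLoss are hypothetical, and only the ONE, QUORUM, and ALL write levels are modelled. The idea: a write is only guaranteed to exist on the w replicas that acknowledged it, so losing w - 1 of them still leaves one guaranteed copy, while losing all w before replication completes can lose the write.

```javascript
// Hypothetical helper: how many replicas must acknowledge a write
// before Cassandra reports success for the given write level.
function writeAcks(level, replicationFactor) {
  switch (level) {
    case "ONE":
      return 1;
    case "QUORUM":
      // A quorum is a majority of the replicas.
      return Math.floor(replicationFactor / 2) + 1;
    case "ALL":
      return replicationFactor;
    default:
      throw new Error("unsupported write level: " + level);
  }
}

// Hypothetical helper: worst-case number of nodes that can fail
// without losing an acknowledged write.
function survivableLossesWithoutDataLoss(level, replicationFactor) {
  const w = writeAcks(level, replicationFactor);
  // Mirrors the expression quoted in the thread: var dataloss = w - 1;
  return w - 1;
}

// The scenario from the original report: replication factor 4.
for (const level of ["ONE", "QUORUM", "ALL"]) {
  console.log(level, survivableLossesWithoutDataLoss(level, 4));
}
```

With a replication factor of 4 and write level ONE the result is 0, which matches the "loss of no nodes" message from the report: the figure is a worst-case guarantee for a write that was just acknowledged, not a statement about data that has already reached every replica.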