kairosdb / kairosdb-client

Java Client for KairosDB

KairosDB server does not notify the client when Cassandra connection is down #29

Closed lauraureche closed 9 years ago

lauraureche commented 10 years ago

Version: kairosdb_0.9.4-6 OS: Debian Wheezy 64bit

I use KairosDB with a Cassandra database and have a client that sends data to the KairosDB server. After Cassandra is stopped (the service is down), KairosDB logs an error, but the client is never notified that the data it sent could not be stored.

This behaviour causes data loss: the data that was sent is silently deleted/dropped.

The issue can be reproduced with this configuration by sending some POST requests to the KairosDB server.
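For reference, a minimal sketch of the kind of `POST /api/v1/datapoints` payload involved here (the metric name, tag, and values are placeholders, not taken from the report). KairosDB answers such a request with 204 No Content before the Cassandra write is attempted, which is the window where the silent loss occurs:

```java
import java.util.Map;

// Sketch of the JSON body for POST /api/v1/datapoints; metric name,
// tags, timestamp, and value below are illustrative placeholders.
public class DatapointPayload {

    static String build(String metric, Map<String, String> tags, long ts, double value) {
        StringBuilder tagJson = new StringBuilder();
        for (Map.Entry<String, String> e : tags.entrySet()) {
            if (tagJson.length() > 0) tagJson.append(",");
            tagJson.append(String.format("\"%s\":\"%s\"", e.getKey(), e.getValue()));
        }
        return String.format(
            "[{\"name\":\"%s\",\"tags\":{%s},\"datapoints\":[[%d,%s]]}]",
            metric, tagJson, ts, value);
    }

    public static void main(String[] args) {
        String body = build("test.metric", Map.of("host", "server1"), 1432684380000L, 42.0);
        System.out.println(body);
        // The server acknowledges this POST with 204 even if the later
        // Cassandra write fails, so the client cannot detect the loss.
    }
}
```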

salex89 commented 9 years ago

I would like to know the status of this issue. I'm seeing similar behaviour with the same version of KairosDB on Ubuntu 14.04. During a massive write (I'm benchmarking to see whether I could use KairosDB), the server fails to write; the failure appears on the console, but the client reports nothing.

jsabin commented 9 years ago

This would require a change to KairosDB. The data is processed on a different thread so there is no way the client could know of success or failure.
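To illustrate the point (this is a simplified sketch of the fire-and-forget pattern described above, not KairosDB source): the request thread hands the datapoint to a worker and acknowledges immediately, so a storage failure that happens afterwards is only visible server-side.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: the HTTP handler enqueues the write and returns 204 before
// the store is attempted, so the caller never learns of a failure.
public class AsyncIngest {
    final ExecutorService worker = Executors.newSingleThreadExecutor();
    volatile boolean storeHealthy = true;   // stands in for Cassandra being up

    // Called on the request thread; returns the status sent to the client.
    int handlePost(double value) {
        worker.submit(() -> {
            if (!storeHealthy) {
                // Failure is only logged server-side; the 204 was already sent.
                System.err.println("write failed for value " + value);
            }
        });
        return 204; // acknowledged before the write is attempted
    }

    public static void main(String[] args) throws Exception {
        AsyncIngest ingest = new AsyncIngest();
        ingest.storeHealthy = false;        // simulate Cassandra being down
        System.out.println("client saw: " + ingest.handlePost(1.0));
        ingest.worker.shutdown();
        ingest.worker.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```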

But I'm interested in your use case. Why would all of your Cassandra nodes be brought down? Cassandra is designed with replication so even if one node goes down you should still be able to write data to other nodes guaranteeing that data is written. Help me understand how you are using KairosDB and Cassandra.

salex89 commented 9 years ago

Hi Jeff. The use case is not all that special: sensor readings at a 250 Hz frequency (so 4 ms between data points). We first saw this with a rather low-powered 3-node cluster. When we tried to insert a couple of hours of data, the cluster became overloaded and started reporting errors. We have seen the same behaviour with native Cassandra when loading a larger amount of regular data, so we suspect it happens whenever the load outgrows the cluster size. Even if the write fails, we should at least know about it, not assume it worked. The cluster may also fail for some other reason during the write, and we would never know. I hope you understand my concern.

Currently we are mitigating the risk by introducing a worker in front of KairosDB with conservative batch-size/write-delay settings. The worker reads the data from a broker/queue (which is filled by a web service) and then loads it into KairosDB; it also acts as a sort of load balancer. Nevertheless, I am still a bit concerned, especially since I've seen a single big write request with "optimistic" batch/delay settings overload the cluster.
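The mitigation described above can be sketched roughly like this (a self-contained illustration; the batch size, delay, and queue contents are made-up values, and the actual KairosDB push is elided):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a batching worker in front of KairosDB: drain the queue in
// bounded batches with a delay between writes, so one huge request
// cannot overload the cluster. Values below are illustrative only.
public class BatchingWorker {
    static final int MAX_BATCH = 500;       // conservative batch size
    static final long WRITE_DELAY_MS = 50;  // pause between batches

    static int drainAll(BlockingQueue<Double> queue) throws InterruptedException {
        int batches = 0;
        List<Double> batch = new ArrayList<>(MAX_BATCH);
        while (!queue.isEmpty()) {
            queue.drainTo(batch, MAX_BATCH); // never take more than one batch
            // ...push `batch` to KairosDB here, retrying on failure...
            batches++;
            batch.clear();
            Thread.sleep(WRITE_DELAY_MS);    // throttle so the cluster keeps up
        }
        return batches;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Double> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1200; i++) queue.add((double) i); // e.g. filled by a web service
        System.out.println("batches written: " + drainAll(queue));
    }
}
```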

I've also opened this issue in the KairosDB project, and the author is aware of it. He states that even if the write fails the first time, it will be retried later, but I don't think that actually happens.

Issue in KariosDB project: https://github.com/kairosdb/kairosdb/issues/145


jsabin commented 9 years ago

Closing this issue. Issue 145 that you opened in the KairosDB project addresses this.