Closed 8 months ago
Before we can benefit from dedup, we need to add commit propagation for clients (see this somewhat related roadmap item: https://github.com/questdb/roadmap/issues/52). Without it, the client can't tell which lines are safe to dispose of and which should be kept around to be resent. That's certainly something we'll be working on in the future, and once it's done we'll be able to expose client configuration for safer ingestion.
@puzpuzpuz I agree that commit/error propagation is important, but a reconnect + resend of the current batch can be added regardless of commit/error propagation. If there's a TCP write error or a broken pipe, the entire batch is lost on the client side, which is a pretty critical thing to lose. This isn't about propagating errors from the actual write on the server; it's just about making sure that we don't lose data when there is a TCP error.
This feature can be added now, so that clients don't lose data due to TCP issues (regardless of database write issues).
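To make the request concrete, here is a minimal sketch of "reconnect + resend the current batch" over a plain TCP socket. It is not the client library's API: `ResendingLineWriter`, its parameters, and the `connect`/`sleep` hooks are hypothetical names used only for illustration. The idea is simply that the buffer is cleared only after `sendall` returns without error, so a broken pipe leaves the batch available for a retry.

```python
import socket
import time


class ResendingLineWriter:
    """Hypothetical helper: keeps the batch until a TCP send succeeds."""

    def __init__(self, host, port, max_retries=3, backoff_s=0.5,
                 connect=socket.create_connection, sleep=time.sleep):
        self._addr = (host, port)
        self._max_retries = max_retries
        self._backoff_s = backoff_s
        self._connect = connect    # injectable so the sketch can be tested
        self._sleep = sleep
        self._sock = None
        self._buf = bytearray()

    def append(self, line: str) -> None:
        """Add one line (without trailing newline) to the current batch."""
        self._buf += line.encode("utf-8") + b"\n"

    def pending(self) -> bytes:
        """Return the bytes that have not been successfully sent yet."""
        return bytes(self._buf)

    def flush(self) -> None:
        """Send the batch; on a socket error, reconnect and resend it whole."""
        if not self._buf:
            return
        last_err = None
        for attempt in range(self._max_retries + 1):
            try:
                if self._sock is None:
                    self._sock = self._connect(self._addr)
                self._sock.sendall(bytes(self._buf))
                self._buf.clear()  # only dropped after a successful send
                return
            except OSError as err:  # includes BrokenPipeError, ConnectionResetError
                last_err = err
                self._drop_connection()
                if attempt < self._max_retries:
                    self._sleep(self._backoff_s * (2 ** attempt))
        # The batch is still in the buffer, so the caller can retry later.
        raise ConnectionError("flush failed; batch retained") from last_err

    def _drop_connection(self) -> None:
        if self._sock is not None:
            try:
                self._sock.close()
            except OSError:
                pass
            self._sock = None
```

Two caveats apply to this sketch. A send that fails partway may already have delivered some lines, so resending the whole batch can write duplicates unless the table has deduplication enabled. And a successful `sendall` only means the bytes were handed to the OS, not that the server committed them, which is the gap commit propagation is meant to close.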
I got your point. Yes, such a change is certainly possible with a few changes in the way we deal with the buffer. I discussed this with the team and we'll be making this change, but not in the near future. In the meantime, we're certainly open to contributions.
Closing this one since v3 shipped an HTTP sender, which allows explicit control over transactions and has automatic retry behavior.
There are several issues and pull requests throughout this repo that discuss the fact that the LineSender is not fully safe, i.e. there are issues with auto-reconnection, there's no way to retrieve the buffer that failed to send so it can be retried after a connection issue, etc.
@puzpuzpuz has said that this can't be implemented without data deduplication on the server side.
According to the QuestDB docs site, data deduplication has been available in the database since QuestDB 7.3: https://questdb.io/docs/concept/deduplication/
Can we get this MAJOR ISSUE fixed (at the very least as an option on the client that we can enable) so we can have some better safety in what will happen if there's a connection issue, so we don't lose data and we don't have to implement reconnection logic on our own?