jaegertracing / jaeger-clickhouse

Jaeger ClickHouse storage plugin implementation
Apache License 2.0
236 stars 50 forks source link

Durable database writes #75

Closed x4m closed 2 years ago

x4m commented 2 years ago

Hi! Thanks for the project, I believe it's of a great value to the community.

Currently, this plugin accumulated data and writes it to the database. I think it's important to do several things to ensure more durable writes:

  1. Retry network and database failures. Use exponential backoff in a case when the database cannot server write immediately.
  2. Buffer data not written to DB. Ensure that the buffer does not overflow. Sacrifice data intentionally if it cannot be stored in DB.
  3. Reload connection string when requested: a user can add new shards to CH installation

What do you think?

pavolloffay commented 2 years ago

Can any of these by hadnled by ch sql driver (for instance retries)? Maybe there is some middleware for SQL client that provides these capabilities.

  1. Reload connection string when requested: a user can add new shards to CH installation

Can you please explain this further? Do you mean call sql.Open when the user requests it?

x4m commented 2 years ago

I do not know if drivers do it. Can you please explain this further? Do you mean call sql.Open when the user requests it?

It's important to handle cases when users change the topology of the CH cluster. Adds or removes nodes. sql.Open() is part of the solution, yes. BTW it's important to reconnect when database was restarted.

pavolloffay commented 2 years ago

It's important to handle cases when users change the topology of the CH cluster. Adds or removes nodes. sql.Open() is part of the solution, yes. BTW it's important to reconnect when database was restarted.

Let's say the DB plugin writes to Distributed table engine and the topology changes (e.g. added new node/shard). Is it mandatory to reconnect to DB with sql.Open? I didn't experience this when working on https://github.com/jaegertracing/jaeger-clickhouse/blob/main/guide-sharding-and-replication.md.

x4m commented 2 years ago

No, it's OK to connect to the remaining nodes. But, I think, eventually you need to update connection information.

pavolloffay commented 2 years ago

Also the storage plugin is connected to the loadbalancer and not to each individual node.

Perhaps we should mention this in our docs. @EinKrebs would you like to document this in the config?

x4m commented 2 years ago

I think in CH each individual node is the load balancer. And you can specify this in the connection string of the driver. But it's better to ask some CH experts here... I'll do that.