sutoiku / puffin

Serverless HTAP cloud data platform powered by Arrow × DuckDB × Iceberg
http://PuffinDB.io
MIT License

How should query logs be queued for batch INSERT INTO? #1

Closed ghalimi closed 1 year ago

ghalimi commented 1 year ago

Every query must be logged into an Iceberg table using an INSERT INTO query. Batching multiple such queries into one would be more efficient, but would require some queuing mechanism. Since low latency is not an absolute requirement for query logs, Amazon SQS could be used for this purpose, but should other options be considered?
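A minimal sketch of the batching idea, using Python's in-process `queue.Queue` as a stand-in for SQS (with SQS, `drain` would instead call `receive_message` with a visibility timeout). The function and table names here are hypothetical, not part of PuffinDB:

```python
from queue import Queue, Empty

def build_batch_insert(table, rows):
    """Combine many queued log rows into a single multi-row INSERT INTO."""
    values = ", ".join(
        "(" + ", ".join(repr(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} VALUES {values}"

def drain(queue, max_batch=100):
    """Pull up to max_batch entries off the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(queue.get_nowait())
        except Empty:
            break
    return batch

# Example: two log entries become one statement instead of two.
q = Queue()
q.put(("SELECT 1", 12))
q.put(("SELECT 2", 34))
sql = build_batch_insert("query_log", drain(q))
```

One INSERT per batch amortizes the per-commit overhead of writing to an Iceberg table, at the cost of the log lagging by up to one drain interval.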

ghalimi commented 1 year ago

Using an Iceberg table for logs is probably adding more complexity than is necessary. Amazon ElastiCache for Redis is probably a better option.

alexey-milovidov commented 1 year ago

I don't think that's the best approach. For example, ClickHouse logs queries directly into ClickHouse itself, into the system.query_log table. That's how they can be analyzed with low latency.

ghalimi commented 1 year ago

We're actually going in that direction now.