Open KhafRuslan opened 3 months ago
Thanks for the suggestion! We're discussing this internally and will attempt some experiments. Updates will follow.
Thanks for the report and suggestion, @KhafRuslan. We are implementing some optimizations for this use case. Updates will follow once it is ready to re-test!
@KhafRuslan please let us know if you can retest and confirm if the improvement is noticeable. Thanks!
We have run into problems on a large amount of data. We analyzed the query that Qryn generates and have some questions. SQL query:
The first part is fine, there is filtering by time:
In the second part, we found that it performs a full database scan:
The third part is similar: it also scans a large amount of data.
Is this required? Is there no way to bound these parts by time as well, or to filter them some other way? It looks like the approach with multiple joins doesn't work well on large amounts of data.
Denormalizing the data and storing labels in another format may help. There are some options:
1) Store labels as Map(LowCardinality(String), String), as in the otel.otel_logs schema described in this article: https://clickhouse.com/blog/storing-log-data-in-clickhouse-fluent-bit-vector-open-telemetry#querying-the-map-type
2) See the section “Approach 3: JSON as pairwise arrays” here: https://www.propeldata.com/blog/how-to-store-json-in-clickhouse-the-right-way SigNoz also uses this approach: https://signoz.io/docs/userguide/logs_clickhouse_queries/
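To illustrate why the two options above avoid the joins, here is a rough Python sketch of both layouts. It is only a model of the idea, not Qryn's schema or a ClickHouse query: the row classes, field names, and sample data are all hypothetical. The point is that when labels live in the same row as the timestamp, the time bound and the label filter are evaluated together in one pass.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MapRow:
    """Option 1: labels stored inline as a map (hypothetical row shape)."""
    timestamp: int
    body: str
    labels: Dict[str, str]


@dataclass
class PairwiseRow:
    """Option 2: labels stored inline as two parallel arrays (hypothetical row shape)."""
    timestamp: int
    body: str
    label_keys: List[str]
    label_values: List[str]

    def label(self, key: str) -> Optional[str]:
        # Find the key's position, then read the value at the same position.
        try:
            return self.label_values[self.label_keys.index(key)]
        except ValueError:
            return None


def to_pairwise(row: MapRow) -> PairwiseRow:
    # Keys are sorted so that both arrays have a stable, matching order.
    keys = sorted(row.labels)
    return PairwiseRow(row.timestamp, row.body, keys, [row.labels[k] for k in keys])


def query_map(rows: List[MapRow], start: int, end: int, key: str, value: str) -> List[str]:
    # The time bound and the label filter apply to the same row; no join is needed.
    return [r.body for r in rows
            if start <= r.timestamp < end and r.labels.get(key) == value]


def query_pairwise(rows: List[PairwiseRow], start: int, end: int, key: str, value: str) -> List[str]:
    return [r.body for r in rows
            if start <= r.timestamp < end and r.label(key) == value]


# Hypothetical sample data, for illustration only.
map_rows = [
    MapRow(100, "started", {"app": "api", "env": "prod"}),
    MapRow(200, "timeout", {"app": "api", "env": "dev"}),
    MapRow(900, "stopped", {"app": "api", "env": "prod"}),
]
pairwise_rows = [to_pairwise(r) for r in map_rows]

print(query_map(map_rows, 0, 500, "env", "prod"))
print(query_pairwise(pairwise_rows, 0, 500, "env", "prod"))
```

Both queries return the same rows. The trade-off is that labels are duplicated on every log row, so storage grows, but the filter never has to touch rows outside the requested time range.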