metrico / qryn

⭐️ All-in-One Polyglot Observability with OLAP Storage for Logs, Metrics, Traces & Profiles. Drop-in Grafana Cloud replacement compatible with Loki, Prometheus, Tempo, Pyroscope, Opentelemetry, Datadog and beyond :rocket:
https://qryn.dev
GNU Affero General Public License v3.0

Delay of ~20s before ingested log message is returned via web UI #598

Open Girgitt opened 1 week ago

Girgitt commented 1 week ago

Hi there, this is more a question about possible fine-tuning than an issue report.

Observed behavior: after a log message is pushed, e.g. via:

curl -X POST -H "Content-Type: application/json" -d '{"streams":[{"stream":{"job":"example-job","level":"info"},"values":[["'$(date +%s%N)'", "This is a log message with current timestamp: '$(date +%Y-%m-%dT%H:%M:%S%:z)'"]]}]}' http://127.0.0.1:3100/loki/api/v1/push

It takes about 20s to retrieve it in the web UI. The record for the log entry appears in the samples_v3 table almost immediately. The label for the log entry is also available in the web UI without noticeable delay; only the message itself is unavailable for some time.
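For anyone reproducing this without the shell quoting, here is a minimal Python sketch that builds the same push body as the curl command above. The helper name `build_push_payload` is made up for illustration; only the JSON shape and the nanosecond string timestamp are taken from the command itself.

```python
import json
import time


def build_push_payload(labels, message, ts_ns=None):
    """Build a Loki-style push body with one stream and one log line."""
    if ts_ns is None:
        ts_ns = time.time_ns()  # same unit as `date +%s%N` in the curl example
    # The timestamp is sent as a string of nanoseconds since the epoch.
    return {"streams": [{"stream": dict(labels), "values": [[str(ts_ns), message]]}]}


payload = build_push_payload(
    {"job": "example-job", "level": "info"},
    "This is a log message",
    ts_ns=1700000000000000000,  # fixed value so the example is reproducible
)
body = json.dumps(payload)  # POST this to /loki/api/v1/push as application/json
```

Sending `body` is left out on purpose so the snippet runs without a qryn instance; any HTTP client can post it to the push URL used above.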

Expected behavior: once a log entry is in the samples_v3 table, the web UI should return it without delay.

Additional information:

qryn version: v3.2.36

qryn settings:
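To put a number on the delay rather than eyeballing it in the UI, a polling loop like the sketch below could be used. Everything here is hypothetical: `wait_until_visible` and `fetch_lines` are illustrative names, and `fetch_lines` stands for whatever callable returns the log lines currently visible through the query path being tested.

```python
import time


def wait_until_visible(fetch_lines, marker, timeout_s=60.0, interval_s=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Return seconds until `marker` shows up in fetch_lines(), or None on timeout."""
    start = clock()
    while True:
        # fetch_lines is assumed to return an iterable of log line strings.
        if any(marker in line for line in fetch_lines()):
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Pushing a message with a unique marker and then calling this once per query mode would show whether the delay depends on the mode. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.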

    PATH="/opt/qryn:%(ENV_PATH)s",
    CLICKHOUSE_SERVER="127.0.0.1",
    CLICKHOUSE_PORT=8123,
    CLICKHOUSE_PROTO="http",
    CLICKHOUSE_AUTH="dd_qryn:***",
    CLICKHOUSE_DB="edo_test_qryn",
    PORT=3100,
    BULK_MAXAGE=200

clickhouse version: 24.5.3.5 (official build)

Acceptance criteria: maybe there is some caching going on in the web client? If so, is it configurable so the delay can be limited?

Or maybe some operations run on the ClickHouse database before the data becomes retrievable? Earlier versions of qryn, around v3.1, used a different schema with a samples_v2 table and performed periodic operations on the database; maybe in v3.2 this process still exists and affects the retrieval delay.

akvlad commented 1 week ago

How are you requesting it? With the LIVE button, or with a plain LogQL request?

Do you have a single-node ClickHouse server or a cluster?

Girgitt commented 1 week ago

Thank you - that was it: Range vs. Instant. It is actually impossible to tell which mode is active in the "light" theme, but anyway:

  1. in "Instant" mode there is no delay
  2. in "Range" mode there is a delay
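For comparing the two modes outside the UI, the Loki HTTP API has an instant endpoint (`/loki/api/v1/query`) and a range endpoint (`/loki/api/v1/query_range`). The sketch below only builds the two URLs. That the UI's "Instant" and "Range" toggles map onto exactly these endpoints is an assumption, and the function names are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:3100"  # host and port from the push example in this thread


def instant_query_url(query, time_ns, limit=100):
    """URL for a Loki-style instant query evaluated at a single point in time."""
    params = {"query": query, "time": time_ns, "limit": limit}
    return f"{BASE_URL}/loki/api/v1/query?{urlencode(params)}"


def range_query_url(query, end_ns, window_s=1800, limit=100):
    """URL for a Loki-style range query over the last `window_s` seconds."""
    start_ns = end_ns - window_s * 1_000_000_000  # seconds to nanoseconds
    params = {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{BASE_URL}/loki/api/v1/query_range?{urlencode(params)}"
```

With `window_s=1800` the range URL covers the same "Last 30 minutes" window as the UI. Fetching both URLs right after a push, e.g. with curl, would show whether the delay is tied to the range path.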

I run a single, local ClickHouse instance with very little content (these are just initial tests in preparation for an upgrade from v3.1.2 to v3.2.36).

:) select count(*) from edo_test_qryn.samples_v3;

SELECT count(*)
FROM edo_test_qryn.samples_v3

Query id: 4584ce6d-f325-49da-8582-8b0b74ba08fc

   ┌─count()─┐
1. │    2329 │
   └─────────┘

1 row in set. Elapsed: 0.001 sec. 

The query from the web UI for "Last 30 minutes":

{job="example-job"}