quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io
Other
7.75k stars 310 forks source link

How to Do SubQuery Like SQL #5066

Open caibirdme opened 3 months ago

caibirdme commented 3 months ago

I'm implementing tempo's traceql in ltbridge, and I've meet a problem.

Here's a traceql example:

{foo.bar="1" && duration > 20ms} && {a.b="hello" && attributes.service.name="redis"}

Each curly brackets pair means searching for spans which must satisfy all the conditions inside the curly brackets. Just like select * from spans_table where foo.bar="1" and duration>20ms.

Logical Operator such as && || between curly brackets (eg: {A} && {B}), means Inner Join spans from A and B on the same trace_id. In human language, this search can be something like:

I want to search for traces in which it contains at least one span which satisfy A and contains another span which satisfy B and ...

The result should be the form of:

trace_id_a: # there're many spans in a trace, but just keep spans that matches one of the query conditions
  span_id_a,
  span_id_b,
trace_id_b:
  span_id_c,
  span_id_d
..

Here's a conversion from traceql to sql traceql:

{resource.app="camp" && serviceName="fooSvc"} && ({span.qwe="qqq"} || {foo>10})

sql:

    SELECT * FROM spans sp
    WHERE sp.span_id IN
        (SELECT span_id
        FROM (
                (SELECT span_id,
                        trace_id
                  FROM spans
                  WHERE (resource_attributes['app'] = 'camp'
                        AND service_name = 'fooSvc'))
              UNION
                (SELECT span_id,
                        trace_id
                  FROM spans
                  WHERE span_attributes['qwe'] = 'qqq')
              UNION
                (SELECT span_id,
                        trace_id
                  FROM spans
                  WHERE (span_attributes['foo'] > 10 OR resource_attributes['foo']>10))) AS sub
        WHERE (sub.trace_id IN
                      (SELECT trace_id
                        FROM spans
                        WHERE (resource_attributes['app'] = 'camp'
                              AND service_name = 'fooSvc'))
                    AND (sub.trace_id IN
                            (SELECT trace_id
                            FROM spans
                            WHERE span_attributes['qwe'] = 'qqq')
                          OR sub.trace_id IN
                            (SELECT trace_id
                            FROM spans
                            WHERE (span_attributes['foo'] > 10 OR resource_attributes['foo']>10)))))

I don't know the equivalent quickwit or elasticsearch query, and need your help

fulmicoton commented 3 months ago

@caibirdme Thank you for the great report. It is currently impossible.

Our trace endpoint only support a single clause. (so {foo.bar="1" && duration > 20ms} is supported for instance)

Let's keep this issue open and see if we can come up with ways to fix this.

caibirdme commented 3 months ago

Got. Actually I can split this big query into many small queries, and join them in memory, but this can easily lead to OOM for the ltbridge server. Like {service="redis"} && {duration > 500ms}, if I query for all spans which is a redis call, it may return thousands of millions rows. And query for span whose duration is larger than 500ms, also many many rows. And then OOM...

fulmicoton commented 3 months ago

Yes that's pretty terrible. The query in itself is difficult not simple with our data model, but doing it on quickwit side should be much lighter. It will require work however.