grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

feat: Drain uses different tokenizer based on log format #13384

Closed cyriltovena closed 3 months ago

cyriltovena commented 3 months ago

What this PR does / why we need it:

This replaces the tokenizer with a specialized one chosen according to the log format. It also discards JSON logs.

I also improved performance by removing most allocations.
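One common way to cut allocations in a tokenizer is to reuse a token buffer across calls and return substrings of the input instead of copies. The sketch below illustrates that general technique only; `reusableTokenizer` is a hypothetical type, and whether the PR uses this exact approach is an assumption.

```go
package main

// reusableTokenizer is a hypothetical tokenizer that keeps its token
// slice between calls so that steady-state tokenization does not allocate.
type reusableTokenizer struct {
	buf []string
}

// Tokenize splits line on single spaces.
// The returned slice aliases the internal buffer and is only valid
// until the next call to Tokenize; copy it if it must be retained.
func (t *reusableTokenizer) Tokenize(line string) []string {
	t.buf = t.buf[:0] // keep capacity, drop old contents
	start := -1
	for i := 0; i < len(line); i++ {
		if line[i] == ' ' {
			if start >= 0 {
				// Substrings share the backing bytes of line: no copy.
				t.buf = append(t.buf, line[start:i])
				start = -1
			}
			continue
		}
		if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		t.buf = append(t.buf, line[start:])
	}
	return t.buf
}
```

The trade-off is that callers must not hold on to the returned slice across calls, which is usually acceptable when each line is consumed immediately.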

benchstat before.txt after.txt
name                                                            old time/op    new time/op    delta
Drain_TrainExtractsPatterns/testdata/agent-logfmt.txt-16          1.71ms ± 0%    0.84ms ± 2%   -51.08%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/ingester-logfmt.txt-16        123µs ± 3%      57µs ± 4%   -53.82%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/drone-json.txt-16             302µs ±24%     172µs ± 6%   -43.19%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/distributor-logfmt.txt-16    5.87ms ± 1%    3.01ms ±18%   -48.80%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/journald.txt-16              2.63ms ± 4%    1.85ms ± 3%   -29.62%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/kafka.txt-16                 1.85ms ± 6%    1.03ms ± 2%   -44.42%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/kubernetes.txt-16            2.29ms ± 3%    1.40ms ± 2%   -38.93%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/vault.txt-16                 1.89ms ± 9%    1.11ms ± 9%   -40.96%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/calico.txt-16                3.02ms ±28%    1.48ms ± 3%   -51.13%  (p=0.008 n=5+5)

name                                                            old alloc/op   new alloc/op   delta
Drain_TrainExtractsPatterns/testdata/agent-logfmt.txt-16          1.35MB ± 0%    0.03MB ± 0%   -97.96%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/ingester-logfmt.txt-16       96.7kB ± 0%     0.0kB ± 0%  -100.00%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/drone-json.txt-16             545kB ± 0%       5kB ± 0%   -99.07%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/distributor-logfmt.txt-16    4.80MB ± 0%    0.00MB ± 8%  -100.00%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/journald.txt-16              3.19MB ± 0%    0.03MB ± 0%   -99.18%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/kafka.txt-16                 2.98MB ± 0%    0.02MB ± 0%   -99.19%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/kubernetes.txt-16            3.17MB ± 0%    0.02MB ± 0%   -99.22%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/vault.txt-16                 2.87MB ± 0%    0.02MB ± 0%   -99.16%  (p=0.016 n=5+4)
Drain_TrainExtractsPatterns/testdata/calico.txt-16                3.16MB ± 0%    0.03MB ± 0%   -99.20%  (p=0.008 n=5+5)

name                                                            old allocs/op  new allocs/op  delta
Drain_TrainExtractsPatterns/testdata/agent-logfmt.txt-16           20.0k ± 0%      0.1k ± 0%   -99.42%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/ingester-logfmt.txt-16        1.60k ± 0%     0.00k       -100.00%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/drone-json.txt-16               660 ± 0%       210 ± 0%   -68.18%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/distributor-logfmt.txt-16     80.0k ± 0%      0.0k       -100.00%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/journald.txt-16               3.96k ± 0%     1.01k ± 0%   -74.47%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/kafka.txt-16                  3.99k ± 0%     1.00k ± 0%   -74.96%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/kubernetes.txt-16             4.00k ± 0%     1.00k ± 0%   -74.91%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/vault.txt-16                  4.00k ± 0%     1.00k ± 0%   -75.00%  (p=0.008 n=5+5)
Drain_TrainExtractsPatterns/testdata/calico.txt-16                 4.04k ± 0%     1.02k ± 0%   -74.76%  (p=0.008 n=5+5)

Which issue(s) this PR fixes: Fixes #


cyriltovena commented 3 months ago

Fixes https://github.com/grafana/loki-private/issues/1014