grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

loki queries with wide time ranges tend to miss logs #10658

Open freehck opened 11 months ago

freehck commented 11 months ago

Describe the bug
If a query covers a wide time range, some log lines can be missing from the result. Manually splitting the query into smaller time ranges eliminates the problem. I think it's a problem of the Loki querier.

To Reproduce
I managed to reproduce the problem only on a heavily loaded production infrastructure. There will be a lot of bash code below, so take a deep breath.

First of all, let's define a helper function to make the following code easier to read:

mylogcli(){
  logcli query --quiet --forward --limit=0 --org-id="ORG_ID" --addr=http://LOKI_ADDR "$@"
}

Now here's a set of queries fetching the logs for one specific day (24h):

1) The direct way: query the full-day log in one pass:

mylogcli '{pod="POD_NAME"}' --from="2023-09-01T00:00:00+03:00" --to="2023-09-02T00:00:00+03:00" -o raw > full-day.log

2) Manual split: query the same range by hours, i.e. 24 queries for the same specific day (24h):

:>full-day-hours.log;
for i in $(seq 0 23); do
  ii=$(printf '%02d' "$i");
  if [ "$i" -ne 23 ]; then
    ie=$((i+1)); d=1;
  else
    ie=0; d=2;
  fi
  iie=$(printf '%02d' "$ie");
  mylogcli '{pod="POD_NAME"}' --from="2023-09-01T${ii}:00:00+03:00" --to="2023-09-0${d}T${iie}:00:00+03:00" -o raw >>full-day-hours.log
  echo "HOUR $ii rc $?"
done

3) Manual split into even smaller ranges of 10 minutes only, i.e. 24*6 = 144 queries for the same specific day (24h):

:>full-day-10mins.log;
d=1
for i in $(seq 0 23); do
  ii=$(printf '%02d' "$i");
  for j in $(seq 0 10 50); do
    ie=$i; je=$((j+10)); de=$d;
    if [ "$j" -eq "50" ]; then
      ie=$((i+1)); je=0;
      if [ "$i" -eq 23 ]; then
        de=$((d+1)); ie=0;
      fi
    fi
    jj=$(printf '%02d' "$j");
    iie=$(printf '%02d' "$ie");
    jje=$(printf '%02d' "$je");
    mylogcli '{pod="POD_NAME"}' --from="2023-09-0${d}T${ii}:${jj}:00+03:00" --to="2023-09-0${de}T${iie}:${jje}:00+03:00" -o raw >>full-day-10mins.log;
    echo "$ii:$jj rc $?";
  done
done

Here are the results of these three commands:

# md5sum *
d39a5829271d38a2c819459e40bf5897  full-day-10mins.log
d39a5829271d38a2c819459e40bf5897  full-day-hours.log
50f63231216ca127c7798a8e82bedae7  full-day.log
# ls -l
total 1199412
-rw-r--r--    1 root     root     409469805 Sep 19 20:02 full-day-10mins.log
-rw-r--r--    1 root     root     409469805 Sep 19 20:04 full-day-hours.log
-rw-r--r--    1 root     root     409246066 Sep 19 18:39 full-day.log
# echo "$(echo 'scale=2; (409469805-409246066)/1024' | bc)kb of missed logs";
218.49kb of missed logs
# wc -l *| grep -v total
  1238558 full-day-10mins.log
  1238558 full-day-hours.log
  1237879 full-day.log
# echo "$(echo '1238558-1237879' | bc) lines of missed logs"
679 lines of missed logs

As you can see, full-day.log returned by the first query has 679 lines (218.49 KB) fewer than full-day-hours.log or full-day-10mins.log returned by the split (second and third) queries. That's only about 0.055% of the total log volume, but for B2B purposes even such a small share of missed logs is a problem.

A visual inspection of the differences between the collected logs shows that the missed lines are not concentrated at the beginning or the end; they are scattered throughout. For example, the lines I was originally looking for turned out to be in the middle of the day.
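For a non-visual check, the missed lines can also be extracted mechanically. A minimal sketch (the function name is mine, not part of logcli; it compares sorted copies with comm, so the original line order is lost):

```shell
# Hypothetical helper: print lines present in the reference (split-query) log
# but absent from the suspect (single-query) log. Comparing sorted copies with
# comm handles duplicated log lines correctly, at the cost of losing order.
find_missing_lines() {
  # $1: reference log, $2: suspect log
  local ref_sorted sus_sorted
  ref_sorted=$(mktemp); sus_sorted=$(mktemp)
  sort "$1" > "$ref_sorted"
  sort "$2" > "$sus_sorted"
  comm -23 "$ref_sorted" "$sus_sorted"
  rm -f "$ref_sorted" "$sus_sorted"
}
```

E.g. `find_missing_lines full-day-hours.log full-day.log` would print the 679 missing lines.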

Expected behavior
I expect a direct query over the full time range to return all log lines, but in fact full-day.log and full-day-hours.log differ.

Environment: We use an HA Loki installation in Kubernetes with a MinIO backend.

80-loki.tf

resource "helm_release" "loki" {
  name         = "loki"
  namespace    = "loki"
  repository   = "https://grafana.github.io/helm-charts"
  chart        = "loki"
  version      = "5.8.9"
  values = [
    "${templatefile("../../../files/helm-loki-values.yaml", local.cfg)}"
  ]
}

helm-loki-values.yaml

loki:
  auth_enabled: false
  storage:
    type: s3
    s3:
      endpoint: MINIO_ADDR
      accessKeyId: AKEY
      secretAccessKey: SKEY
      insecure: true
      s3ForcePathStyle: true
    bucketNames:
      admin: ha-loki-admin-bucket
      chunks: ha-loki-chunks-bucket
      ruler: ha-loki-ruler-bucket

global:
  dnsService: "coredns"

backend:
  nodeSelector:
    node-pool: master

read:
  replicas: 4
  autoscaling:
    enabled: true
    minReplicas: 4
    maxReplicas: 6
    targetCPUUtilizationPercentage: 60

write:
  replicas: 6
  autoscaling:
    enabled: true
    minReplicas: 6
    maxReplicas: 8

gateway:
  replicas: 3
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 3
    targetCPUUtilizationPercentage: 60
  nodeSelector:
    LOKI_NODE_POOL_LABEL: LOKI_NODE_POOL_VALUE
  service:
    port: 80
    type: NodePort
    nodePort: SOME_PORT
freehck commented 11 months ago

Workaround

As splitting into small time ranges eliminates the problem, I wrote a bash workaround using the Monte Carlo method.

############### BEGIN OF MONTE CARLO METHOD #################

# time functions

export TZ=UTC

function date2seconds() {
  date -d"$1" +%s
}

function seconds2iso8601date() {
  date -Iseconds -d@"$1"
}

# checksum calculation
function md5sum_hashonly() {
  md5sum "$1" | cut -d' ' -f1
}

# logcli function

function logcli-wrapper() {
  logcli query --quiet --forward --limit=0 --org-id=ORG_ID --addr=LOKI_ADDR -o raw "$@"
}

function logcli-monte-carlo-rec() {
  #### PREAMBLE ####
  # input parameters
  local query="$1" from_ts="$2" to_ts="$3" logfile="$4" ranges="${5-24}" checksum="${6-}"
  # defaults
  local convergence_coefficient=6
  # calculations
  local range_length_seconds=$(( to_ts - from_ts ))
  local range_interval=$(( range_length_seconds / ranges ))
  # temp vars
  local range_from_ts range_to_ts range_from_date range_to_date
  local temp_log_file new_checksum

  #### CODE ####
  # make temp file to store log
  temp_log_file=$(mktemp)
  trap "rm -f $temp_log_file" err int exit

  # calculate ranges for chunks
  for range_from_ts in $(seq $from_ts $range_interval $to_ts); do
    if [ "$range_from_ts" -ge "$to_ts" ]; then break; fi
    range_to_ts=$(( range_from_ts + range_interval ))
    if [ "$range_to_ts" -gt "$to_ts" ]; then range_to_ts=$to_ts; fi
    range_from_date=$(seconds2iso8601date $range_from_ts)
    range_to_date=$(seconds2iso8601date $range_to_ts)

    # download logs by chunks
    logcli-wrapper "$query" --from="$range_from_date" --to="$range_to_date" >>$temp_log_file
  done

  # compare checksums
  new_checksum=$(md5sum_hashonly $temp_log_file)
  if [ "$new_checksum" = "$checksum" ]; then
    # move to the appropriate place when success
    mv "$temp_log_file" "$logfile"
  else
    # remove garbage and start a new iteration when failure
    rm -f "$temp_log_file"
    logcli-monte-carlo-rec "$query" "$from_ts" "$to_ts" "$logfile" "$(( ranges * convergence_coefficient ))" "$new_checksum"
  fi
}

function logcli-monte-carlo() {
  local query="$1" from_date="$2" to_date="$3" logfile="$4" ranges="${5-24}"
  logcli-monte-carlo-rec "$query" "$(date2seconds $from_date)" "$(date2seconds $to_date)" "$logfile" "$ranges"
}

############### END OF MONTE CARLO METHOD #################
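A quick sanity check of the time helpers above (assumes GNU date and TZ=UTC):

```shell
export TZ=UTC

# Same helpers as in the workaround above
date2seconds() { date -d"$1" +%s; }
seconds2iso8601date() { date -Iseconds -d@"$1"; }

# Round-trip: an ISO 8601 date converts to epoch seconds and back
ts=$(date2seconds "2023-09-01T00:00:00")
echo "$ts"                    # → 1693526400
seconds2iso8601date "$ts"     # → 2023-09-01T00:00:00+00:00
```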

Usage:

export TZ=UTC
logcli-monte-carlo <QUERY> <ISODATE_FROM> <ISODATE_TO> <LOGFILE> [<HOW_MANY_RANGES>]

Example:

export TZ=UTC
logcli-monte-carlo '{pod="POD_NAME"}' "2023-09-01T00:00:00" "2023-09-02T00:00:00" "log"

This code is well tested for 1-day queries. By default it splits the range into 24 smaller ranges; every subsequent iteration multiplies the number of ranges by 6, then checks whether the checksum of the collected log has changed, and if it hasn't, saves the log file to the specified place.
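To see how fast the retries converge, note that the range count grows geometrically: retry n uses 24·6^n ranges (24, 144, 864, ...). A small sketch of that arithmetic (the function name is mine, for illustration only):

```shell
# Number of sub-ranges used at retry n (0-based), with the defaults from the
# workaround above: 24 initial ranges, multiplied by 6 on every failed retry.
ranges_at_iteration() {
  local n="$1" ranges=24 coeff=6 i
  for (( i = 0; i < n; i++ )); do
    ranges=$(( ranges * coeff ))
  done
  echo "$ranges"
}

ranges_at_iteration 2   # → 864
```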

NB:
1) The code doesn't contain checks for logcli's non-zero return codes. Run it strictly with set -e.
2) You must have the tzdata package installed. An explicit export of the TZ variable is also strongly recommended.
3) If you query a time range wider than 1 day, or if your query labels return a larger amount of data than 350 MB (my case), increase the number of ranges to split in advance (using the 5th parameter of logcli-monte-carlo).
4) Never set the number of ranges to small values. The Monte Carlo method works well only on big numbers.

dwt commented 7 months ago

I think I see the same problem. When I query a 7-day window, I see one log line; when I split it up into three 2-day windows, I see two log lines.

I am on version:

$ /usr/bin/loki --version
loki, version 2.8.5 (branch: HEAD, revision: 03cd6c82b)
  build user:       root@bdb1fa196fd7
  build date:       2023-09-14T17:17:19Z
  go version:       go1.20.7
  platform:         linux/amd64
romanvilu commented 1 month ago

Experiencing the same problem.

freehck commented 1 month ago

Big important update: the problem doesn't happen with the following Loki configuration:

loki:
  limits_config:
    split_queries_by_interval: 0

This would probably be a sane default for Loki until the bug is fixed.