Closed: steveannett closed this issue 1 year ago.
Hey @steveannett, as you correctly stated, use_boltdb_shipper_as_backup should default to false and therefore not require any boltdb_shipper storage config.
I quickly tested -target=read with this config:
schema_config:
  configs:
    - from: 2022-02-08
      schema: v12
      store: tsdb
      object_store: filesystem
      index:
        prefix: index_tsdb_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /tmp/loki/index
    cache_location: /tmp/loki/cache
    shared_store: filesystem
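For reference, a minimal sketch of how such a single-target test can be run locally, assuming the config above is saved as /tmp/loki/config.yaml (the file path is illustrative, not taken from the thread):

loki -config.file=/tmp/loki/config.yaml -target=read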
However, I could not reproduce the issue. Could you please also post the contents of your generated config.yaml (ConfigMap)?
Hi @chaudum, thanks for looking into this. Here is the generated config.yaml:
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    common:
      compactor_address: 'loki-read'
      path_prefix: /var/loki
      replication_factor: 1
      storage:
        s3:
          bucketnames: ${LOKI_S3_BUCKET}
          insecure: false
          region: ap-east-1
          s3: s3://ap-east-1/${LOKI_S3_BUCKET}
          s3forcepathstyle: true
    compactor:
      shared_store: s3
      working_directory: /var/loki/compactor
    ingester:
      chunk_idle_period: 3m
      chunk_retain_period: 1m
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    memberlist:
      join_members:
      - loki-memberlist
    querier:
      engine:
        timeout: 5m
      max_concurrent: 16
      query_timeout: 5m
    query_range:
      align_queries_with_step: true
    query_scheduler:
      max_outstanding_requests_per_tenant: 32768
    ruler:
      alertmanager_url: http://_http-web._tcp.alertmanager-operated.monitoring.svc.cluster.local:9093
      enable_alertmanager_v2: true
      enable_api: true
      ring:
        kvstore:
          store: inmemory
      rule_path: /var/loki/rules-temp
      storage:
        local:
          directory: /var/loki/rules
        type: local
    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml
    schema_config:
      configs:
      - from: "2023-01-01"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v12
        store: tsdb
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
    storage_config:
      hedging:
        at: 250ms
        max_per_second: 20
        up_to: 3
      tsdb_shipper:
        active_index_directory: /var/loki/tsdb-index
        cache_location: /var/loki/tsdb-cache
        cache_ttl: 24h
        shared_store: s3
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/instance: loki-instance
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: loki
    app.kubernetes.io/version: 2.7.3
    helm.sh/chart: loki-4.8.0
  name: loki
  namespace: monitoring
This was generated using the following kustomization.yaml:
namespace: monitoring

helmCharts:
  # Loki Logging (see https://github.com/grafana/loki/tree/main/production/helm/loki also very useful guide at https://rtfm.co.ua/en/grafana-loki-architecture-and-running-in-kubernetes-with-aws-s3-storage-and-boltdb-shipper/)
  - name: loki
    version: 5.10.0 # App version 2.8.3
    namespace: monitoring
    releaseName: loki-instance
    repo: https://grafana.github.io/helm-charts
    valuesInline:
      loki:
        enabled: true
        auth_enabled: false
        isDefault: false # So it doesn't override Prometheus
        commonConfig:
          path_prefix: /var/loki
          replication_factor: 1
        storage:
          bucketNames:
            chunks: ${LOKI_S3_BUCKET}
          type: s3
          s3:
            s3: s3://ap-east-1/${LOKI_S3_BUCKET}
            region: ap-east-1
            insecure: false
            s3ForcePathStyle: true
            sse_encryption: true
            sse:
              type: "SSE-S3"
        storage_config:
          tsdb_shipper:
            active_index_directory: /var/loki/tsdb-index
            shared_store: s3
            cache_location: /var/loki/tsdb-cache
            cache_ttl: 24h
        schemaConfig:
          configs:
            - from: "2023-01-01"
              store: tsdb
              object_store: s3
              schema: v12
              index:
                prefix: loki_index_
                period: 24h
        rulerConfig:
          storage:
            type: local
            local:
              directory: /var/loki/rules
          rule_path: "/var/loki/rules-temp"
          ring:
            kvstore:
              store: inmemory
          alertmanager_url: http://_http-web._tcp.alertmanager-operated.monitoring.svc.cluster.local:9093
          enable_alertmanager_v2: true
          enable_api: true
        compactor:
          working_directory: /var/loki/compactor
          shared_store: s3
        index_gateway:
          mode: simple
        query_scheduler:
          # TSDB sends more requests, so increase the pending request queue sizes (https://grafana.com/docs/loki/latest/operations/storage/tsdb/)
          max_outstanding_requests_per_tenant: 32768
        querier:
          # Each `querier` component process runs a number of parallel workers to process queries simultaneously,
          # but we find the most success running at around `16` with tsdb (https://grafana.com/docs/loki/latest/operations/storage/tsdb/)
          max_concurrent: 16
          engine:
            timeout: 5m
          query_timeout: 5m
        ingester:
          # Flush chunks that don't receive new data
          chunk_idle_period: 3m
          # Keep flushed chunks in memory for a duration
          chunk_retain_period: 1m
      monitoring:
        dashboards: # Grafana Dashboards
          enabled: true
        rules:
          enabled: true
          alerting: true
        serviceMonitor: # For alerts etc
          enabled: true
        selfMonitoring:
          enabled: false
          grafanaAgent:
            installOperator: false
        lokiCanary:
          enabled: false
      test:
        enabled: false
      write:
        replicas: 2
        extraArgs:
          - -config.expand-env=true
        extraEnvFrom:
          - configMapRef:
              name: loki-s3-storage
        resources:
          limits:
            memory: 1.5Gi
          requests:
            memory: 1.5Gi
            cpu: "0.1"
      read:
        replicas: 1
        extraArgs:
          - -config.expand-env=true
        extraEnvFrom:
          - configMapRef:
              name: loki-s3-storage
        resources:
          limits:
            memory: 3Gi
          requests:
            memory: 3Gi
            cpu: "0.5"
      backend:
        replicas: 1
        extraVolumeMounts:
          - name: rules
            mountPath: "/var/loki/rules/fake"
        extraVolumes:
          - name: rules
            configMap:
              name: loki-alerting-rules
      memberlist:
        service:
          publishNotReadyAddresses: false
      gateway:
        replicas: 1
      # This service account is already created by eksctl
      serviceAccount:
        create: false
        name: loki-sa
        annotations:
          eks.amazonaws.com/role-arn: "arn:aws:iam::00000000000:role/loki_s3_role"
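As a side note, a minimal sketch of how a kustomization.yaml that inflates Helm charts via the helmCharts field is typically rendered and applied, assuming it sits in the current directory (the pipe to kubectl is illustrative; a GitOps tool could take its place):

kustomize build --enable-helm . | kubectl apply -f -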
Thanks @steveannett. I tried the resulting config with Loki 2.8.2 (outside of Kubernetes) and could not reproduce the error either, so I thought there might be an issue with Helm.
However, I also tested your values (taken from kustomization.yaml) and installed the chart with the following command:
helm --kube-context k3d upgrade loki grafana/loki --namespace lokitest --version 5.5.12 --values values.yaml
The read-loki pod starts up correctly, so I am a bit clueless as to what the problem could be.
Any chance you could try to upgrade to a later Helm chart version?
I'm running Loki v2.8.2, using Helm Chart version 5.5.12, deploying onto Kubernetes via the Kustomize tool.
Describe the bug
In the configuration, when using tsdb_shipper without the value tsdb_shipper.use_boltdb_shipper_as_backup set to false, or without a boltdb_shipper configuration as backup, the loki-read pods fail with the following error:

To Reproduce
Steps to reproduce the behavior:
Use the following config:

Expected behavior
The documentation at https://grafana.com/docs/loki/latest/configuration/ states that use_boltdb_shipper_as_backup defaults to false, so the expected behavior is that everything would start correctly. However, it won't start unless the value tsdb_shipper.use_boltdb_shipper_as_backup is explicitly set to false, or a boltdb_shipper configuration has been added.

Environment: Loki v2.8.2, using Helm Chart version 5.5.12, deploying onto EKS Kubernetes 1.23 via the Kustomize tool
Workaround
Either add a boltdb configuration to the storage_config and schemaConfig, or set tsdb_shipper.use_boltdb_shipper_as_backup to false, as sketched below. Either option allows the loki-read pods to run correctly.
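For illustration, a minimal sketch of the second option applied to the Helm values posted above; only the tsdb_shipper block is shown, and the explicit flag is the only addition (even though the documentation lists false as its default):

storage_config:
  tsdb_shipper:
    active_index_directory: /var/loki/tsdb-index
    shared_store: s3
    cache_location: /var/loki/tsdb-cache
    cache_ttl: 24h
    use_boltdb_shipper_as_backup: false  # set explicitly, per the workaround above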