awoimbee opened this issue 6 months ago
Closing since there have been several releases since this was reported; if it still happens I'll reopen.
Happened for me today while deploying Loki 3.0.0 in simple scalable mode; only the backend pod is affected.
Same problem here; the only difference is that I have 3 pods: 2 are OK and 1 is in CrashLoopBackOff.
k8 logs -n observability loki-backend-1 -c loki

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x22f02b0]
goroutine 1 [running]:
github.com/grafana/loki/v3/pkg/loki.(*Loki).updateConfigForShipperStore(0xc0006d5ea0?)
	/src/loki/pkg/loki/modules.go:755 +0xb0
github.com/grafana/loki/v3/pkg/loki.(*Loki).initBloomStore(0xc000cab500)
	/src/loki/pkg/loki/modules.go:715 +0x68
github.com/grafana/dskit/modules.(*Manager).initModule(0xc0004f2f90, {0x7fffb01fda84, 0x7}, 0x1?, 0xc00096e1e0?)
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc00097ca80, 0x1, 0xc0005a9b30?})
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc000cab500, {0x0?, {0x4?, 0x3?, 0x4912940?}})
	/src/loki/pkg/loki/loki.go:453 +0x9d
main.main()
	/src/loki/cmd/loki/main.go:122 +0x113b
@alexandergoncharovaspecta Can you provide your config?
I am able to reproduce the bug on the release-3.0.x branch using:

$ ./cmd/loki/loki -target=backend -index-gateway.mode=ring
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x22efff0]
goroutine 1 [running]:
github.com/grafana/loki/v3/pkg/loki.(*Loki).updateConfigForShipperStore(0xc0008b8960?)
/home/christian/sandbox/grafana/loki/pkg/loki/modules.go:755 +0xb0
github.com/grafana/loki/v3/pkg/loki.(*Loki).initBloomStore(0xc0007c9500)
/home/christian/sandbox/grafana/loki/pkg/loki/modules.go:715 +0x68
github.com/grafana/dskit/modules.(*Manager).initModule(0xc00063c780, {0x7fffab192a32, 0x7}, 0x1?, 0xc000eb8d20?)
/home/christian/sandbox/grafana/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc000a0dc20, 0x1, 0xc000eb8bd0?})
/home/christian/sandbox/grafana/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc0007c9500, {0x0?, {0x4?, 0x3?, 0x493d3e0?}})
/home/christian/sandbox/grafana/loki/pkg/loki/loki.go:453 +0x9d
main.main()
/home/christian/sandbox/grafana/loki/cmd/loki/main.go:122 +0x113b
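For anyone who prefers reproducing via config rather than CLI flags, here is a minimal sketch of the equivalent YAML; this is my own translation of the command above (option names only, everything else left at defaults), not a config taken from this issue:

# hedged sketch: YAML equivalent of `-target=backend -index-gateway.mode=ring`
target: backend
index_gateway:
  mode: ring   # "ring" is what triggers the nil pointer panic in updateConfigForShipperStore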
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki
  namespace: observability
  labels:
    helm.sh/chart: loki-6.3.4
    app.kubernetes.io/name: loki
    app.kubernetes.io/instance: loki
    app.kubernetes.io/version: "3.0.0"
    app.kubernetes.io/managed-by: Helm
data:
  config.yaml: |
auth_enabled: false
chunk_store_config:
chunk_cache_config:
background:
writeback_buffer: 500000
writeback_goroutines: 1
writeback_size_limit: 500MB
default_validity: 0s
memcached:
batch_size: 4
parallelism: 5
memcached_client:
addresses: dnssrvnoa+_memcached-client._tcp.loki-chunks-cache.observability.svc
consistent_hash: true
max_idle_conns: 72
timeout: 2000ms
common:
compactor_address: 'http://loki-backend:3100'
path_prefix: /var/loki
replication_factor: 3
storage:
azure:
account_key: ${LOKI_AZURE_ACCOUNT_KEY}
account_name: ${LOKI_AZURE_ACCOUNT_NAME}
container_name: chunks
use_federated_token: false
use_managed_identity: false
frontend:
scheduler_address: ""
tail_proxy_url: http://loki-querier.observability.svc.cluster.local:3100
frontend_worker:
scheduler_address: ""
index_gateway:
mode: ring
limits_config:
allow_structured_metadata: false
max_cache_freshness_per_query: 10m
max_query_parallelism: 32
max_query_series: 100000
query_timeout: 300s
reject_old_samples: true
reject_old_samples_max_age: 168h
retention_period: 720h
split_queries_by_interval: 15m
tsdb_max_query_parallelism: 512
volume_enabled: true
memberlist:
join_members:
- loki-memberlist
pattern_ingester:
enabled: false
querier:
max_concurrent: 16
query_range:
align_queries_with_step: true
cache_results: true
results_cache:
cache:
background:
writeback_buffer: 500000
writeback_goroutines: 1
writeback_size_limit: 500MB
default_validity: 12h
memcached_client:
addresses: dnssrvnoa+_memcached-client._tcp.loki-results-cache.observability.svc
consistent_hash: true
timeout: 500ms
update_interval: 1m
query_scheduler:
max_outstanding_requests_per_tenant: 32768
ruler:
storage:
azure:
account_key: ${LOKI_AZURE_ACCOUNT_KEY}
account_name: ${LOKI_AZURE_ACCOUNT_NAME}
container_name: ruler
use_federated_token: false
use_managed_identity: false
type: azure
runtime_config:
file: /etc/loki/runtime-config/runtime-config.yaml
schema_config:
configs:
- from: "2024-02-29"
index:
period: 24h
prefix: loki_index_
object_store: azure
schema: v13
store: tsdb
server:
grpc_listen_port: 9095
http_listen_port: 3100
http_server_read_timeout: 600s
http_server_write_timeout: 600s
storage_config:
boltdb_shipper:
index_gateway_client:
server_address: dns+loki-backend-headless.observability.svc.cluster.local:9095
hedging:
at: 250ms
max_per_second: 20
up_to: 3
tsdb_shipper:
index_gateway_client:
server_address: dns+loki-backend-headless.observability.svc.cluster.local:9095
tracing:
enabled: false
I am experiencing the same issue:
kubectl logs loki-backend-1 -c loki
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x22f02b0]
goroutine 1 [running]:
github.com/grafana/loki/v3/pkg/loki.(*Loki).updateConfigForShipperStore(0xc0009e0f00?)
/src/loki/pkg/loki/modules.go:755 +0xb0
github.com/grafana/loki/v3/pkg/loki.(*Loki).initBloomStore(0xc00178c000)
/src/loki/pkg/loki/modules.go:715 +0x68
github.com/grafana/dskit/modules.(*Manager).initModule(0xc000010ea0, {0x7ffde2dd827d, 0x7}, 0x1?, 0xc0017800c0?)
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc000b8bef0, 0x1, 0xc000b3fa40?})
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc00178c000, {0x0?, {0x4?, 0x3?, 0x4912940?}})
/src/loki/pkg/loki/loki.go:453 +0x9d
main.main()
/src/loki/cmd/loki/main.go:122 +0x113b
What is the fix for this issue?
I see changing index_gateway.mode from ring to simple was the fix, but now I am stuck with some other error in the gateway pod: https://github.com/grafana/loki/issues/12912
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m18s default-scheduler Successfully assigned vector/my-loki-gateway-66f8b59d65-jx7lw to ip-10-0-3-21.eu-west-2.compute.internal
Normal Pulled 7m18s kubelet Container image "docker.io/nginxinc/nginx-unprivileged:1.24-alpine" already present on machine
Normal Created 7m18s kubelet Created container nginx
Normal Started 7m18s kubelet Started container nginx
Warning Unhealthy 2m8s (x33 over 6m58s) kubelet Readiness probe errored: strconv.Atoi: parsing "http": invalid syntax
Fixed this by making a change to the Helm values:
https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L337-L345
readinessProbe:
httpGet:
path: /
port: http-metrics
initialDelaySeconds: 15
  timeoutSeconds: 1
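For reference, a sketch of the same probe expressed as a Helm values override; the gateway.readinessProbe key path is an assumption based on the linked values.yaml, so check where your chart version actually defines the gateway probe:

# Assumed key path (gateway.readinessProbe); verify against your chart version.
gateway:
  readinessProbe:
    httpGet:
      path: /
      # "http" is not a named port on the gateway container (hence the strconv.Atoi error),
      # so reference the port name it does expose:
      port: http-metrics
    initialDelaySeconds: 15
    timeoutSeconds: 1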
The pod is coming up, but Loki is not working as expected:
A Status: 500. Message: Get "http://loki-gateway.vector.svc.cluster.local/loki/api/v1/query_range?direction=backward&end=1716517388130000000&query=sum+by%28MAC%29+%28count_over_time%28%7BSTATUS%3D%22errObj.error.status%22%7D%5B15s%5D%29%29&start=1716495780000000000&step=15000ms": dial tcp: lookup loki-gateway.vector.svc.cluster.local: no such host
In 5.47.2 this used to work:
readinessProbe:
httpGet:
path: /
port: http
initialDelaySeconds: 15
timeoutSeconds: 1
When used, I am getting the same error:
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m7s default-scheduler Successfully assigned vector/my-loki-gateway-66f8b59d65-drkgc to ip-10-0-1-223.eu-west-2.compute.internal
Normal Pulled 5m7s kubelet Container image "docker.io/nginxinc/nginx-unprivileged:1.24-alpine" already present on machine
Normal Created 5m7s kubelet Created container nginx
Normal Started 5m7s kubelet Started container nginx
Warning Unhealthy 97s (x22 over 4m47s) kubelet Readiness probe errored: strconv.Atoi: parsing "http": invalid syntax
I had to use the service IP in the Vector endpoint along with the previous fix:
readinessProbe:
httpGet:
path: /
port: http-metrics
initialDelaySeconds: 15
  timeoutSeconds: 1
sinks:
loki:
type: "loki"
inputs:
- "lambda_source"
# endpoint: "http://loki-gateway.vector.svc.cluster.local"
endpoint: "http://10.160.197.234"
path: "/loki/api/v1/push"
encoding:
codec: "json"
tenant_id: "lokiprod"
healthcheck:
enabled: true
labels:
Now it's working as expected.
I ran into this crash too when upgrading from v2.9.x to v3.0.0. Changing the mode from ring to simple fixed this crash (but I'm still working through other problems).
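To summarize the workaround mentioned throughout this thread, a minimal sketch of the config change (only the index_gateway block shown; the rest of the config stays as-is):

index_gateway:
  # "ring" makes backend-only targets panic in updateConfigForShipperStore on 3.0.0;
  # "simple" avoids the crash.
  mode: simple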
I'm hitting the same problem
Describe the bug
Running version grafana/loki:main-0bf894b, loki-backend (replicas: 1) crashes.

Workaround: edit the configmap and change index_gateway.mode from ring to simple. Note that I use tsdb; having a boltdb config or not in storage_config does not change anything.

Environment: