Open slim-bean opened 7 months ago
Please update grafana.com/docs/loki before releasing a major update; it still shows the 2.9 documentation. :)
I tried upgrading the Helm chart (5.47.2 → 6.0.0) but encountered these errors:
❯ k -n observability logs loki-write-1
failed parsing config: /etc/loki/config/config.yaml: yaml: unmarshal errors:
line 41: field shared_store not found in type compactor.Config
line 62: field enforce_metric_name not found in type validation.plain. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file
❯ k -n observability logs loki-read-779bd69757-rrdxt
failed parsing config: /etc/loki/config/config.yaml: yaml: unmarshal errors:
line 41: field shared_store not found in type compactor.Config
line 62: field enforce_metric_name not found in type validation.plain. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file
❯ k logs -n observability loki-backend-1
Defaulted container "loki-sc-rules" out of: loki-sc-rules, loki
{"time": "2024-04-08T21:37:34.546399+00:00", "msg": "Starting collector", "level": "INFO"}
{"time": "2024-04-08T21:37:34.546577+00:00", "msg": "No folder annotation was provided, defaulting to k8s-sidecar-target-directory", "level": "WARNING"}
{"time": "2024-04-08T21:37:34.546733+00:00", "msg": "Loading incluster config ...", "level": "INFO"}
{"time": "2024-04-08T21:37:34.547477+00:00", "msg": "Config for cluster api at 'https://10.43.0.1:443' loaded...", "level": "INFO"}
{"time": "2024-04-08T21:37:34.547598+00:00", "msg": "Unique filenames will not be enforced.", "level": "INFO"}
{"time": "2024-04-08T21:37:34.547695+00:00", "msg": "5xx response content will not be enabled.", "level": "INFO"}
Pretty sure I adjusted all the breaking changes described in the release notes, but maybe some of my custom config is not compatible?
My Helm values are located here; any help?
You are setting shared store in compactor. It also got dropped there.
See https://github.com/grafana/loki/blob/main/docs/sources/configure/_index.md#compactor
delete_request_store is now required
So I should just be able to rename shared_store to delete_request_store and be good?
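For reference, a minimal sketch of the renamed setting, with s3 standing in for whatever object store the old shared_store value pointed at (field names from the compactor docs linked above; only illustrative):
compactor:
  retention_enabled: true
  delete_request_store: s3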
helm template grafana/loki --set loki.useTestSchema=true --set-json imagePullSecrets='["blah"]'
fails for me with ...executing "loki.memcached.statefulSet" at <$.ctx.Values.image.pullSecrets>: nil pointer evaluating interface {}.pullSecrets
Adding --set-json image.pullSecrets='["blah2"]' to the previous command does work, but image.pullSecrets isn't documented in values.yaml and would be kind of redundant, so I think maybe this is a typo for imagePullSecrets here?
Since the upgrade everything looks good in our environments although the backend pods seem to be outputting a lot of:
level=info ts=2024-04-09T08:01:08.971329289Z caller=gateway.go:241 component=index-gateway msg="chunk filtering is not enabled"
with every Loki search. This wasn't happening before 3.0, from what we can tell.
I suspect that's because blooms aren't enabled, although when I do enable blooms we get a nil pointer:
level=info ts=2024-04-09T08:17:29.692174397Z caller=bloomcompactor.go:458 component=bloom-compactor msg=compacting org_id=plprod table=index_19820 ownership=1f6c0f8500000000-1fa8b221ffffffff
ts=2024-04-09T08:17:31.535678052Z caller=memberlist_logger.go:74 level=warn msg="Got ping for unexpected node 'loki-backend-3-2e51d875' from=10.30.80.69:7946"
level=info ts=2024-04-09T08:17:31.610784021Z caller=scheduler.go:653 msg="this scheduler is in the ReplicationSet, will now accept requests."
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1aec384]
goroutine 1430 [running]:
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func4.1()
/usr/local/go/src/sync/oncefunc.go:24 +0x7c
panic({0x2002700?, 0x42aae10?})
/usr/local/go/src/runtime/panic.go:914 +0x218
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.func2()
/src/loki/pkg/bloomcompactor/controller.go:388 +0x24
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func4()
/usr/local/go/src/sync/oncefunc.go:27 +0x64
sync.(*Once).doSlow(0x4006e9f128?, 0x0?)
/usr/local/go/src/sync/once.go:74 +0x100
sync.(*Once).Do(0x400004e800?, 0x21cc060?)
/usr/local/go/src/sync/once.go:65 +0x24
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps.OnceFunc.func5()
/usr/local/go/src/sync/oncefunc.go:31 +0x34
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).buildGaps(0x4006e7e720, {0x2c70e48, 0x4006e6e7d0}, {0x4006867892, 0x6}, {{0x1f6c0f8500000000?}, {0x40005a0578?, 0x4d6c?}}, {0x4321220?, 0x0?}, ...)
/src/loki/pkg/bloomcompactor/controller.go:396 +0x133c
github.com/grafana/loki/v3/pkg/bloomcompactor.(*SimpleBloomController).compactTenant(0x4006e7e720, {0x2c70e48, 0x4006e6e7d0}, {{0x2?}, {0x40005a0578?, 0x101000000226f98?}}, {0x4006867892, 0x6}, {0x2?, 0x0?}, ...)
/src/loki/pkg/bloomcompactor/controller.go:115 +0x6a0
github.com/grafana/loki/v3/pkg/bloomcompactor.(*Compactor).compactTenantTable(0x40007eee00, {0x2c70e48, 0x4006e6e7d0}, 0x4001a7eab0, 0x0?)
/src/loki/pkg/bloomcompactor/bloomcompactor.go:460 +0x2e8
github.com/grafana/loki/v3/pkg/bloomcompactor.(*Compactor).runWorkers.func2({0x2c70e48, 0x4006e6e7d0}, 0x0?)
/src/loki/pkg/bloomcompactor/bloomcompactor.go:422 +0xe0
github.com/grafana/dskit/concurrency.ForEachJob.func1()
/src/loki/vendor/github.com/grafana/dskit/concurrency/runner.go:105 +0xbc
golang.org/x/sync/errgroup.(*Group).Go.func1()
/src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:78 +0x58
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1428
/src/loki/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x98
When upgrading, the pod from the new stateful set 'loki-chunks-cache' couldn't be scheduled, because none of our nodes offer the requested 9830 MiB of memory.
Please update grafana.com/docs/loki before releasing a major update; it still shows the 2.9 documentation. :)
Very sorry about this. We are working on a new release process and also had problems with our documentation updates. I think there are still a few things we are working out, but hopefully most of it is correct now.
When upgrading, the pod from the new stateful set 'loki-chunks-cache' couldn't be scheduled, because none of our nodes offer the requested 9830 MiB of memory.
You could disable this external memcached entirely by setting enabled: false, or you can make it smaller by reducing allocatedMemory; this will also automatically adjust the pod requests in k8s!
chunksCache:
  # -- Specifies whether memcached based chunks-cache should be enabled
  enabled: true
  # -- Amount of memory allocated to chunks-cache for object storage (in MB).
  allocatedMemory: 8192
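For example (illustrative numbers only; pick a size that fits your nodes), an override that shrinks the cache could look like:
chunksCache:
  enabled: true
  # the chart derives the pod's memory request from this value, per the comment above
  allocatedMemory: 4096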
Awesome with the new bloom filter, for unique IDs etc! 🎉
I'm looking forward to close issue https://github.com/grafana/loki/issues/91 (from 2018) when the experimental bloom filters are stable. 😄
Regarding docs, some feedback:
Source: https://grafana.com/docs/loki/latest/get-started/deployment-modes/
Trying to update the helm chart from 5.43.2 to 6.1.0, but I am getting:
UPGRADE FAILED: template: loki/templates/single-binary/statefulset.yaml:44:28: executing "loki/templates/single-binary/statefulset.yaml" at <include (print .Template.BasePath "/config.yaml") .>: error calling include: template: loki/templates/config.yaml:19:7: executing "loki/templates/config.yaml" at <include "loki.calculatedConfig" .>: error calling include: template: loki/templates/_helpers.tpl:461:24: executing "loki.calculatedConfig" at <tpl .Values.loki.config .>: error calling tpl: error during tpl function execution for "{{- if .Values.enterprise.enabled}}\n{{- tpl .Values.enterprise.config . }}\n{{- else }}\nauth_enabled: {{ .Values.loki.auth_enabled }}\n{{- end }}\n\n{{- with .Values.loki.server }}\nserver:\n {{- toYaml . | nindent 2}}\n{{- end}}\n\nmemberlist:\n{{- if .Values.loki.memberlistConfig }}\n {{- toYaml .Values.loki.memberlistConfig | nindent 2 }}\n{{- else }}\n{{- if .Values.loki.extraMemberlistConfig}}\n{{- toYaml .Values.loki.extraMemberlistConfig | nindent 2}}\n{{- end }}\n join_members:\n - {{ include \"loki.memberlist\" . }}\n {{- with .Values.migrate.fromDistributed }}\n {{- if .enabled }}\n - {{ .memberlistService }}\n {{- end }}\n
{{- end }}\n{{- end }}\n\n{{- with .Values.loki.ingester }}\ningester:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- if .Values.loki.commonConfig}}\ncommon:\n{{- toYaml .Values.loki.commonConfig | nindent 2}}\n storage:\n {{- include \"loki.commonStorageConfig\" . | nindent 4}}\n{{- end}}\n\n{{- with .Values.loki.limits_config }}\nlimits_config:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\nruntime_config:\n file: /etc/loki/runtime-config/runtime-config.yaml\n\n{{- with .Values.chunksCache }}\n{{- if .enabled }}\nchunk_store_config:\n chunk_cache_config:\n default_validity: {{ .defaultValidity }}\n background:\n writeback_goroutines: {{ .writebackParallelism }}\n writeback_buffer: {{ .writebackBuffer }}\n writeback_size_limit: {{ .writebackSizeLimit }}\n memcached:\n batch_size: {{ .batchSize }}\n parallelism: {{ .parallelism }}\n memcached_client:\n addresses: dnssrvnoa+_memcached-client._tcp.{{ template \"loki.fullname\" $ }}-chunks-cache.{{ $.Release.Namespace }}.svc\n consistent_hash: true\n timeout: {{ .timeout }}\n max_idle_conns: 72\n{{- end }}\n{{- end }}\n\n{{- if .Values.loki.schemaConfig }}\nschema_config:\n{{- toYaml .Values.loki.schemaConfig | nindent 2}}\n{{- end }}\n\n{{- if .Values.loki.useTestSchema }}\nschema_config:\n{{- toYaml .Values.loki.testSchemaConfig | nindent 2}}\n{{- end }}\n\n{{ include \"loki.rulerConfig\" . }}\n\n{{- if or .Values.tableManager.retention_deletes_enabled .Values.tableManager.retention_period }}\ntable_manager:\n retention_deletes_enabled: {{ .Values.tableManager.retention_deletes_enabled }}\n retention_period: {{ .Values.tableManager.retention_period }}\n{{- end }}\n\nquery_range:\n align_queries_with_step: true\n {{- with .Values.loki.query_range }}\n {{- tpl (. | toYaml) $ | nindent 4 }}\n {{- end }}\n {{- if .Values.resultsCache.enabled }}\n {{- with .Values.resultsCache }}\n cache_results: true\n results_cache:\n cache:\n default_validity: {{ .defaultValidity }}\n background:\n writeback_goroutines: {{ .writebackParallelism }}\n writeback_buffer: {{ .writebackBuffer }}\n writeback_size_limit: {{ .writebackSizeLimit }}\n memcached_client:\n consistent_hash: true\n addresses: dnssrvnoa+_memcached-client._tcp.{{ template \"loki.fullname\" $ }}-results-cache.{{ $.Release.Namespace }}.svc\n timeout: {{ .timeout }}\n update_interval: 1m\n {{- end }}\n {{- end }}\n\n{{- with .Values.loki.storage_config }}\nstorage_config:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.query_scheduler }}\nquery_scheduler:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.compactor }}\ncompactor:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.analytics }}\nanalytics:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.querier }}\nquerier:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.index_gateway }}\nindex_gateway:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.frontend }}\nfrontend:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.frontend_worker }}\nfrontend_worker:\n {{- tpl (. | toYaml) $ | nindent 4 }}\n{{- end }}\n\n{{- with .Values.loki.distributor }}\ndistributor:\n {{- tpl (. 
| toYaml) $ | nindent 4 }}\n{{- end }}\n\ntracing:\n enabled: {{ .Values.loki.tracing.enabled }}\n": template: loki/templates/single-binary/statefulset.yaml:37:6: executing "loki/templates/single-binary/statefulset.yaml" at <include "loki.commonStorageConfig" .>: error calling include: template: loki/templates/_helpers.tpl:228:19: executing "loki.commonStorageConfig" at <$.Values.loki.storage.bucketNames.chunks>: nil pointer evaluating interface {}.chunks
For the loki helm chart: https://github.com/grafana/loki/pull/12067 changed the port name for the gateway service from http to http-metrics, which caused it to be picked up by the loki ServiceMonitor. The gateway responds with a 404 on the /metrics path, causing the prometheus target to fail.
For the loki chart we unfortunately had to face some downtime. This change https://github.com/grafana/loki/commit/79b876b65d55c54f4d532e98dc24743dea8bedec#diff-89f4fd98934eb0f277b921d45e4c223e168490c44604e454a2192d28dab1c3e2R4 forced the recreation of all the gateway resources: Deployment, Service, PodDisruptionBudget and, most critically, Ingress.
This is problematic for 2 reasons:
Two issues so far with my existing Helm values:
loki.schema_config apparently became loki.schemaConfig. After renaming the object, that part was accepted (also by the 5.x helm chart).
Then the loki ConfigMap failed to be generated. The config.yaml value is literally Error: 'error converting YAML to JSON: yaml: line 70: mapping values are not allowed in this context'.
Trying to render the helm chart locally with "helm --debug template" results in
Error: template: loki/templates/write/statefulset-write.yaml:46:28: executing "loki/templates/write/statefulset-write.yaml" at <include (print .Template.BasePath "/config.yaml") .>: error calling include: template: loki/templates/config.yaml:19:7: executing "loki/templates/config.yaml" at <include "loki.calculatedConfig" .>: error calling include: template: loki/templates/_helpers.tpl:461:24: executing "loki.calculatedConfig" at <tpl .Values.loki.config .>: error calling tpl: error during tpl function execution for "
<<<< template removed for brevity >>>
": template: loki/templates/write/statefulset-write.yaml:37:6: executing "loki/templates/write/statefulset-write.yaml" at <include "loki.commonStorageConfig" .>: error calling include: template: loki/templates/_helpers.tpl:228:19: executing "loki.commonStorageConfig" at <$.Values.loki.storage.bucketNames.chunks>: nil pointer evaluating interface {}.chunks
I am trying to understand the nested template structure in the helm chart to figure out what is happening.
A short helm chart values set (which worked fine with 5.x) triggering the phenomenon:
hahaha
I thought I recognized that github picture!!!
I'm looking forward to close issue https://github.com/grafana/loki/issues/91 (from 2018) when the experimental bloom filters are stable. 😄
2018!!!
Thanks for the great feedback on the docs, very helpful.
One note regarding SSD mode: honestly, the original idea of SSD was to make Loki a lot more friendly outside of k8s environments. The problem we found ourselves in, though, is that we have had no good way to support customers attempting to run Loki this way, and as such we largely require folks to use Kubernetes for our commercial offering. This is why the docs are so k8s-specific.
It continues to be a struggle to build an open source project which is extremely flexible for folks to run in many ways, but also a product that we have to provide support for.
I'd love to know though how many folks are successfully running SSD mode outside of kubernetes. I'm still a bit bullish on the idea but over time I kind of feel like it hasn't played out as well as we hoped.
For the loki helm chart: #12067 changed the port name for the gateway service from http to http-metrics, which caused it to be picked up by the loki ServiceMonitor. The gateway responds with a 404 on the /metrics path, causing the prometheus target to fail.
oh interesting, we'll take a look at this, not sure what happened here, thanks!
@tete17 I created a new issue for what you found https://github.com/grafana/loki/issues/12554
Thank you for reporting, sorry for the troubles :(
@MartinEmrich thank you, I will update the upgrade guide around schemaConfig, sorry about that. And thank you for the sample test values file! very helpful!
Congratulations on the release! :tada: :) Is there any way to verify that bloom filters are active and working? I cannot seem to find any metrics or log entries that might give a hint. There are also no bloom services listed on the /services endpoint:
curl -s -k https://localhost:3100/services
ruler => Running
compactor => Running
store => Running
ingester-querier => Running
query-scheduler => Running
ingester => Running
query-frontend => Running
distributor => Running
server => Running
ring => Running
query-frontend-tripperware => Running
analytics => Running
query-scheduler-ring => Running
querier => Running
cache-generation-loader => Running
memberlist-kv => Running
I tried deploying it on a single instance in monolithic mode via Docker by adding the following options:
limits_config:
  bloom_gateway_enable_filtering: true
  bloom_compactor_enable_compaction: true
bloom_compactor:
  enabled: true
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
bloom_gateway:
  enabled: true
  client:
    addresses: dns+localhost.localdomain:9095
Edit: My bad, it seems that the bloom components are not available when using -target=all. It needs to be set to -target=all,bloom-compactor,bloom-gateway,bloom-store for a monolithic deployment, I guess? See https://grafana.com/docs/loki/latest/get-started/components/#loki-components.
Not sure if this is intended, but in _helpers.tpl there is an if check which might be wrong:
{{- if "loki.deployment.isDistributed "}}
A similar check is done here, which looks like this:
{{- $isDistributed := eq (include "loki.deployment.isDistributed" .) "true" -}}
{{- if $isDistributed -}}
This causes the if check to always be true and thus frontend.tail_proxy_url to be set in the loki config. But the configured tail_proxy_url does not point to an existing service (I used SSD deployment mode). Not sure if this has any impact.
We encountered a bug in the rendering of the Loki config with the helm chart v6.0.0 that may be similar to what @MartinEmrich encountered above. These simple values will cause the rendering to fail:
loki:
  query_range:
    parallelise_shardable_queries: false
  useTestSchema: true
This causes .Values.loki.config to look like (note the extra indent):
query_range:
  align_queries_with_step: true
    parallelise_shardable_queries: false
  cache_results: true
I believe anything under loki.query_range is being misindented here.
EDIT: I've added a PR to solve the above but in general we've had trouble upgrading to Helm chart v6 as there are now two fields which are seemingly necessary where before they were not, and they're not listed in the upgrade guide:
* As of 6.0: we must provide a `schemaConfig`, whereas in v5 we could use a suggested default without needing a `useTestSchema` flag.
* As of 6.1: we must provide storage defaults, otherwise templating fails (see https://github.com/grafana/loki/pull/12548#issuecomment-2046492619).
In general I would personally prefer that I can always install a Helm chart with no values and get some kind of sensible default, even if only for testing out the chart. Later, when I want to go production-ready, I can tweak those parameters to something more appropriate.
On the upgrade attempt using Simple Scalable mode, scheduler_address is empty in the rendered config, whilst present before the upgrade:
frontend:
  scheduler_address: ""
  tail_proxy_url: http://loki-querier.grafana.svc.gke-main-a.us-east1:3100
frontend_worker:
  scheduler_address: ""
It looks like schedulerAddress is defined only for the Distributed mode; note, the service query-scheduler-discovery is still created.
We encountered a bug in the rendering of the Loki config with the helm chart v6.0.0 that may be similar to what @MartinEmrich encountered above. [...] we've had trouble upgrading to Helm chart v6 as there are now two fields which are seemingly necessary where before they were not, and they're not listed in the upgrade guide:
* As of 6.0: we must provide a `schemaConfig` whereas in v5 we could use a suggested default without needing a `useTestSchema` flag.
* As of 6.1: we must provide storage defaults otherwise templating fails (see https://github.com/grafana/loki/pull/12548#issuecomment-2046492619).
In general I would personally prefer that I can always install a Helm chart with no values and get some kind of sensible default, even if only for testing out the chart.
Very helpful feedback, thank you!
The schemaConfig name change was an oversight on my part and I need to get it into the upgrade guide, apologies.
The forced requirement for a schemaConfig is an interesting problem: if we default it in the chart then people end up using it, which means we can't change it without breaking their clusters, because schemas can't be changed, only new ones added. I do suppose we could just add new ones, but that feels a bit like forcing an upgrade on someone... I'm not sure, this is a hard problem that I don't have great answers to.
We decided that this time around we'd force people to define a schema, and provide the test schema config value that should be spit out in an error message if you want to just try the chart with data you plan on throwing away. It does seem like we need to update this error or that flag to also provide values for the storage defaults however.
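Putting the two requirements discussed above together, a rough sketch of what a throwaway test render currently seems to need (the bucket names are placeholders, not chart defaults):
helm template loki grafana/loki \
  --set loki.useTestSchema=true \
  --set loki.storage.bucketNames.chunks=test-chunks \
  --set loki.storage.bucketNames.ruler=test-ruler \
  --set loki.storage.bucketNames.admin=test-admin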
On the upgrade attempt using Simple Scalable mode, scheduler_address is empty in the rendered config, whilst present before the upgrade. It looks like schedulerAddress is defined only for the Distributed mode; note, the service query-scheduler-discovery is still created.
Good eye, and interestingly this is to be expected. In SSD mode loki can resolve the scheduler addresses using the same mechanisms of how we communicate an ownership hash ring via memberlist for other things in Loki. Setting the scheduler address to an empty string enables this behavior by default.
It used to work this way a long time ago, but there was unfortunately a bug released in a version of Loki which broke it, and a workaround was to set the scheduler_address in helm. The bug was fixed long ago, so I returned this to the preferred behavior of letting Loki figure this out itself.
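Reading these comments together, the intended end state in SSD mode seems to be roughly the following (a sketch of my reading, not chart output; whether the chart sets use_scheduler_ring itself is not confirmed here):
frontend:
  scheduler_address: ""   # intentionally empty: schedulers are discovered via the ring
frontend_worker:
  scheduler_address: ""
query_scheduler:
  use_scheduler_ring: true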
@slim-bean
Thx for the clarification, that makes sense.
So, initially, on the upgrade attempt I received errors from read/write like:
level=error ts=2024-04-11T19:54:26.700159188Z caller=ring_watcher.go:56 component=querier component=querier-scheduler-worker msg="error getting addresses from ring" err="empty ring"
After more attempts with an empty scheduler_address, the following has worked under the loki key in the helm chart: basically, set the previous query-scheduler-discovery for scheduler_address and apply:
query_scheduler:
  use_scheduler_ring: true
frontend:
  scheduler_address: query-scheduler-discovery.grafana.svc.some-gke.us-east1.:9095
frontend_worker:
  scheduler_address: query-scheduler-discovery.grafana.svc.some-gke.us-east1.:9095
then removed scheduler_address and applied:
query_scheduler:
  use_scheduler_ring: true
then removed use_scheduler_ring: true and applied.
However, there is a high chance I've not waited long enough for the backend to roll out, and maybe if it had rolled out the error would just have gone away eventually.
So I wonder if the upgrade path I've used looks correct.
@MartinEmrich thank you, I will update the upgrade guide around schemaConfig, sorry about that. And thank you for the sample test values file! very helpful!
@slim-bean Today I noticed that it actually made a difference: apparently the "schema_config" before only worked "half-way": while store:, object_store: etc. seem to have worked before (it did use AWS all the time, and never complained), the index.prefix was only effective after switching to schemaConfig. In effect my Loki now only sees the logs from three days ago onwards.
(And being new to Loki, I did not look at the object structure or the file contents to validate that my custom index prefix was effective).
Can I somehow make the older logs searchable again? (Worst case: renaming AWS S3 objects systematically?) A new "schema" block with a "from" date probably won't fit, as it only allows a date, not a complete datetime...
One question about 3.0 release: is there an eta on the docker images for it?
I'm way excited about the otel support but I currently run my loki from docker hub and was kind of confused to not see the new version there!
I'm sorry if this isn't the right place to ask
The latest tag is 3.0 and there is a 3.0.0 tagged image?
I'm so sorry, you're right, I got confused due to docker hub ordering and probably the voices in my head.
Is it better to delete my question to avoid further confusion, or just leave it?
We are running Loki (v2.9.2) and Helm Chart (5.36.2) in Prod. We are planning to upgrade Loki, so we tested this in our Sandbox environment. This is how we planned the upgrade:
Step-1: Deploy Loki in Sandbox with the same version as we have in Prod (Loki v2.9.2)
Step-2: Set allow_structured_metadata: false and restart Loki (this will bring new Pods)
Step-3: Upgrade Loki to v3.0.0 and make all config changes, with the new schema as v13 (with a future date) and store as tsdb (we do have tsdb and boltdb with the v12 schema) (this will bring new Pods too, with the latest version)
Step-4: Look for potential errors in Loki Pods
Step-5: Set allow_structured_metadata: true and restart Loki (this will bring new Pods)
Step-6: Look for potential errors in Loki Pod
Output: Step-1: Executed as expected. Step-2: Executed as expected. Step-3: Got an error mentioned in this GH issue. Set this in the values file:
query_scheduler:
  use_scheduler_ring: false
Got an error when trying to see logs in Grafana:
too many unhealthy instances in the ring
Then set it to use_scheduler_ring: true to observe the change, as suggested here.
Then we started getting errors in the loki-read and loki-write Pods as below, but the Pods are running successfully:
loki-read-X
level=error ts=2024-04-15T11:37:02.583859311Z caller=ring_watcher.go:56 component=querier component=querier-scheduler-worker msg="error getting addresses from ring" err="at least 1 healthy replica required, could only find 0 - unhealthy instances: 10.XXX.XXX.XXX:9095,10.XXX.XXX.XXX:9095"
loki-write-X
level=error ts=2024-04-15T11:35:22.705876936Z caller=tcp_transport.go:322 component="memberlist TCPTransport" msg="unknown message type" msgType="\u0016" remote=10.XXX.XXX.XXX:46788
Is anyone else facing similar issues? We are unsure about this upgrade, hence holding it off for now until such problems get resolved.
I think there is a change in the ksonnet behaviour. The Loki-simple-scalable deployment mode has only read and write targets (no backend). The backend components seemed to be included in the read containers.
Loki 2.9.5
curl http://read/services
memberlist-kv => Running
ring => Running
compactor => Running
query-scheduler => Running
server => Running
query-frontend-tripperware => Running
querier => Running
ruler => Running
analytics => Running
ingester-querier => Running
index-gateway => Running
query-frontend => Running
cache-generation-loader => Running
index-gateway-ring => Running
store => Running
query-scheduler-ring => Running
However, I can't find them in read containers anymore after updating to 3.0.0.
Loki 3.0.0
# curl http://read/services
analytics => Running
ring => Running
query-frontend-tripperware => Running
query-frontend => Running
store => Running
querier => Running
server => Running
cache-generation-loader => Running
memberlist-kv => Running
ingester-querier => Running
How should we work around this issue?
hey guys,
The following config
frontend:
  downstream_url: http://xxxxxxx:3100
causes a nil pointer exception in https://github.com/grafana/loki/blob/main/pkg/lokifrontend/frontend/downstream_roundtripper.go#L37 because the constructor is initialized with an empty codec: https://github.com/grafana/loki/blob/main/pkg/lokifrontend/frontend/downstream_roundtripper.go#L29.
Hello,
with the helm chart (version 6.3.3) I have a problem with my s3 storage configuration. I am using separate buckets with different names, access_keys and secret_keys. I had solved the configuration in version 5.x via loki.structuredConfig.common.storage.s3 & loki.structuredConfig.ruler.storage.s3. With the latest version, however, I need to configure loki.storage.s3, where I can configure different names for the buckets, but not different access & secret keys. In _helpers.tpl I have not found a way to bypass the filling of these properties.
I would be very happy to receive feedback or a workaround. Thank you :)
Hi @PlayMTL
If this is AWS s3, then it is better to use IRSA.
serviceAccount:
  annotations:
    "eks.amazonaws.com/role-arn": role-arn
Hey @tropnikovvl, it's an on-prem deployment with an own s3 solution in the datacenter :/
For the loki helm chart: #12067 changed the port name for the gateway service from http to http-metrics, which caused it to be picked up by the loki ServiceMonitor. The gateway responds with a 404 on the /metrics path, causing the prometheus target to fail.

oh interesting, we'll take a look at this, not sure what happened here, thanks!
Any update on this one? This is resulting in false positive alerts.
Until there is a permanent solution, a workaround is to set the following in the helm values. The service monitor will then ignore the service.
gateway:
  service:
    labels:
      prometheus.io/service-monitor: "false"
I am trying to follow the Loki Quickstart Guide, but I guess there is something wrong?
Among other things I should download the promtail-local-config.yaml from here, but this file does not exist.
Also, after starting the stack with docker compose up -d, an error is thrown about mapping a file to a dir (or vice versa). A new directory is generated in the working dir named alloy-local-config.yaml. A file with this name, on the other hand, does exist in the repo. After loading that file into the current dir, the stack starts without error.
When navigating to Grafana no data is shown. I guess the reason is the missing promtail container? Flog is happily generating events.
EDIT: With the changes mentioned above, events are showing up in Grafana.
Trying to update the helm chart from 5.43.2 to 6.1.0, but I am getting:
UPGRADE FAILED: template: loki/templates/single-binary/statefulset.yaml:44:28: executing "loki/templates/single-binary/statefulset.yaml" at <include (print .Template.BasePath "/config.yaml") .>: error calling include: ... <<<< template removed for brevity >>> ... executing "loki.commonStorageConfig" at <$.Values.loki.storage.bucketNames.chunks>: nil pointer evaluating interface {}.chunks
I have the same problem. Did you get a solution?
Trying to update the helm chart from 5.43.2 to 6.1.0, but I am getting:
UPGRADE FAILED: template: loki/templates/single-binary/statefulset.yaml:44:28: ... <<<< template removed for brevity >>> ... executing "loki.commonStorageConfig" at <$.Values.loki.storage.bucketNames.chunks>: nil pointer evaluating interface {}.chunks
I have the same problem. Did you get a solution?
After the upgrade, "loki.storage.bucketNames" is missing from values.yaml; you need to configure the loki.storage.bucketNames section, like this:
loki:
  storage:
    bucketNames:
      chunks: chunks # change this name to the real bucket name in s3
      ruler: ruler
      admin: admin
Solved after changing this configuration.
@icylord's suggestion (adding the loki.storage.bucketNames tree) made my values (see post further up) "renderable". But my "loki" ConfigMap still has this literal content:
data:
  config.yaml: |
    Error: 'error converting YAML to JSON: yaml: line 70: mapping values are not allowed
    in this context'
Hmm the more I dig in, the more it seems that the Values format of the 6.0.0+ Helm chart is very different and not backwards compatible. Even if I get the helm chart to accept my Values by trial-and-fix, I have a feeling it will just ignore most of it, and use default values.
Was I too naive to expect this? https://grafana.com/docs/loki/latest/setup/upgrade/upgrade-to-6x/ suggests to me that unless I use one of the value fields marked "BREAKING", I should be safe. Or did I miss the correct migration guide and that link is wrong/insufficient?
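In case it helps anyone debugging the same thing, rendering only the config template makes it easier to see which line the 'mapping values are not allowed' error points at (a sketch; the release name and values file are whatever you use locally):
# render just the generated Loki config for inspection
helm template loki grafana/loki -f my-values.yaml --show-only templates/config.yaml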
A couple of issues:
- loki.storage.bucketNames if I am using loki.structuredConfig
- loki, meaning I can't run the helm chart twice within the same namespace without further changes
- Default podAntiAffinity looks not as good as it was. Such things can be handled by rendering, aka: hard or soft or none, and soft is a default that has predefined sets of name, instance and component.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
-      - labelSelector:
-          matchLabels:
-            app.kubernetes.io/name: loki
-            app.kubernetes.io/instance: loki
-            app.kubernetes.io/component: backend
-        topologyKey: kubernetes.io/hostname
-
+      - labelSelector:
+          matchLabels:
+            app.kubernetes.io/component: backend
+        topologyKey: kubernetes.io/hostname
Gateway returns 404 on /metrics, leading to a TargetDown alert; the port should be named http, not http-metrics; this was changed with the update. To fix this, anybody can set:
# Configuration for the gateway
gateway:
  nginxConfig:
    # -- Allows appending custom configuration to the server block
    serverSnippet: |-
      # workaround for temporary issue with port http-metrics,
      # while the port name should just be http
      location = /metrics {
        return 200;
      }
Also:
- podAntiAffinity by default, this is not good.
- The monitoring part is set to a deprecation stage? When it is removed, how do we know if Loki is working, and working optimally? I saw it swapped to another chart, but that chart provides too many details that are not needed if you only need Loki. I am not sure if this is a good decision 😓
If I understood this part correctly:
and this part:
I removed the deprecated tsdb.shipper.shared-store setting, and my period_config inside the schema_config looks like this after the upgrade:
- from: 2023-07-01
  store: tsdb
  object_store: azure
  schema: v12
  index:
    prefix: index_
    period: 24h
- from: 2024-06-01
  store: tsdb
  object_store: azure
  schema: v13
  index:
    prefix: index_
    period: 24h
I should be able to see all the logs, and logs coming in after 2024-06-01 should use the new schema.
However, I can only access the logs since Loki was upgraded (which in my case is like an hour ago). Logs from before the upgrade are no longer accessible, even though they remain in the object store.
Probably related to: https://github.com/grafana/loki/issues/12506#issuecomment-2051511233
UPDATE: seems like this was some layer 8 problem, but @MartinEmrich pointed me in the right direction.
He said:
loki.schema_config apparently became loki.schemaConfig
and I noticed: it was called like this the entire time, at least since helm chart version 3.x. Therefore, due to the wrong key set in our values.yaml, we always used the default:
{{- if .Values.loki.schemaConfig }}
schema_config:
{{- toYaml .Values.loki.schemaConfig | nindent 2}}
{{- else }}
schema_config:
  configs:
    - from: 2022-01-11
      store: boltdb-shipper
      object_store: {{ .Values.loki.storage.type }}
      schema: v12
      index:
        prefix: loki_index_
        period: 24h
{{- end }}
And that broke now, as we were now using our own schema config for the first time. We didn't notice, as the logs were sent to Azure and we didn't check whether the format was actually the one we configured. Once I re-added that schema_config to my new tsdb config, it works now and I can see all the old and new logs.
Maybe this helps someone in the future. 😄
We encountered a bug in the rendering of the Loki config with the helm chart v6.0.0 that may be similar to what @MartinEmrich encountered above. These simple values will cause the rendering to fail:
loki:
  query_range:
    parallelise_shardable_queries: false
  useTestSchema: true
A quick fix for this (until the PR is merged) is to move the configuration to structuredConfig:
loki:
  structuredConfig:
    query_range:
      parallelise_shardable_queries: true
UPDATE: seems like this was some layer 8 problem, but @MartinEmrich pointed me in the right direction. ... Therefore, due to the wrong key set in our values.yaml, we always used the default.
At least I am not the only one. 😌
And that broke now, as we were now using our own schema config for the first time. We didn't notice, as the logs were sent to Azure and we didn't check whether the format was actually the one we configured. Once I re-added that schema_config to my new tsdb config, it works now and I can see all the old and new logs.
How exactly did you achieve seeing both the old and new logs? Since my configured schema (index prefix,...) was now actually used for the first time, I no longer see the old logs with the default scheme...
@MartinEmrich just add the old default schema_config to your existing schema_config:
- from: 2022-01-11
  store: boltdb-shipper
  object_store: {{ .Values.loki.storage.type }} # replace this
  schema: v12
  index:
    prefix: loki_index_
    period: 24h
This needs to be added before your own schema config.
@MartinEmrich just add the old default schema_config to your existing schema_config:
That would only work if the transition happened at midnight, or would it?
The from: date only supports days, not exact points in time, so for the date when it switched I would have a gap; i.e. if I changed the config on 9th April at 13:37, adding a block like:
schemaConfig:
  configs:
    - from: 2024-01-19
      store: tsdb
      object_store: aws
      schema: v11
      index:
        prefix: "loki_index_"
        period: 24h
    - from: 2024-04-09
      store: tsdb
      object_store: aws
      schema: v11
      index:
        prefix: "my_prefix_"
        period: 24h
would still result in a gap from 00:00 to 13:37. If I changed it to 2024-04-10, the gap would be from 13:37 to the end of that day.
Or am I missing something?
@MartinEmrich indexes are 24h; you must put the new index at tomorrow or any future date, not today or a past time! There will be no gap: until the date is met, the old index schema will be created/used. Time is in UTC; please check the docs before applying, wrong settings can lead to data loss. By wrong settings I mean setting the index to the current date or to a date in the past.
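To make that concrete, a sketch of the kind of entry being described, reusing the example values from the comments above and assuming the change is rolled out on 2024-04-09 (UTC), so the new prefix starts strictly in the future:
schemaConfig:
  configs:
    - from: 2024-01-19          # existing entry, left untouched
      store: tsdb
      object_store: aws
      schema: v11
      index:
        prefix: "loki_index_"
        period: 24h
    - from: 2024-04-11          # a future date (UTC); the new prefix only takes effect from then on
      store: tsdb
      object_store: aws
      schema: v11
      index:
        prefix: "my_prefix_"
        period: 24h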
If you encounter any troubles upgrading to Loki 3.0 or have feedback for the upgrade process, please leave a comment on this issue!
You can also ask questions at https://slack.grafana.com/ in the channel #loki-3.
Known Issues:
schema_config was renamed to schemaConfig and this is not documented
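For reference, a minimal sketch of the key as the 6.x chart expects it (structure borrowed from examples earlier in this thread; the date, store and schema version are illustrative, not defaults):
loki:
  schemaConfig:    # some 5.x values files carried this under loki.schema_config, see the discussion above
    configs:
      - from: 2024-04-01
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h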