Closed cebidhem closed 2 years ago
I've also tried to use loki.structuredConfig to have a plain Loki config.yaml, but the mergeOverride function in the ConfigMap gives me the same end result.
Which makes me think this function shouldn't even exist. Either we want (and manage) to use the loki.config values to build it, or the user should be able to input the full config they want. Does that make sense?
I'm still trying to find a way, but it's kind of frustrating since there isn't even a Loki upgrade involved; it's really about the Helm chart itself.
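To illustrate the behaviour described above, here is a minimal Python sketch of a deep merge in which user-supplied keys win but untouched chart defaults survive. It is not the chart's template code; `merge_overwrite`, `chart_defaults`, and the placeholder credential values are hypothetical names chosen for the example.

```python
from copy import deepcopy


def merge_overwrite(defaults: dict, overrides: dict) -> dict:
    """Deep-merge two dicts: overrides win, but default keys that are not overridden remain."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse into nested sections instead of replacing them wholesale.
            merged[key] = merge_overwrite(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults (placeholder values, not the real chart's).
chart_defaults = {
    "storage": {
        "s3": {
            "access_key_id": "placeholder-id",
            "secret_access_key": "placeholder-secret",
        }
    }
}

# A user config that relies on IRSA and therefore sets no static credentials.
user_config = {"storage": {"s3": {"region": "eu-west-1"}}}

merged = merge_overwrite(chart_defaults, user_config)
print(merged["storage"]["s3"])
```

Under these assumptions the merged S3 section still carries the default credentials next to the user's region, which would explain why a "plain" user config ends up with the same result as the values-driven one.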
It sounds like the solution here is to set the defaults for secretAccessKey and accessKeyId to null.
Would you be able to try out this branch: https://github.com/trevorwhitney/helm-charts/tree/null-s3-defaults
Hi @trevorwhitney thanks for replying!
Sorry it took me some time, but we deploy our Helm charts through Flux using exclusively Helm Repositories, so I had to package it on my side.
Anyway, with your branch and my values (pasted at the bottom), Loki throws these authentication errors:
loki-write-0:
level=error ts=2022-07-06T09:20:44.026140228Z caller=flush.go:222 org_id=fake msg="failed to flush user" err="WebIdentityErr: failed to retrieve credentials\ncaused by: SerializationError: failed to unmarshal error message\n\tstatus code: 405, request id: \ncaused by: UnmarshalError: failed to unmarshal error message\n\t00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |<?xml version=\"1|\n00000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 55 54 |.0\" encoding=\"UT|\n00000020 46 2d 38 22 3f 3e 0a 3c 45 72 72 6f 72 3e 3c 43 |F-8\"?>.<Error><C|\n00000030 6f 64 65 3e 4d 65 74 68 6f 64 4e 6f 74 41 6c 6c |ode>MethodNotAll|\n00000040 6f 77 65 64 3c 2f 43 6f 64 65 3e 3c 4d 65 73 73 |owed</Code><Mess|\n00000050 61 67 65 3e 54 68 65 20 73 70 65 63 69 66 69 65 |age>The specifie|\n00000060 64 20 6d 65 74 68 6f 64 20 69 73 20 6e 6f 74 20 |d method is not |\n00000070 61 6c 6c 6f 77 65 64 20 61 67 61 69 6e 73 74 20 |allowed against |\n00000080 74 68 69 73 20 72 65 73 6f 75 72 63 65 2e 3c 2f |this resource.</|\n00000090 4d 65 73 73 61 67 65 3e 3c 4d 65 74 68 6f 64 3e |Message><Method>|\n000000a0 50 4f 53 54 3c 2f 4d 65 74 68 6f 64 3e 3c 52 65 |POST</Method><Re|\n000000b0 73 6f 75 72 63 65 54 79 70 65 3e 53 45 52 56 49 |sourceType>SERVI|\n000000c0 43 45 3c 2f 52 65 73 6f 75 72 63 65 54 79 70 65 |CE</ResourceType|\n000000d0 3e 3c 52 65 71 75 65 73 74 49 64 3e 48 59 43 43 |><RequestId>HYCC|\n000000e0 30 54 46 59 4b 4a 59 53 39 33 53 50 3c 2f 52 65 |0TFYKJYS93SP</Re|\n000000f0 71 75 65 73 74 49 64 3e 3c 48 6f 73 74 49 64 3e |questId><HostId>|\n00000100 57 63 59 46 4e 78 77 4c 66 64 46 48 38 6c 51 62 |WcYFNxwLfdFH8lQb|\n00000110 42 53 64 7a 73 32 44 76 61 64 66 74 50 52 6b 71 |BSdzs2DvadftPRkq|\n00000120 36 71 51 61 53 68 4f 4f 54 62 52 56 36 78 4f 62 |6qQaShOOTbRV6xOb|\n00000130 66 74 47 54 74 38 4a 39 64 47 59 64 30 43 4e 4b |ftGTt8J9dGYd0CNK|\n00000140 44 6c 42 6f 36 38 58 56 41 4f 6b 3d 3c 2f 48 6f |DlBo68XVAOk=</Ho|\n00000150 73 74 49 64 3e 3c 2f 45 72 72 6f 72 3e 
|stId></Error>|\n\ncaused by: unknown error response tag, {{ Error} []}"
loki-read-0:
level=error ts=2022-07-06T09:29:49.960078199Z caller=ruler.go:493 msg="unable to list rules" err="WebIdentityErr: failed to retrieve credentials\ncaused by: SerializationError: failed to unmarshal error message\n\tstatus code: 405, request id: \ncaused by: UnmarshalError: failed to unmarshal error message\n\t00000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 |<?xml version=\"1|\n00000010 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 55 54 |.0\" encoding=\"UT|\n00000020 46 2d 38 22 3f 3e 0a 3c 45 72 72 6f 72 3e 3c 43 |F-8\"?>.<Error><C|\n00000030 6f 64 65 3e 4d 65 74 68 6f 64 4e 6f 74 41 6c 6c |ode>MethodNotAll|\n00000040 6f 77 65 64 3c 2f 43 6f 64 65 3e 3c 4d 65 73 73 |owed</Code><Mess|\n00000050 61 67 65 3e 54 68 65 20 73 70 65 63 69 66 69 65 |age>The specifie|\n00000060 64 20 6d 65 74 68 6f 64 20 69 73 20 6e 6f 74 20 |d method is not |\n00000070 61 6c 6c 6f 77 65 64 20 61 67 61 69 6e 73 74 20 |allowed against |\n00000080 74 68 69 73 20 72 65 73 6f 75 72 63 65 2e 3c 2f |this resource.</|\n00000090 4d 65 73 73 61 67 65 3e 3c 4d 65 74 68 6f 64 3e |Message><Method>|\n000000a0 50 4f 53 54 3c 2f 4d 65 74 68 6f 64 3e 3c 52 65 |POST</Method><Re|\n000000b0 73 6f 75 72 63 65 54 79 70 65 3e 53 45 52 56 49 |sourceType>SERVI|\n000000c0 43 45 3c 2f 52 65 73 6f 75 72 63 65 54 79 70 65 |CE</ResourceType|\n000000d0 3e 3c 52 65 71 75 65 73 74 49 64 3e 32 47 39 37 |><RequestId>2G97|\n000000e0 30 36 41 38 54 53 48 52 35 54 38 52 3c 2f 52 65 |06A8TSHR5T8R</Re|\n000000f0 71 75 65 73 74 49 64 3e 3c 48 6f 73 74 49 64 3e |questId><HostId>|\n00000100 36 56 6a 4f 36 62 32 47 78 79 47 65 4a 6b 46 71 |6VjO6b2GxyGeJkFq|\n00000110 6d 35 63 44 51 45 4f 4b 48 73 72 34 58 54 35 68 |m5cDQEOKHsr4XT5h|\n00000120 58 72 39 51 46 69 71 47 6a 62 66 77 42 6e 33 2b |Xr9QFiqGjbfwBn3+|\n00000130 71 74 78 4d 50 43 6b 71 32 2f 62 54 4f 72 63 56 |qtxMPCkq2/bTOrcV|\n00000140 33 45 69 30 53 74 76 75 37 30 63 3d 3c 2f 48 6f |3Ei0Stvu70c=</Ho|\n00000150 73 74 49 64 3e 3c 2f 45 72 72 6f 72 3e 
|stId></Error>|\n\ncaused by: unknown error response tag, {{ Error} []}"
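The hex dump embedded in these errors is easier to read once decoded. The sketch below rebuilds the payload from hexdump-style lines (offset, hex bytes, ASCII column); `decode_hexdump` is a hypothetical helper, and the sample is two lines copied from the log above.

```python
import re


def decode_hexdump(dump: str) -> str:
    """Rebuild the payload from lines shaped like 'offset hh hh ... |ascii|'."""
    payload = bytearray()
    for line in dump.splitlines():
        # Keep only the part before the ASCII column.
        hex_part = line.split("|", 1)[0]
        tokens = hex_part.split()
        if not tokens:
            continue
        # The first token is the offset; the remaining tokens are byte values.
        for token in tokens[1:]:
            if re.fullmatch(r"[0-9a-fA-F]{2}", token):
                payload.append(int(token, 16))
    return payload.decode("utf-8", errors="replace")


SAMPLE = """
00000030 6f 64 65 3e 4d 65 74 68 6f 64 4e 6f 74 41 6c 6c |ode>MethodNotAll|
00000040 6f 77 65 64 3c 2f 43 6f 64 65 3e 3c 4d 65 73 73 |owed</Code><Mess|
"""

print(decode_hexdump(SAMPLE))
```

Read in full, the response body in both logs is an XML error with code MethodNotAllowed ("The specified method is not allowed against this resource"), method POST, and resource type SERVICE.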
values.yaml:
fullnameOverride: loki
gateway:
  ingress:
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt
      kubernetes.io/tls-acme: "true"
    enabled: true
    hosts:
      - host: loki.mydomain.com
        paths:
          - path: /
            pathType: Prefix
    ingressClassName: ingress-nginx-global
    tls:
      - hosts:
          - loki.mydomain.com
        secretName: loki-gateway-tls
loki:
  auth_enabled: false
  schemaConfig:
    configs:
      - from: "2020-10-24"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v11
        store: boltdb-shipper
      - from: "2022-07-10"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v12
        store: boltdb-shipper
  storage:
    bucketNames:
      chunks: my-company-loki-objstore
      ruler: my-company-loki-objstore
    s3:
      endpoint: https://s3.eu-west-1.amazonaws.com
      insecure: false
      region: eu-west-1
      s3: s3://eu-west-1
      s3ForcePathStyle: false
  storageConfig:
    boltdb_shipper:
      shared_store: s3
monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  serviceMonitor:
    enabled: true
read:
  persistence:
    size: 5Gi
    storageClass: ebs-sc
  replicas: 2
serviceAccount:
  annotations:
    arn:aws:iam::redacted_aws_account_id:role/loki-irsa-role
write:
  persistence:
    size: 5Gi
    storageClass: ebs-sc
  replicas: 2
I'm also adding the ConfigMap created by the chart:
kind: ConfigMap
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    common:
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        s3:
          access_key_id: null
          bucketnames: my-company-loki-objstore
          endpoint: https://s3.eu-west-1.amazonaws.com
          insecure: false
          region: eu-west-1
          s3: s3://eu-west-1
          s3forcepathstyle: false
          secret_access_key: null
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    memberlist:
      join_members:
        - loki-memberlist
    ruler:
      storage:
        s3:
          bucketnames: my-company-loki-objstore
    schema_config:
      configs:
        - from: "2020-10-24"
          index:
            period: 24h
            prefix: loki_index_
          object_store: s3
          schema: v11
          store: boltdb-shipper
        - from: "2022-07-10"
          index:
            period: 24h
            prefix: loki_index_
          object_store: s3
          schema: v12
          store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
@trevorwhitney Actually, it just occurred to me that since the Loki version hasn't changed, I should diff the 1.4.3+ ConfigMap against my production (0.4.0) one. Doing this, I've been able to narrow down my issue.
After some trial and error, I can say your MR definitely helps. Defaulting secretAccessKey and accessKeyId to null makes things better. However, I noticed that any time I set an endpoint in the config, whether it is regional or not, with or without the https:// prefix, I get the same errors as posted in my previous comment.
As soon as I remove the endpoint property, everything works as expected. My last test was to set endpoint: null, and with this it also works as expected.
Working ConfigMap:
kind: ConfigMap
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    common:
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        s3:
          bucketnames: my-company-loki-objstore
          s3: s3://eu-west-1
          region: eu-west-1
          access_key_id: null
          secret_access_key: null
          insecure: false
          endpoint: null
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    memberlist:
      join_members:
        - loki-memberlist
    ruler:
      storage:
        s3:
          bucketnames: my-company-loki-objstore
    schema_config:
      configs:
        - from: "2020-10-24"
          index:
            period: 24h
            prefix: loki_index_
          object_store: s3
          schema: v11
          store: boltdb-shipper
        - from: "2022-07-10"
          index:
            period: 24h
            prefix: loki_index_
          object_store: s3
          schema: v12
          store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
I would propose defaulting endpoint to null, wdyt? I can also open a PR for this if you'd prefer.
@cebidhem thanks for testing this out. That sounds reasonable to me. I've updated my PR to default endpoint to null, and also to only include non-null s3 properties in the final config. Mind trying out that branch again with the latest changes?
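The idea of only including non-null properties can be sketched in Python as follows. This is an illustration of the filtering step only, not the chart's actual Helm template; `drop_nulls` is a hypothetical helper, and the input mirrors the S3 section from the ConfigMaps in this thread with endpoint defaulted to null.

```python
def drop_nulls(config):
    """Return a copy of a nested dict with every None-valued key removed."""
    if isinstance(config, dict):
        return {
            key: drop_nulls(value)
            for key, value in config.items()
            if value is not None  # False and 0 are kept; only None is dropped
        }
    return config


s3_section = {
    "access_key_id": None,
    "bucketnames": "my-company-loki-objstore",
    "endpoint": None,
    "insecure": False,
    "region": "eu-west-1",
    "s3": "s3://eu-west-1",
    "s3forcepathstyle": False,
    "secret_access_key": None,
}

print(drop_nulls(s3_section))
```

Note that the check is `is not None` rather than truthiness, so explicit false values such as insecure: false are still rendered.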
Hi @trevorwhitney it works perfectly for me!
Here's the ConfigMap generated:
kind: ConfigMap
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    common:
      path_prefix: /var/loki
      replication_factor: 3
      storage:
        s3:
          bucketnames: my-company-loki-objstore
          insecure: false
          region: eu-west-1
          s3: s3://eu-west-1
          s3forcepathstyle: false
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    memberlist:
      join_members:
        - loki-memberlist
    ruler:
      storage:
        s3:
          bucketnames: my-company-loki-objstore
    schema_config:
      configs:
        - from: "2020-10-24"
          index:
            period: 24h
            prefix: loki_index_
          object_store: s3
          schema: v11
          store: boltdb-shipper
        - from: "2022-07-10"
          index:
            period: 24h
            prefix: loki_index_
          object_store: s3
          schema: v12
          store: boltdb-shipper
    server:
      grpc_listen_port: 9095
      http_listen_port: 3100
Thanks a lot!
Glad to hear it!
@trevorwhitney Do you think your changes could be published soon in a 1.7.1 fix release?
I created our setup based on @cebidhem's recommendation, also for AWS S3/IRSA. However, I see another error.
level=error ts=2022-07-11T11:25:21.299030878Z caller=ruler.go:493 msg="unable to list rules" err="InvalidParameter: 1 validation error(s) found.\n- minimum field size of 1, ListObjectsV2Input.Bucket.\n"
level=error ts=2022-07-11T11:26:21.599707269Z caller=flush.go:222 org_id=fake msg="failed to flush user" err="InvalidParameter: 1 validation error(s) found.\n- minimum field size of 1, PutObjectInput.Bucket.\n"
We are using the latest Helm chart, 2.13.1, and Loki v2.6.0.
Our S3 config looks like this. I think that should be enough, because the AWS CLI, which is installed alongside, does not complain about it.
common:
  storage:
    s3:
      s3: s3://{s3.bucket_name}
Hi @Vad1mo,
Are those values or the rendered ConfigMap?
Yes, we are using the loki-stack chart, and those values are in the secret and mapped into the container as loki.yaml.
That's the whole file:
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
common:
  storage:
    s3:
      s3: s3://eks-cluster-core-services-stack-lokilogs1d26bb6a-6cfhcnqlbutz
      s3forcepathstyle: false
compactor:
  shared_store: s3
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      replication_factor: 1
  max_transfer_retries: 0
  wal:
    dir: /data/loki/wal
limits_config:
  enforce_metric_name: false
  max_entries_limit_per_query: 5000
  reject_old_samples: true
  reject_old_samples_max_age: 168h
memberlist:
  join_members:
    - loki-memberlist
ruler:
  storage:
    s3:
      s3: s3://eks-cluster-core-services-stack-lokilogs1d26bb6a-6cfhcnqlbutz
      s3forcepathstyle: false
schema_config:
  configs:
    - from: "2022-06-06"
      index:
        period: 24h
        prefix: loki_index_
      object_store: s3
      schema: v12
      store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: s3
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 90d
It works like a charm now: specifying the bucket together with the region did the trick. aws-cli worked because the env var AWS_DEFAULT_REGION was set.
s3: s3://region/bucket_name
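A small Python sketch may help show why the region-less form produced an empty bucket. It assumes the host part of the URL is read as the region and the path as the bucket name, which is what the working form s3://region/bucket_name suggests; this is not Loki's actual parser, and `split_s3_url` and `my-bucket` are hypothetical names.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple:
    """Split an s3:// URL into (region, bucket), assuming host=region and path=bucket."""
    parsed = urlparse(url)
    region = parsed.hostname or ""
    bucket = parsed.path.lstrip("/")
    return region, bucket


# Working form: region as host, bucket as path.
print(split_s3_url("s3://eu-west-1/my-bucket"))

# Failing form from the config above: the bucket name lands in the host slot.
print(split_s3_url("s3://eks-cluster-core-services-stack-lokilogs1d26bb6a-6cfhcnqlbutz"))
```

Under that assumption the second call yields an empty bucket string, which would be consistent with the "minimum field size of 1, ListObjectsV2Input.Bucket" validation errors reported earlier.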
Hey community,
I've tried to upgrade loki-simple-scalable from 0.4.0 to 1.4.3, and we are running into a lot of issues, some resolved but some not at all. I looked for similar issues on GitHub and found some, but even after following some of the proposed resolutions, it's still not running properly. I also looked in the documentation (README and online docs), but we've known from the beginning that it's not up to date.
We are using a quite basic setup: S3 as the data backend and IRSA for authentication. I tried a few configurations. Initially the bucket name was not set properly; then, without setting accessKeyId and secretAccessKey (as in 0.4.0), it fails because of the chart's default values, and setting them to null gives me an unmarshal error.
0.4.0 values:
This works perfectly fine.
1.4.3 values:
Those values give me the following error stacks:
And if I do not specify null for secretAccessKey and accessKeyId, then it uses the chart's default values, giving me an obvious 403.
Has anyone managed to deploy 1.4.3 using IRSA and S3 buckets? If so, could you please help us by pasting your configuration?
I'm keen to submit a PR to improve the documentation on this, either in this repo or in the Loki documentation, once I have it working.
Thanks to those reading this.