Open danielserrao opened 2 years ago
This started working after using s3 storage with the following loki-distributed config:
```yaml
loki:
  config: |
    auth_enabled: false
    chunk_store_config:
      max_look_back_period: 0s
    compactor:
      shared_store: s3
    distributor:
      ring:
        kvstore:
          store: memberlist
    frontend:
      compress_responses: true
      log_queries_longer_than: 5s
      tail_proxy_url: http://loki-distributed-querier:3100
    frontend_worker:
      frontend_address: loki-distributed-query-frontend:9095
    ingester:
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_idle_period: 5m
      chunk_retain_period: 30s
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      max_chunk_age: 5m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
    memberlist:
      join_members:
        - loki-distributed-memberlist
    query_range:
      align_queries_with_step: true
      cache_results: true
      max_retries: 5
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h
      split_queries_by_interval: 15m
    ruler:
      alertmanager_url: https://alertmanager.xx
      external_url: https://alertmanager.xx
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      storage:
        local:
          directory: /etc/loki/rules
        type: local
    schema_config:
      configs:
        - from: "2020-05-15"
          index:
            period: 24h
            prefix: index_
          object_store: s3
          schema: v11
          store: boltdb-shipper
    server:
      http_listen_port: 3100
    storage_config:
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
        index_gateway_client:
          server_address: dns:///loki-distributed-index-gateway:9095
        shared_store: s3
      aws:
        bucketnames: <bucket-name>
        s3: s3://<region>
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s
```
I'm experiencing the same thing with a very similar config to yours, but using Azure Blob Storage. If I query anything over 1h, I get this annoying message.
Thanks @danielserrao, changing all the filesystem references to s3 worked for me.
Hi
I am getting the same error. May I know where the config you mentioned needs to be updated? I have installed Loki as
Is there any ConfigMap we can update?
Best Regards, Ganesh
These are all the places I changed:
- https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/values.yaml#L144
- https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/values.yaml#L163
- https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/values.yaml#L172
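For anyone who doesn't want to chase the line links: the change at those spots amounts to swapping the chart's default `filesystem` storage for object storage. A rough sketch of the relevant `values.yaml` fragment — bucket name and region are placeholders, and the exact key paths may differ between chart versions:

```yaml
# values.yaml (loki-distributed chart) — illustrative sketch, not verbatim from the links
loki:
  storageConfig:
    boltdb_shipper:
      shared_store: s3          # was: filesystem
    aws:
      s3: s3://<region>
      bucketnames: <bucket-name>
  schemaConfig:
    configs:
      - from: "2020-05-15"
        store: boltdb-shipper
        object_store: s3        # was: filesystem
        schema: v11
        index:
          prefix: index_
          period: 24h
```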
I have the same problem after restarting some components. Anyone have a solution on how to fix it?
```
error: open /grafana-loki/chunks/ZmFrZS8yMDU1NjdiNzY5ZWVhZmJkOjE4MGU0ZTE3ZDJkOjE4MGU1NGZhYjg2OmRkMDQ4NWQy: no such file or directory
```
But the file exists and all the permissions are OK:
```
$ more /grafana-loki/chunks/ZmFrZS8yMDU1NjdiNzY5ZWVhZmJkOjE4MGU0ZTE3ZDJkOjE4MGU1NGZhYjg2OmRkMDQ4NWQy
rke2-ingress-nginx-controller","filename":"/var/log/pods/kube-system_rk
--More--(1%)
```
When I use loki-simple-scalable with an NFS storageClass, selecting a 5-minute time range works, but selecting a 15-minute or 1-hour range produces this error:
```
open /var/loki/chunks/fake/755005aa5e414340/MTgxMTNjOGM5MGI6MTgxMTQzNmE2NTI6M2RkYjQzYmQ=: no such file or directory
```
When I enter the write pod, the file exists! The error occurs intermittently.
This is mentioned in the chart README I think:
NOTE: In its default configuration, the chart uses boltdb-shipper and filesystem as storage. The reason for this is that the chart can be validated and installed in a CI pipeline. However, this setup is not fully functional. Querying will not be possible (or limited to the ingesters' in-memory caches) because that would otherwise require shared storage between ingesters and queriers which the chart does not support and would require a volume that supports ReadWriteMany access mode anyways. The recommendation is to use object storage, such as S3, GCS, MinIO, etc., or one of the other options documented at https://grafana.com/docs/loki/latest/storage/.
Using filesystem storage in the multi-pod setup would require multiple pods to access the same volume, so data is only queryable as long as it's cached in memory. I got around this issue by installing the single-binary Loki chart instead.
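If you do stay on the multi-pod chart, the README's suggestion of an S3-compatible store can be as small as an in-cluster MinIO deployment. A hedged sketch of what the `storage_config` might look like pointed at MinIO — the endpoint, bucket, and credentials here are placeholders, not values from this thread:

```yaml
# Loki storage_config fragment — illustrative only
storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    shared_store: s3
  aws:
    # any S3-compatible endpoint works, e.g. MinIO inside the cluster
    endpoint: minio.minio.svc:9000
    bucketnames: loki-chunks
    access_key_id: <access-key>
    secret_access_key: <secret-key>
    s3forcepathstyle: true      # MinIO typically needs path-style addressing
    insecure: true              # plain HTTP inside the cluster
```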
I could get the things working by configuring the volumes:
```yaml
loki-distributed:
  ingester:
    extraVolumes:
      - name: loki-chunks
        hostPath:
          path: "/var/loki/chunks"
          type: Directory
    extraVolumeMounts:
      - name: loki-chunks
        mountPath: "/var/loki/chunks"
  querier:
    extraVolumes:
      - name: loki-chunks
        hostPath:
          path: "/var/loki/chunks"
          type: Directory
    extraVolumeMounts:
      - name: loki-chunks
        mountPath: "/var/loki/chunks"
```
and I created this folder with permissions for the pods to write to it. Of course, these settings are for local directories, not for volumes on GCS or S3, for example.
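One caveat with the hostPath approach is that it only works while the ingester and querier pods land on the same node. A variant that avoids that constraint would be to back the shared directory with a ReadWriteMany PVC (e.g. from an NFS-backed storage class), which is the access mode the chart README says such a setup would require. Untested sketch; the claim name is an assumption:

```yaml
# values.yaml — same idea as above, but with an RWX PVC instead of hostPath
loki-distributed:
  ingester:
    extraVolumes:
      - name: loki-chunks
        persistentVolumeClaim:
          claimName: loki-chunks-rwx   # a pre-created ReadWriteMany PVC
    extraVolumeMounts:
      - name: loki-chunks
        mountPath: /var/loki/chunks
  querier:
    extraVolumes:
      - name: loki-chunks
        persistentVolumeClaim:
          claimName: loki-chunks-rwx
    extraVolumeMounts:
      - name: loki-chunks
        mountPath: /var/loki/chunks
```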
@aberenshtein hi, have you solved this issue? I'm hitting the same issue. I don't use object storage, just the filesystem (lvm-localpv).
Yes, but I see that the references I put for the values files are outdated. I guess they were updated in later versions.
I'm getting this error when there's high traffic in the cluster. I managed to reproduce it by running the benchmark tool wrk. It seems that when promtail is unable to send logs to Loki due to high network traffic in my cluster, querying the Loki datasource in Grafana returns this error whenever the query range includes the period of high traffic. Any solution for this?
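Tuning promtail probably doesn't fix the missing-chunk error itself (that looks like the shared-storage issue discussed above), but if the trigger is promtail falling behind under load, its client batching and retry backoff can be made more forgiving so fewer pushes are dropped. An illustrative, untested fragment of the promtail `clients` config — the URL and all values here are placeholders:

```yaml
# promtail config fragment — illustrative values only
clients:
  - url: http://loki-distributed-gateway/loki/api/v1/push
    batchwait: 1s            # wait up to 1s to fill a batch before pushing
    batchsize: 1048576       # bytes accumulated before a push is forced
    backoff_config:
      min_period: 500ms      # initial retry delay when Loki is unreachable
      max_period: 5m         # cap on the exponential backoff
      max_retries: 10        # retries before a batch is dropped
```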
UPDATE: I'm running the following kube-prometheus-stack
components in the cluster:
```
$ helm -n monitoring list
NAME      NAMESPACE   REVISION  UPDATED                                   STATUS    CHART                         APP VERSION
loki      monitoring  1         2022-09-27 14:48:32.011792243 +1000 AEST  deployed  loki-distributed-0.58.0       2.6.1
prom      monitoring  1         2022-09-27 14:47:26.820679248 +1000 AEST  deployed  kube-prometheus-stack-40.1.2  0.59.1
promtail  monitoring  1         2022-09-27 14:48:23.583706894 +1000 AEST  deployed  promtail-6.4.0                2.6.1
```
For me, the problem was solved by removing the default `storage_config.filesystem` configuration that the helm template generates after applying my values.yaml file. I am using the loki-distributed helm chart v0.63.1. Here is the snippet that removes the extra `filesystem` config property:
```yaml
# values.yaml
loki:
  annotations: {}
  ...
  storageConfig:
    boltdb_shipper:
      shared_store: s3
    aws:
      s3: s3://${cluster_region}
      bucketnames: ${bucket_name}
    filesystem: null
```
Notice the trailing `filesystem: null`. That line removes the `directory: /var/loki/chunks` reference that was confusing the querier:
```diff
 # generated configMap
 apiVersion: v1
 data:
   config.yaml: |
     auth_enabled: false
     ...
     storage_config:
       aws:
         bucketnames: bucket-for-logs
         s3: s3://${region}
       boltdb_shipper:
         active_index_directory: /var/loki/index
         cache_location: /var/loki/cache
         cache_ttl: 168h
         shared_store: s3
-      filesystem:
-        directory: /var/loki/chunks
```
I have the distributed microservices working in one cluster, but in production I'm facing issues after a couple of weeks. I added a PVC to Grafana and restarted it, and now I am not able to get labels in the Grafana UI; it fails with "failed to call resource".
I still have this problem when using the distributed chart to query logs.
I have Grafana with the Loki datasource pointing to the loki querier-frontend but I get the following error when making queries:
Sometimes it is working and then it gets the same error for some reason that is not clear to me.
On the logs of the querier-frontend pod I can see:
```
caller=logging.go:72 traceID=5c8361c04594c7a2 orgID=fake msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=%7Bjob%3D%22fbit_k8s%22%7D&start=1647619419284000000&end=1647630219285000000&step=5 (500) 53.767877ms Response: \"open /var/loki/chunks/ZmFrZS9kOGU4OGYwOTg3ZTM0NWUyOjE3ZjllMTk0NmE4OjE3ZjllMTk1NmVkOmMwMWFiYmNm: no such file or directory\\n\" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br; Accept-Language: en-GB,en;q=0.9,en-US;q=0.8; Sec-Ch-Ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"99\", \"Microsoft Edge\";v=\"99\"; Sec-Ch-Ua-Mobile: ?0; Sec-Ch-Ua-Platform: \"Windows\"; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/8.3.5; X-Forwarded-For: 127.0.0.1, 127.0.0.1; X-Grafana-Org-Id: 1; "
```
When doing "helm template", the K8s manifest (which is applied) is the following:
test.txt
I already tried multiple types of configurations, but I always get this annoying error.
Some help would be very appreciated.