Open 17billion opened 2 years ago
@17billion thanks for this; could you please provide an example of those logs that are duplicated? This will help us to build a test to replicate the issue. Feel free to redact the information in the logs; I'm just interested in the structure.
I have JSON-filtered the log below from fluentd. (Log pipeline: fluent-bit -> fluentd -> loki)
100.1.87.80 - - [26/Oct/2022:10:15:10 +0900] "GET /v1/region/rgn3?lat=37.31131439&lng=126.82345896 HTTP/1.1" 200 140 "https://p....block/" "Mozilla/5.0 (Linux; Android 12; SM-G981N Build/SP1A.210812.016; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/106.0.5249.126 Mobile Safari/537.36" 0.004 0.004 "127.0.0.1:8080" "111.11.11.111, 100.1.111.9,127.0.0.1, 100.11.111.111" -
fluentd filter
# nginx
<filter tail.ec2.**.nginx>
  @type parser
  key_name message
  reserve_data true
  remove_key_name_field true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time_local
      time_type string
      time_format %d/%b/%Y:%H:%M:%S %z
    </pattern>
    <pattern>
      format regexp
      expression /^(?<remote_addr>[^\s]*)\s-\s(?<remote_user>[^\s]*)\s\[(?<time_local>[^\]]*)\]\s"(?<request_method>\S+)(?: +(?<request_path>[^\"]*?)(?: +\S*)?)?"\s(?<status>\d*)\s(?<body_bytes_sent>\d*)\s"(?<http_referer>[^\s]*)"\s"(?<http_user_agent>[^\"]*)"\s(?<request_time>[\d.]+)\s(?<upstream_response_time>[\d.]+)\s"(?<upstream_addr>[^\"]*)"\s"(?<http_x_forwarded_for>[^\"]*)"\s(?<upstream_cache_status>[^\"]*)/
      time_format %d/%b/%Y:%H:%M:%S %z
      time_key time_local
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</filter>
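For reference, the regexp pattern above turns the sample access log into a structured record along these lines (an illustration only, derived from the named captures and the example log line; time_local is consumed as the event timestamp, the user agent is abbreviated here, and reserve_data true would keep any additional fields already on the record):

{
  "remote_addr": "100.1.87.80",
  "remote_user": "-",
  "request_method": "GET",
  "request_path": "/v1/region/rgn3?lat=37.31131439&lng=126.82345896",
  "status": "200",
  "body_bytes_sent": "140",
  "http_referer": "https://p....block/",
  "http_user_agent": "Mozilla/5.0 (Linux; Android 12; SM-G981N ...) ... Mobile Safari/537.36",
  "request_time": "0.004",
  "upstream_response_time": "0.004",
  "upstream_addr": "127.0.0.1:8080",
  "http_x_forwarded_for": "111.11.11.111, 100.1.111.9,127.0.0.1, 100.11.111.111",
  "upstream_cache_status": "-"
}

This is roughly the JSON body that the | json parser in the LogQL queries further down operates on, with each field becoming an extracted label at query time.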
Grafana query results (some values obscured for security). Normal query result: the expected logs. Query result using | json: the same logs appear to be duplicated.
Reopening (sorry, I clicked the wrong button).
If I send the same log to a Loki instance in my local environment, the issue does not occur, but it is still happening in the production environment right now. In Grafana, the same query returns more logs when | json is used, meaning a lot of duplicate logs.
Thank you for the detail :+1:
https://grafana.com/docs/loki/latest/configuration/
See the chunks, prefix, and row_shards options under schema_config.
Hi @o-TvT-o ,
Thank you for sharing this great news; could you please provide more information?
Which version are you running, and what value did you change row_shards to in order to solve the issue?
Thanks.
schema_config:
  configs:
    - from: 2020-08-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h
      chunks:
        prefix: chunks_
        period: 24h
      row_shards: 1
It seems that when S3 storage is enabled, duplicate logs are stored; this is associated with the row_shards configuration, and after changing it to 1 there are no more duplicate logs. This configuration has been proven in my production environment.
I faced the same problem, and setting row_shards to 1 as shown by @o-TvT-o solved it. I'm just wondering whether it will have any negative impact when using the S3 backend.
I've found that setting query_range.parallelise_shardable_queries to false while keeping row_shards at its default (16) also avoids the duplicate logs. For context, I'm using the simple scalable deployment with additional query-frontend instances; a sketch of the relevant config is below.
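A minimal sketch of where this setting lives in the Loki configuration file (assuming an otherwise standard loki.yaml; only the relevant block is shown):

query_range:
  # Disable splitting queries into shards that are executed in parallel.
  # This is the workaround described above; it may reduce query performance on large ranges.
  parallelise_shardable_queries: false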
Changing row_shards to 1 seems to have fixed the problem. @rucciva, is it correct to set parallelise_shardable_queries to false? I wonder if it only looks that way because the previous data had already been processed with row_shards: 1 set.
Also, setting row_shards to 1 seems to affect performance, so I suspect I haven't found a fundamental solution yet.
"is it correct to set parallelise_shardable_queries to false"

Hi @17billion, I'm not sure whether it is correct or not, but that is the workaround I'm currently using. I guess it might have an impact on query performance, though.

"I wonder if it only looks that way because the previous data had already been processed with row_shards: 1 set."

I don't think so, since I created a new cluster with the default row_shards and parallelise_shardable_queries set to false, and no duplicate logs are returned from search.
A similar issue is still occurring now (Loki version 2.8.4). The logs are completely identical (the billing number in the log is a value that cannot legitimately be duplicated); the only difference is the fluentd_thread label, flush_thread_x.
The strange thing is that a lot of duplication occurs when Loki's load is high and fluentd retries a lot. (If Loki is stable and there are no retries from fluentd, duplication rarely occurs.)
Currently, I have not set row_shards or parallelise_shardable_queries, because I want to solve the fundamental problem. I think the Fluentd sitting in front of Loki has an influence, similar to related issues.
Is there a way to tune the settings of the fluentd agent documented below? (Our flow: Fluent-bit -> Fluentd -> Loki) https://grafana.com/docs/loki/latest/send-data/fluentd/
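A minimal sketch of a fluentd output section for the fluent-plugin-grafana-loki plugin documented at that link, showing the standard fluentd buffer parameters that control flush threads and retries (the match pattern, URL, buffer path, and label values here are placeholders, not taken from the original setup):

<match tail.ec2.**.nginx>
  @type loki
  url http://loki-gateway:3100                # hypothetical Loki endpoint
  extra_labels {"role":"api","service":"odr"}
  <buffer>
    @type file
    path /var/log/fluentd/buffer/loki         # hypothetical buffer path
    flush_thread_count 1                      # one flush thread -> only one fluentd_thread label value
    flush_interval 5s
    chunk_limit_size 1m
    retry_type exponential_backoff
    retry_wait 2s
    retry_max_times 5                         # cap retries so failed chunks are not resent indefinitely
  </buffer>
</match>

Trade-offs: a single flush thread and a small chunk size can limit throughput, and capping retries risks dropping chunks that never succeed. One likely factor in the duplicates is that retried lines carrying a different fluentd_thread label value land in a different Loki stream, so Loki's per-stream deduplication of identical entries does not apply to them.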
same issue + 1
Describe the bug: If you use the json parser, the logs appear to be duplicated.
To Reproduce: It only happens for certain labels. What distinguishes these labels from the others is that their log volume is large (about 10k logs processed per second).
Expected behavior: Each log should be shown only once.
Environment:
Screenshots, Promtail config, or terminal output
loki config
logql : {role="api",service="odr"}
logql : {role="api",service="odr"} | json