Describe the bug
After upgrading to Mimir 2.14, we found that all of our queries spanning more than 7 days were only returning 7 days' worth of data. We run a multi-tenanted system with different sized tenants and different retention periods, typically 7d, 1m and 1y. Users of the platform need to be able to query across multiple tenants, so our default Mimir datasource uses query federation to do this.
We found that querying one of the large tenants in isolation for a longer period (e.g. 30 days) returned the full data set, but querying across all tenants only returned 7 days' worth of data.
I believe this behaviour is a result of https://github.com/grafana/mimir/pull/8388, where the query-frontend takes the minimum retention period across all tenants in the query, and then takes the minimum of that and the max query lookback. So when the smaller tenants with a 7 day retention period were included, the max lookback for any query became 7 days, despite the query also including larger tenants that had more than 7 days of data.
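To make the effect concrete, here is a minimal Go sketch of that resolution logic as I understand it. The function and variable names are hypothetical and this is not Mimir's actual code; it just illustrates taking the smallest per-tenant retention and then the minimum of that and max_query_lookback:

```
package main

import (
	"fmt"
	"time"
)

// resolveMaxLookback is a simplified, hypothetical illustration (not Mimir's
// actual code) of the behaviour described above: resolve the compactor
// retention per tenant, keep the smallest non-zero value, then cap the query
// lookback at the minimum of that and max_query_lookback (0 = unlimited).
func resolveMaxLookback(retentionByTenant map[string]time.Duration, maxQueryLookback time.Duration) time.Duration {
	smallest := time.Duration(0)
	for _, r := range retentionByTenant {
		if r > 0 && (smallest == 0 || r < smallest) {
			smallest = r
		}
	}
	switch {
	case smallest == 0:
		return maxQueryLookback
	case maxQueryLookback == 0 || smallest < maxQueryLookback:
		return smallest
	default:
		return maxQueryLookback
	}
}

func main() {
	retentions := map[string]time.Duration{
		"large":  365 * 24 * time.Hour, // ~1y
		"medium": 4 * 7 * 24 * time.Hour, // 4w
		"small":  7 * 24 * time.Hour, // 1w
	}
	// Including the small tenant clamps every federated query to 7 days.
	fmt.Println(resolveMaxLookback(retentions, 0)) // 168h0m0s
}
```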
To Reproduce
Steps to reproduce the behavior:
Have multiple tenants with varying retention periods, e.g. 1h and 1d
Query Mimir for a duration greater than the smallest retention period (e.g. 1d), specifying only the largest tenant. It should return a full day of data.
Run the query again, but this time query both tenants. It should only return 1h of data (see the sketch below).
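For reference, a rough Go sketch of how the repro can be driven against the query-frontend. The hostname and tenant IDs are placeholders; it assumes the default /prometheus HTTP prefix and tenant federation's "|"-separated X-Scope-OrgID header:

```
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// queryRange issues a Prometheus range query against the Mimir query-frontend.
// The tenant value goes into X-Scope-OrgID; with tenant_federation enabled,
// multiple tenant IDs can be joined with "|" (e.g. "large|small").
func queryRange(base, tenant, query string, start, end time.Time) (string, error) {
	params := url.Values{}
	params.Set("query", query)
	params.Set("start", start.Format(time.RFC3339))
	params.Set("end", end.Format(time.RFC3339))
	params.Set("step", "5m")

	req, err := http.NewRequest(http.MethodGet, base+"/prometheus/api/v1/query_range?"+params.Encode(), nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-Scope-OrgID", tenant)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	end := time.Now()
	start := end.Add(-24 * time.Hour) // longer than the 1h tenant's retention

	// Single large tenant: returns the full day of data.
	single, _ := queryRange("http://mimir-query-frontend:8080", "large", "up", start, end)
	// Federated query including the 1h-retention tenant: results truncated to 1h.
	federated, _ := queryRange("http://mimir-query-frontend:8080", "large|small", "up", start, end)
	fmt.Println(len(single), len(federated))
}
```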
Expected behavior
When querying across multiple tenants, I would expect the max lookback to be equal to the largest retention period across the queried tenants, to avoid truncating any results.
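For comparison with the sketch above, this is roughly what I would expect the resolution to look like (again a hypothetical sketch, not a proposed patch): take the largest retention across the queried tenants, treat an unset/zero retention as unlimited, and still cap at max_query_lookback:

```
package main

import (
	"fmt"
	"time"
)

// resolveMaxLookbackExpected sketches the expected behaviour: use the largest
// retention across the queried tenants (a zero value meaning "unlimited"),
// capped at max_query_lookback, so federated queries are never truncated
// below what the longest-retention tenant actually holds.
func resolveMaxLookbackExpected(retentionByTenant map[string]time.Duration, maxQueryLookback time.Duration) time.Duration {
	largest := time.Duration(0)
	for _, r := range retentionByTenant {
		if r == 0 {
			return maxQueryLookback // retention disabled for one tenant: no retention-based cap
		}
		if r > largest {
			largest = r
		}
	}
	if largest == 0 || (maxQueryLookback != 0 && maxQueryLookback < largest) {
		return maxQueryLookback
	}
	return largest
}

func main() {
	retentions := map[string]time.Duration{
		"large": 365 * 24 * time.Hour, // ~1y
		"small": 7 * 24 * time.Hour,   // 1w
	}
	// With the large tenant included, a federated query keeps the full 1y lookback.
	fmt.Println(resolveMaxLookbackExpected(retentions, 0)) // 8760h0m0s
}
```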
Environment
Mimir config
Main Config
```
compactor:
  compaction_concurrency: 2
  compaction_interval: "30m"
  data_dir: "/data"
  deletion_delay: "2h"
  first_level_compaction_wait_period: "25m"
  max_closing_blocks_concurrency: 2
  max_opening_blocks_concurrency: 4
  sharding_ring:
    heartbeat_period: "1m"
    heartbeat_timeout: "4m"
    wait_stability_min_duration: "1m"
  symbols_flushers_concurrency: 4
frontend:
  cache_results: true
  log_queries_longer_than: "5s"
  max_outstanding_per_tenant: 4096
  parallelize_shardable_queries: true
  query_sharding_target_series_per_shard: 2500
  results_cache:
    backend: "memcached"
    memcached:
      max_item_size: 1048576
      timeout: "500ms"
frontend_worker:
  grpc_client_config:
    max_send_msg_size: 419430400
limits:
  align_queries_with_step: true
  cardinality_analysis_enabled: true
  max_cache_freshness: "15m"
  max_query_parallelism: 400
  max_total_query_length: "12000h"
  native_histograms_ingestion_enabled: true
  out_of_order_time_window: "15m"
  query_sharding_max_sharded_queries: 640
  query_sharding_total_shards: 32
querier:
  max_concurrent: 20
  timeout: "2m"
query_scheduler:
  max_outstanding_requests_per_tenant: 4096
runtime_config:
  file: "/var/mimir/runtime.yaml"
server:
  grpc_server_max_recv_msg_size: 524288000
  grpc_server_max_send_msg_size: 524288000
  log_format: "json"
store_gateway:
  sharding_ring:
    heartbeat_period: "1m"
    heartbeat_timeout: "4m"
    kvstore:
      prefix: "multi-zone/"
    tokens_file_path: "/data/tokens"
    unregister_on_shutdown: false
    wait_stability_min_duration: "1m"
    zone_awareness_enabled: true
tenant_federation:
  enabled: true
usage_stats:
  enabled: false
  installation_mode: "helm"
```
Runtime/Tenant Config
Indicative example of our tenant setup:
```
overrides:
  large:
    compactor_blocks_retention_period: "1y"
    compactor_split_and_merge_shards: 6
    compactor_split_groups: 12
    ingestion_rate: 30000000
    max_fetched_chunks_per_query: 10000000
    max_label_names_per_series: 50
  medium:
    compactor_blocks_retention_period: "4w"
    ingestion_rate: 100000
    max_label_names_per_series: 50
  small:
    compactor_blocks_retention_period: "1w"
    ingestion_rate: 100000
    max_label_names_per_series: 50
```
Additional context
The specific line of code in question is here https://github.com/grafana/mimir/pull/8388/files#diff-92de40a72c3f7eb8744de54750b3f1279255c1e494ca560878a6407a4c46e3e9R133