vishal-biyani opened this issue 3 years ago
After studying the blueprints for the various use cases above, let's go one level deeper into each component and what it does. Let's use the diagram below and walk through each component.
Terminology:
The first component is about sending data from a Prometheus instance to object storage, and there are two possibilities here: using a sidecar (pull model) or using a receiver (push model). The user has to choose one of the two options based on their needs and the limitations of the underlying infrastructure. The pros and cons of both models are discussed in this document. We will cover both in detail in their respective sections.
The Thanos Sidecar runs as a sidecar container in the Prometheus pod (the recommended Prometheus version is 2.2.1 or above). It mainly does three things:
We should cover the first two, and later add support for reloading the config as Krius development progresses.
One important thing: the Thanos querier component should be able to reach the endpoint exposed by the Thanos sidecar. This means that if the querier is in a different cluster/VM/network, the sidecar's endpoint needs to be exposed outside the source cluster somehow.
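As a concrete sketch, the sidecar is typically an extra container in the Prometheus pod, sharing the TSDB volume and exposing the Store API over gRPC (the flag names below are from the Thanos docs; the image tag, paths, and port values are placeholders to adapt):

```yaml
# Sketch: a Thanos sidecar container added to the Prometheus pod spec.
- name: thanos-sidecar
  image: quay.io/thanos/thanos:<version>
  args:
    - sidecar
    - --prometheus.url=http://localhost:9090          # the Prometheus in the same pod
    - --tsdb.path=/prometheus                         # shared TSDB data volume
    - --objstore.config-file=/etc/thanos/bucket.yaml  # object storage configuration
    - --grpc-address=0.0.0.0:10901                    # Store API endpoint the querier dials
```

The `--grpc-address` port (10901 by convention) is what has to be reachable from the querier, e.g. via a Service/LoadBalancer when the querier lives in another cluster.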
The receiver runs in a separate pod (in the same or a different cluster), and Prometheus remote-writes to the receiver pod. The API exposed to the querier is the same Store API.
From Krius's POV, this means:

- [ ] Do we need to have as many receive components as Prometheus instances?
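On the Prometheus side, pointing at a receiver is just a `remote_write` entry in the Prometheus config; a minimal sketch (the hostname is a placeholder, 19291 is the receiver's default remote-write port):

```yaml
# Sketch: Prometheus remote_write pointing at a Thanos receiver.
remote_write:
  - url: http://<thanos-receive-host>:19291/api/v1/receive
```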
The querier simply queries data from multiple sources, such as S3 buckets and sidecars/receivers, and gives you the results of the query. In the first iteration, we simply plumb together the S3 buckets and the sidecar/receiver endpoints! For details, check out: https://thanos.io/tip/components/query.md/
The querier has a bunch of strategies that we can support in later versions of Krius; details: https://thanos.io/tip/components/query.md/#query-api-overview
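Plumbing the querier together mostly means passing it one Store API endpoint per source; a hedged sketch of the container args (hostnames are placeholders, flags are from the Thanos query docs):

```yaml
# Sketch: querier args wiring up sidecar/receiver/store-gateway Store APIs.
args:
  - query
  - --store=<sidecar-or-receiver-host>:10901
  - --store=<store-gateway-host>:10901
  - --query.replica-label=replica   # label used to deduplicate across HA replicas
  - --query.auto-downsampling       # maps to the autoDownSample option below
```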
The querier frontend is a layer on top of the querier that improves the read path through query splitting and caching. The cache can be in-memory or Memcached. Krius should support installing Memcached if that option is chosen.
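For the Memcached option, the query frontend takes a response-cache configuration (passed via `--query-range.response-cache-config` or `--query-range.response-cache-config-file`); a minimal sketch, with placeholder addresses:

```yaml
# Sketch: query-frontend response cache config for the memcached option.
type: MEMCACHED
config:
  addresses: ["<memcached-host>:11211"]
```

The in-memory option uses `type: IN-MEMORY` instead, in which case Krius would not need to install anything extra.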
```yaml
cluster1: # This is the KubeConfig cluster name, as that is what is used to connect
  prometheus:
    name: prom1
    install: true
    name:                            # if install is false, this points to an existing install
    namespace:
  remote:
    mode: receive/sidecar            # Only one mode at a time
    receiveReference:                # Reference to one of the receivers if the mode is receive
  s3config: name                     # A pointer to one of the s3 configs at the end
cluster2:
  prometheus:
    name: prom2
    install: false
  remote:
    mode: sidecar                    # Since sidecar is specified, there is no need for receiveReference
  s3config: name                     # A pointer to one of the s3 configs
cluster3:
  thanos:
    name: thanos-ag1
    querier:
      name:
      targets:                       # https://thanos.io/tip/components/query.md/#store-filtering
      dedup-enabled: true/false      # https://thanos.io/tip/components/query.md/#deduplication-enabled
      autoDownSample:                # https://thanos.io/tip/components/query.md/#auto-downsampling
      partial_response: true/false   # https://thanos.io/tip/components/query.md/#partial-response-strategy
    querier-fe:
      name:
      cacheOption: in-memory/memcached  # Only one of the two
      memcached-options:             # Only if memcached is chosen above
        key1: value1
    compactor:                       # Parameters to be defined based on https://thanos.io/tip/components/compact.md/
      name:
  grafana:
    name:
    setup: true/false
    name:
    namespace:
s3configslist:
  - name: abc
    type: s3
    config:
      bucket: ""
      endpoint: "s3.<region-name>.amazonaws.com"
      access_key: ""
      secret_key: ""
    bucketweb:
      enabled: true
  - name: xyz
```
@hr1sh1kesh @kanuahs, need your review on this one. @PrasadG193, can you please help @YachikaRalhan with the YAML syntax and validation? I have done a first draft, but it is very rough and not exact YAML, IMHO!
Corrected the YAML syntax, validated it, and added more Thanos components to the clusters -
```yaml
cluster1: # This is the KubeConfig cluster name, as that is what is used to connect
  prometheus:
    install: true                # if install is false, then name & namespace point to an existing install
    name: prom1
    namespace: default
    mode: receiver               # receiveReference is required in receiver mode
    receiveReference: http://<thanos-receive-container-ip>:10908  # The URL of the endpoint to send samples to
    objStoreConfig: bucket.yaml  # Storage configuration for uploading data
cluster2:
  prometheus:
    name: prom2                  # Prometheus URL
    install: false
    mode: sidecar                # Since sidecar is specified, there is no need for receiveReference
    objStoreConfig: bucket.yaml  # Storage configuration for uploading data
cluster3:
  thanos:
    name: thanos-ag1
    querier:
      name: testing
      targets: testing              # https://thanos.io/tip/components/query.md/#store-filtering
      dedup-enabled: true/false     # https://thanos.io/tip/components/query.md/#deduplication-enabled
      autoDownSample: testing       # https://thanos.io/tip/components/query.md/#auto-downsampling
      partial_response: true/false  # https://thanos.io/tip/components/query.md/#partial-response-strategy
    querier-fe:
      name: testing
      cacheOption: in-memory/memcached  # Only one of the two
      memcached-options:            # Only if memcached is chosen above
        enabled: true
        key1: value1
    receiver:
      name: test
      httpPort: <port>              # not required
      httpNodePort: <port>          # not required
      remoteWritePort: <port>       # not required
      remoteWriteNodePort: <port>   # not required
    compactor:                      # Parameters to be defined based on https://thanos.io/tip/components/compact.md/
      name: test
    ruler:
      alertmanagers:
        - http://kube-prometheus-alertmanager.monitoring.svc.cluster.local:9093
      config: |-
        groups:
          - name: "metamonitoring"
            rules:
              - alert: "PrometheusDown"
                expr: absent(up{prometheus="monitoring/kube-prometheus"})
  grafana:
    name: testing
    setup:
      enabled: true
      name: testing
      namespace: default
objStoreConfigslist:
  - name: abc
    type: s3
    config:
      bucket: ""
      endpoint: "s3.<region-name>.amazonaws.com"
      access_key: ""
      secret_key: ""
    bucketweb:
      enabled: true
  - name: xyz
```
@YachikaRalhan, @vishal-biyani: so I was thinking cluster1 and cluster2 would typically both want either a receiver or a sidecar, and it won't be a hybrid like how we have mentioned in the example. Maybe, with that in view, we should move the mode above the cluster stanza. I am referring to this line specifically, which is within the cluster stanza:

`mode: receiver # receiveReference is required in receiver mode`

What do you think?
@hr1sh1kesh Technically it is possible that one Prom cluster uses sidecar mode and another Prom cluster uses receive mode, no?
True, in theory it is possible. But then it's not really a deployment pattern, in my opinion, where one cluster is remote-writing its metrics while another cluster just has a sidecar. @vishal-biyani
Fair enough - so in the interest of future flexibility and keeping it as a possible option, I would say let's keep the mode at the Prometheus config level. Also, cluster1 here is a K8s cluster, BTW. You are right that currently there is no known deployment pattern to that effect.
On behalf of @YachikaRalhan
Unmarshaling the current config file according to the designed spec is becoming quite complicated in Golang, as the keys are different for each cluster and we would need to access the data dynamically (nested `map[string]interface{}`). So maybe we were doing something wrong in the YAML spec.
So I updated the config file -
```yaml
---
clusters:
  - name: cluster1
    type: prometheus
    data:
      install: true
      name: prom1
      namespace: default
      mode: receiver
      receiveReference: http://<thanos-receive-container-ip>:10908
      objStoreConfig: bucket.yaml
  - name: cluster3
    type: thanos
    data:
      name: thanos-ag1
      querier:
        name: testing...
```
We will start with some examples of typical deployment blueprints so that we get a sense of the details, and then we will move on to defining a spec for the blueprint.
So in this blueprint, we install Prometheus and inject the sidecar. The Thanos part involves a querier pointed at all the Prometheus servers, and then linking Grafana to the Thanos querier endpoint.