Closed hgudladona closed 3 months ago
Hi thanks for your contribution, I made few comments following some tests.
Thanks in advance :)
Hello, Thanks for the review, I made the fixes based on your suggestions. However, I am still working on testing the changes. I will keep you posted once I confirm everything is working as expected.
@hgudladona do you need another look on your PR?
I have fixed the syntax issues and tested the rendering. I am working on rolling out the components onto my EKS cluster and ensure functionality. I will let you know after this test. I am pretty close. Would you like me to move this PR to draft state?
I have fixed the syntax issues and tested the rendering. I am working on rolling out the components onto my EKS cluster and ensure functionality. I will let you know after this test. I am pretty close. Would you like me to move this PR to draft state
NIce. Keep it as is, that's fine. Let us know the outcome of your tests on your EKS cluster!
Hello, I have made necessary fixes and tested these templates while integrating to export my spark logs via Vector --> Warpstream --> Quickwit on AWS. Please take a look at the proposed changes and let me know.
The init container is particularly useful as follows. Having this in the jobs template allows a new config file to be created while substituting AWS_AVAILABILITY_ZONE
with the availability_zone from instance metadata during startup from the source config below. This allows the indexers to pull logs from warpstream rack locally (AZ local) preventing unnecessary data transfer costs. Reference to Warpstream docs is here
jobs:
sources:
command: ["/bin/bash", "-c", "quickwit source describe --index hudi-spark-logs --source hudi-spark-logs-warpstream --endpoint ${QW_CLUSTER_ENDPOINT} || quickwit source create --index hudi-spark-logs --source-config /quickwit/shared/hudi-spark-logs-warpstream.yaml --endpoint ${QW_CLUSTER_ENDPOINT}"]
initContainers:
- name: quickwit-rack-aware-kafka-client
image: alpine:3.20
command:
- "sh"
- "-c"
- "apk --update add curl && \
cp /quickwit/hudi-spark-logs-warpstream.yaml /quickwit/shared/ && \
TOKEN=$(curl -XPUT \"http://169.254.169.254/latest/api/token\" -H \"X-aws-ec2-metadata-token-ttl-seconds: 21600\") && \
AVAILABILITY_ZONE=$(curl -XGET -H \"X-aws-ec2-metadata-token: $TOKEN\" http://169.254.169.254/latest/meta-data/placement/availability-zone) && \
sed -i.bak s/AWS_AVAILABILITY_ZONE/$AVAILABILITY_ZONE/g /quickwit/shared/hudi-spark-logs-warpstream.yaml"
volumes:
- name: source
configMap:
name: quickwit-bootstrap
items:
- key: hudi-spark-logs-warpstream.yaml
path: hudi-spark-logs-warpstream.yaml
volumeMounts:
- name: source
mountPath: /quickwit/hudi-spark-logs-warpstream.yaml
subPath: hudi-spark-logs-warpstream.yaml
- name: source-shared
mountPath: /quickwit/shared/
# A shared volume for the init container to copy the configuration files to
extraVolumes:
- name: source-shared
emptyDir: {}
# A shared volume mount for the init container to copy the configuration files to
extraVolumeMounts:
- name: source-shared
mountPath: /quickwit/shared/
readOnly: true
...
sources:
- index: hudi-spark-logs
source:
version: 0.8
source_id: hudi-spark-logs-warpstream
source_type: kafka
num_pipelines: 5
params:
topic: hudi-spark-logs
client_params:
client.id: warpstream_az=AWS_AVAILABILITY_ZONE,warpstream_interzone_lb=true
bootstrap.servers: warpstream-agent.warpstream:9092
Hi LGTM, I'll run some tests with kind ASAP (probably tomorrow) and merge it if it's okay.
Thanks for your work!
Tests with this value file:
---
config:
default_index_root_uri: s3://qw-indexes
storage:
s3:
endpoint: https://s3.fr-par.scw.cloud
region: fr-par
access_key_id: SCWXXXXXXX
secret_access_key: XXXXXXXXXXX
environment:
QW_METASTORE_URI: s3://qw-indexes
indexer:
initContainers: &initContainers
- name: hello
image: busybox:1.28
command: ['echo', 'heyyyyy']
searcher:
initContainers: *initContainers
janitor:
initContainers: *initContainers
control_plane:
initContainers: *initContainers
metastore:
initContainers: *initContainers
Result ok ✅:
$ helm template . --values values.yaml --values values2.yaml --namespace=quickwit|kubectl -n quickwit apply -f -
$ kubectl -n quickwit logs release-name-quickwit-searcher-0 -c hello
heyyyyy
$ kubectl -n quickwit logs release-name-quickwit-indexer-0 -c hello
heyyyyy
Other test changing the location of the config file :
---
configLocation: /tmp/yo.yml
config:
default_index_root_uri: s3://qw-indexes
storage:
s3:
endpoint: https://s3.fr-par.scw.cloud
region: fr-par
access_key_id: SCWXXXXXX
secret_access_key: XXXXXXXX
environment:
QW_METASTORE_URI: s3://qw-indexes
indexer:
initContainers: &initContainers
- name: show
image: busybox:1.28
command: ['cat', '/tmp/yo.yaml']
- name: move
image: busybox:1.28
command: ['cp', '/tmp/yo.yaml', '/quickwit/node.yaml']
searcher:
initContainers: *initContainers
janitor:
initContainers: *initContainers
control_plane:
initContainers: *initContainers
metastore:
initContainers: *initContainers
Test ko ❌:
$ kubectl -n quickwit logs release-name-quickwit-searcher-0 -c show
cat: can't open '/tmp/yo.yaml': No such file or directory
The value field configLocation
is only overriding the environment variable:
- name: QW_CONFIG
value: {{ .Values.configLocation }}
But I think it should also change the configmap path mounted into the pods. Maybe It'll be good if we get the opinion of other reviewers on this. On my side, I think changing only the environment variable to tell quickwit "the config file is there" on one side and continue to write it in /quickwit/node.yaml
on the other side seems a bit strange to me even if I understand you want to regenerate a new file autonomously from your initContainers.
Usually config maps are immutable from the pod itself, a more probable usage would be to mount an emptyDir to the pod's init and main containers where the mutated config is copied to. If the user IS doing the mutation of the config I'd expect they would also update the configLocation.
Usually config maps are immutable from the pod itself, a more probable usage would be to mount an emptyDir to the pod's init and main containers where the mutated config is copied to. If the user IS doing the mutation of the config I'd expect they would also update the configLocation.
Explanation good enough to me, I agree for the configmap immutability. Let's see if in the future other users will have issues with that. For now, we can deal with overriding completely the config from an initContainers with this version.
So LGTM, I think there is some linter issues to fix (see the github action pipelines) and we're good to merge :)
Thanks for your work
thank you @hgudladona and @idrissneumann for you work!
PR for https://github.com/quickwit-oss/helm-charts/issues/91