quickwit-oss / helm-charts

Helm charts for Quickwit
https://helm.quickwit.io
MIT License
22 stars 28 forks source link

Adds initContainer option to the helmcharts and allows default QW_CONFIG to be overridden by chart values. #92

Closed hgudladona closed 3 months ago

hgudladona commented 4 months ago

PR for https://github.com/quickwit-oss/helm-charts/issues/91

idrissneumann commented 4 months ago

Hi thanks for your contribution, I made few comments following some tests.

Thanks in advance :)

hgudladona commented 4 months ago

Hello, Thanks for the review, I made the fixes based on your suggestions. However, I am still working on testing the changes. I will keep you posted once I confirm everything is working as expected.

fmassot commented 4 months ago

@hgudladona do you need another look on your PR?

hgudladona commented 4 months ago

I have fixed the syntax issues and tested the rendering. I am working on rolling out the components onto my EKS cluster and ensure functionality. I will let you know after this test. I am pretty close. Would you like me to move this PR to draft state?

fmassot commented 4 months ago

I have fixed the syntax issues and tested the rendering. I am working on rolling out the components onto my EKS cluster and ensure functionality. I will let you know after this test. I am pretty close. Would you like me to move this PR to draft state

NIce. Keep it as is, that's fine. Let us know the outcome of your tests on your EKS cluster!

hgudladona commented 3 months ago

Hello, I have made necessary fixes and tested these templates while integrating to export my spark logs via Vector --> Warpstream --> Quickwit on AWS. Please take a look at the proposed changes and let me know.

The init container is particularly useful as follows. Having this in the jobs template allows a new config file to be created while substituting AWS_AVAILABILITY_ZONE with the availability_zone from instance metadata during startup from the source config below. This allows the indexers to pull logs from warpstream rack locally (AZ local) preventing unnecessary data transfer costs. Reference to Warpstream docs is here

jobs:
  sources:
    command: ["/bin/bash", "-c", "quickwit source describe --index hudi-spark-logs --source hudi-spark-logs-warpstream --endpoint ${QW_CLUSTER_ENDPOINT} || quickwit source create --index hudi-spark-logs --source-config /quickwit/shared/hudi-spark-logs-warpstream.yaml --endpoint ${QW_CLUSTER_ENDPOINT}"]
    initContainers:
      - name: quickwit-rack-aware-kafka-client
        image: alpine:3.20
        command:
          - "sh"
          - "-c"
          - "apk --update add curl && \
             cp /quickwit/hudi-spark-logs-warpstream.yaml /quickwit/shared/ && \
             TOKEN=$(curl -XPUT \"http://169.254.169.254/latest/api/token\" -H \"X-aws-ec2-metadata-token-ttl-seconds: 21600\") && \
             AVAILABILITY_ZONE=$(curl -XGET -H \"X-aws-ec2-metadata-token: $TOKEN\" http://169.254.169.254/latest/meta-data/placement/availability-zone) && \
             sed -i.bak s/AWS_AVAILABILITY_ZONE/$AVAILABILITY_ZONE/g /quickwit/shared/hudi-spark-logs-warpstream.yaml"
        volumes:
          - name: source
            configMap:
              name: quickwit-bootstrap
              items:
                  - key: hudi-spark-logs-warpstream.yaml
                    path: hudi-spark-logs-warpstream.yaml
        volumeMounts:
          - name: source
            mountPath: /quickwit/hudi-spark-logs-warpstream.yaml
            subPath: hudi-spark-logs-warpstream.yaml
          - name: source-shared
            mountPath: /quickwit/shared/

    # A shared volume for the init container to copy the configuration files to
    extraVolumes:
      - name: source-shared
        emptyDir: {}

    # A shared volume mount for the init container to copy the configuration files to
    extraVolumeMounts:
      - name: source-shared
        mountPath: /quickwit/shared/
        readOnly: true

...

  sources:
    - index: hudi-spark-logs
      source:
        version: 0.8
        source_id: hudi-spark-logs-warpstream
        source_type: kafka
        num_pipelines: 5
        params:
          topic: hudi-spark-logs
          client_params:
            client.id: warpstream_az=AWS_AVAILABILITY_ZONE,warpstream_interzone_lb=true
            bootstrap.servers: warpstream-agent.warpstream:9092
idrissneumann commented 3 months ago

Hi LGTM, I'll run some tests with kind ASAP (probably tomorrow) and merge it if it's okay.

Thanks for your work!

idrissneumann commented 3 months ago

Tests with this value file:

---
config:
  default_index_root_uri: s3://qw-indexes
  storage:
    s3:
      endpoint: https://s3.fr-par.scw.cloud
      region: fr-par
      access_key_id: SCWXXXXXXX
      secret_access_key: XXXXXXXXXXX

environment:
  QW_METASTORE_URI: s3://qw-indexes

indexer:
  initContainers: &initContainers
    - name: hello
      image: busybox:1.28
      command: ['echo', 'heyyyyy']

searcher:
  initContainers: *initContainers

janitor:
  initContainers: *initContainers

control_plane:
  initContainers: *initContainers

metastore:
  initContainers: *initContainers

Result ok ✅:

$ helm template . --values values.yaml --values values2.yaml --namespace=quickwit|kubectl -n quickwit apply -f -
$ kubectl -n quickwit logs release-name-quickwit-searcher-0 -c hello
heyyyyy
$ kubectl -n quickwit logs release-name-quickwit-indexer-0 -c hello
heyyyyy
idrissneumann commented 3 months ago

Other test changing the location of the config file :

---
configLocation: /tmp/yo.yml
config:
  default_index_root_uri: s3://qw-indexes
  storage:
    s3:
      endpoint: https://s3.fr-par.scw.cloud
      region: fr-par
      access_key_id: SCWXXXXXX
      secret_access_key: XXXXXXXX

environment:
  QW_METASTORE_URI: s3://qw-indexes

indexer:
  initContainers: &initContainers
    - name: show
      image: busybox:1.28
      command: ['cat', '/tmp/yo.yaml']
    - name: move
      image: busybox:1.28
      command: ['cp', '/tmp/yo.yaml', '/quickwit/node.yaml']

searcher:
  initContainers: *initContainers

janitor:
  initContainers: *initContainers

control_plane:
  initContainers: *initContainers

metastore:
  initContainers: *initContainers

Test ko ❌:

$ kubectl -n quickwit logs release-name-quickwit-searcher-0 -c show
cat: can't open '/tmp/yo.yaml': No such file or directory

The value field configLocation is only overriding the environment variable:

- name: QW_CONFIG
  value: {{ .Values.configLocation }}

But I think it should also change the configmap path mounted into the pods. Maybe It'll be good if we get the opinion of other reviewers on this. On my side, I think changing only the environment variable to tell quickwit "the config file is there" on one side and continue to write it in /quickwit/node.yaml on the other side seems a bit strange to me even if I understand you want to regenerate a new file autonomously from your initContainers.

hgudladona commented 3 months ago

Usually config maps are immutable from the pod itself, a more probable usage would be to mount an emptyDir to the pod's init and main containers where the mutated config is copied to. If the user IS doing the mutation of the config I'd expect they would also update the configLocation.

idrissneumann commented 3 months ago

Usually config maps are immutable from the pod itself, a more probable usage would be to mount an emptyDir to the pod's init and main containers where the mutated config is copied to. If the user IS doing the mutation of the config I'd expect they would also update the configLocation.

Explanation good enough to me, I agree for the configmap immutability. Let's see if in the future other users will have issues with that. For now, we can deal with overriding completely the config from an initContainers with this version.

So LGTM, I think there is some linter issues to fix (see the github action pipelines) and we're good to merge :)

Thanks for your work

fmassot commented 3 months ago

thank you @hgudladona and @idrissneumann for you work!