kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
https://kestra.io
Apache License 2.0
7.03k stars 410 forks

Kestra will go to > 80% CPU and 100% ram if secret in webhook trigger #4095

Open nidomiro opened 1 week ago

nidomiro commented 1 week ago

Describe the issue

I created a webhook trigger for my workflow. When the key is defined directly as a literal, everything works. However, if I use a secret, the system goes to >80% CPU and 100% RAM when the webhook is triggered and needs to be hard-reset.

I use a docker-compose file where the secrets are in a .env file and this file is referenced in the service kestra via env_file:.

The working trigger:

triggers:
  - id: on-git-commit
    type: io.kestra.plugin.core.trigger.Webhook
    key: mySuperSecretKey

The "freezing" trigger:

triggers:
  - id: on-git-commit
    type: io.kestra.plugin.core.trigger.Webhook
    key: "{{ secret('TRIGGER_DOCKER_SERVER_AUTODEPLOY_WEBHOOK_SECRET') }}"

Here are the logs:

kestra-1    | 2024-06-22 18:41:08,628 INFO  default-nioEventLoopGroup-1-9 io.kestra.webserver.access 2024-06-22T18:41:08.614Z | GET /api/v1/executions/search?size=25&page=1&sort=state.startDate:desc&namespace=hl443&flowId=trigger-docker-server-autodeploy HTTP/1.1 | status: 200 | ip: 192.168.224.50 | length: 4479 | duration: 13
kestra-1    | 2024-06-22 18:41:55,291 WARN  io-executor-thread-1 i.k.c.r.p.functions.SecretFunction Unable to get secret consumer
kestra-1    | java.lang.NullPointerException: Cannot invoke "java.util.function.Consumer.accept(Object)" because "addSecretConsumer" is null
kestra-1    |   at io.kestra.core.runners.pebble.functions.SecretFunction.execute(SecretFunction.java:40)
kestra-1    |   at io.pebbletemplates.pebble.node.expression.FunctionOrMacroInvocationExpression.applyFunction(FunctionOrMacroInvocationExpression.java:46)
kestra-1    |   at io.pebbletemplates.pebble.node.expression.FunctionOrMacroInvocationExpression.evaluate(FunctionOrMacroInvocationExpression.java:38)
kestra-1    |   at io.pebbletemplates.pebble.node.PrintNode.render(PrintNode.java:37)
kestra-1    |   at io.pebbletemplates.pebble.node.BodyNode.render(BodyNode.java:44)
kestra-1    |   at io.pebbletemplates.pebble.node.RootNode.render(RootNode.java:31)
kestra-1    |   at io.pebbletemplates.pebble.template.PebbleTemplateImpl.evaluate(PebbleTemplateImpl.java:157)
kestra-1    |   at io.pebbletemplates.pebble.template.PebbleTemplateImpl.evaluate(PebbleTemplateImpl.java:96)
kestra-1    |   at io.kestra.core.runners.VariableRenderer.renderOnce(VariableRenderer.java:100)
kestra-1    |   at io.kestra.core.runners.VariableRenderer.render(VariableRenderer.java:85)
kestra-1    |   at io.kestra.core.runners.VariableRenderer.render(VariableRenderer.java:70)
kestra-1    |   at io.kestra.core.runners.RunContext.render(RunContext.java:614)
kestra-1    |   at io.kestra.webserver.controllers.api.ExecutionController.lambda$webhook$8(ExecutionController.java:471)
kestra-1    |   at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
kestra-1    |   at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
kestra-1    |   at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
kestra-1    |   at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(Unknown Source)
kestra-1    |   at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(Unknown Source)
kestra-1    |   at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(Unknown Source)
kestra-1    |   at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
kestra-1    |   at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
kestra-1    |   at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(Unknown Source)
kestra-1    |   at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
kestra-1    |   at java.base/java.util.stream.ReferencePipeline.findFirst(Unknown Source)
kestra-1    |   at io.kestra.webserver.controllers.api.ExecutionController.webhook(ExecutionController.java:479)
kestra-1    |   at io.kestra.webserver.controllers.api.ExecutionController.webhook(ExecutionController.java:442)
kestra-1    |   at io.kestra.webserver.controllers.api.ExecutionController.webhookTriggerPost(ExecutionController.java:408)
kestra-1    |   at io.kestra.webserver.controllers.api.$ExecutionController$Definition$Intercepted.$$access$$webhookTriggerPost(Unknown Source)
kestra-1    |   at io.kestra.webserver.controllers.api.$ExecutionController$Definition$Exec.dispatch(Unknown Source)
kestra-1    |   at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456)
kestra-1    |   at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:129)
kestra-1    |   at io.micronaut.validation.ValidatingInterceptor.validateReturnExecutableValidator(ValidatingInterceptor.java:166)
kestra-1    |   at io.micronaut.validation.ValidatingInterceptor.intercept(ValidatingInterceptor.java:109)
kestra-1    |   at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:138)
kestra-1    |   at io.kestra.webserver.controllers.api.$ExecutionController$Definition$Intercepted.webhookTriggerPost(Unknown Source)
kestra-1    |   at io.kestra.webserver.controllers.api.$ExecutionController$Definition$Exec.dispatch(Unknown Source)
kestra-1    |   at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461)
kestra-1    |   at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4232)
kestra-1    |   at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:271)
kestra-1    |   at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:488)
kestra-1    |   at io.micronaut.http.server.RouteExecutor.lambda$callRoute$6(RouteExecutor.java:465)
kestra-1    |   at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87)
kestra-1    |   at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:141)
kestra-1    |   at io.micrometer.core.instrument.Timer.lambda$wrap$0(Timer.java:193)
kestra-1    |   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
kestra-1    |   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
kestra-1    |   at java.base/java.lang.Thread.run(Unknown Source)

loicmathieu commented 1 week ago

Hi, can you try the new 0.17.5 release? It should have a fix for that.

nidomiro commented 1 week ago

Unfortunately, I can only confirm that the exception no longer appears in the log. The CPU and RAM usage is still the same, and I still need to hard-reset the system (an LXC container running in Proxmox).

loicmathieu commented 1 week ago

Can you paste here your full flow YAML and the resources allocated to your container?

nidomiro commented 1 week ago

Sure. The value mySuperSecretKey is just a placeholder.

The flow yaml:

id: trigger-docker-server-autodeploy
namespace: hl443
description: Trigger autodeploy for all Docker servers

labels:
  type: autodeploy

variables:
  servers:
    - fqn: docker01.hl443.de
      user: root
    - fqn: docker02.hl443.de
      user: root
    - fqn: docker03.hl443.de
      user: root
    - fqn: nextcloud.hl443.de
      user: root

tasks:
  - id: parallel
    type: io.kestra.plugin.core.flow.EachParallel
    value: "{{ vars.servers }}"
    tasks:

      - id: debugLog
        type: io.kestra.plugin.core.log.Log
        message:
          - "{{ taskrun.value }}"

      - id: trigger-autodeploy
        type: io.kestra.plugin.fs.ssh.Command
        host: "{{ json(taskrun.value)['fqn'] }}"
        username: "{{ json(taskrun.value)['user'] }}"
        authMethod: PUBLIC_KEY
        privateKey: "{{ secret('SSH_ACCESS_KEY_' + json(taskrun.value)['fqn']  | replace({'.': '_'})) }}"
        warningOnStdErr: false
        commands:
          - "source ~/.profile"
          - "cd $HOMELAB_APPS_ROOT"
          - "git pull"
          - "./autodeploy.mts"

triggers:
  - id: on-git-commit
    type: io.kestra.plugin.core.trigger.Webhook
    key: mySuperSecretKey
    #key: "{{ secret('TRIGGER_DOCKER_SERVER_AUTODEPLOY_WEBHOOK_SECRET') }}"

disabled: false

The docker-compose:

version: "3.4"
services:
  postgres:
    image: postgres:16.3
    restart: unless-stopped
    volumes:
      - ${HOMELAB_APPS_ROOT:?}/kestra/data/postgres-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: ${KESTRA_POSTGRES_PASSWORD:?}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10
    networks:
      - default
      - proxynet

  kestra:
    image: kestra/kestra:v0.17.5-full
    restart: unless-stopped
    pull_policy: always
    # Note that this is meant for development only. Refer to the documentation for production deployments of Kestra which runs without a root user.
    user: "root"
    command: server standalone --worker-thread=128
    volumes:
      - ${HOMELAB_APPS_ROOT:?}/kestra/data/kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/kestra-wd:/tmp/kestra-wd
    env_file:
      - ${HOMELAB_APPS_ROOT:?}/kestra/.env
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: ${KESTRA_POSTGRES_PASSWORD:?}
        kestra:
          server:
            basic-auth:
              enabled: false
              username: "admin@kestra.io" # it must be a valid email address
              password: kestra
          repository:
            type: postgres
          storage:
            type: local
            local:
              base-path: "/app/storage"
          queue:
            type: postgres
          tasks:
            tmp-dir:
              path: /tmp/kestra-wd/tmp
          url: http://kestra.hl443.de/
    labels:
      traefik.enable: true
      # Frontend
      traefik.http.routers.kestra.rule: Host(`kestra.hl443.de`)
      traefik.http.routers.kestra.entrypoints: websecure
      traefik.http.routers.kestra.tls.certresolver: myresolver
      traefik.http.services.kestra.loadbalancer.server.port: 8080
      traefik.http.routers.kestra.service: kestra

      traefik.http.routers.kestra-metrics.rule: Host(`kestra-metrics.hl443.de`)
      traefik.http.routers.kestra-metrics.entrypoints: websecure
      traefik.http.routers.kestra-metrics.tls.certresolver: myresolver
      traefik.http.services.kestra-metrics.loadbalancer.server.port: 8081
      traefik.http.routers.kestra-metrics.service: kestra-metrics
    ports:
      - 127.0.0.1:8080:8080
      - 127.0.0.1:9080:8081
    networks:
      - default
      - proxynet
    depends_on:
      postgres:
        condition: service_started

networks:
  default:
  proxynet:
    external: true

The version of the .env file without values:

KESTRA_POSTGRES_PASSWORD=
SECRET_SSH_ACCESS_KEY_docker01_hl443_de=
SECRET_SSH_ACCESS_KEY_docker02_hl443_de=
SECRET_SSH_ACCESS_KEY_docker03_hl443_de=
SECRET_TRIGGER_DOCKER_SERVER_AUTODEPLOY_WEBHOOK_SECRET=

The LXC-Container config:

Server: Docker Engine - Community
 Engine:
  Version:          26.1.4
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       de5c9cf
  Built:            Wed Jun 5 11:29:22 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.33
  GitCommit:        d2d58213f83a351ca8f528a95fbd145f5654e957
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

nidomiro commented 1 week ago

I think I just found the cause. The value of SECRET_TRIGGER_DOCKER_SERVER_AUTODEPLOY_WEBHOOK_SECRET in the .env file was the raw secret, not its base64 encoding. I encoded the value and now it works as expected.
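
For anyone hitting the same problem, here is a minimal sketch of producing a base64-encoded .env line for a secret like this one. The helper name `encode_secret_line` is hypothetical and not part of Kestra; the secret name and the placeholder value `mySuperSecretKey` are taken from the flow above. It assumes the value should be the standard base64 encoding of the UTF-8 secret, which is what resolved the issue in this thread.

```python
import base64


def encode_secret_line(name: str, raw_value: str) -> str:
    """Build a .env line whose value is the base64 encoding of raw_value.

    Hypothetical helper: the SECRET_ prefix matches the variable names
    shown in the .env file earlier in this thread.
    """
    encoded = base64.b64encode(raw_value.encode("utf-8")).decode("ascii")
    return f"SECRET_{name}={encoded}"


def decode_secret_line(line: str) -> str:
    """Recover the raw secret from a line built by encode_secret_line."""
    # Split on the first '=' only, because base64 padding also uses '='.
    _, encoded = line.split("=", 1)
    return base64.b64decode(encoded).decode("utf-8")


# Placeholder value from the thread; substitute the real secret.
line = encode_secret_line(
    "TRIGGER_DOCKER_SERVER_AUTODEPLOY_WEBHOOK_SECRET", "mySuperSecretKey"
)
print(line)
```

The printed line can be pasted into the .env file in place of the raw value. Decoding it again with `decode_secret_line` is a quick way to confirm that the stored value round-trips to the key the webhook caller uses.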