kestra-io / kestra

:zap: Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
https://kestra.io
Apache License 2.0

Worker can exhaust system storage #3982

Open aku opened 4 months ago

aku commented 4 months ago

Issue description

I've experimented with the Kafka trigger and noticed that when you omit the maxRecords setting, Kestra's worker can easily consume all available disk space (temporary files?). This leads to a "no space left on device" error and makes the worker node unusable.

Can you add a global setting that limits resource consumption by Kestra components? Something similar to how journald works, where I can specify how much of the system's resources can be used, how much space should be left free on disk, etc.

I think it is much better to fail some executions or the worker process than to disrupt the whole node (Kestra is running on several VMs in my case).
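To illustrate the kind of "keep free" limit being requested, here is a minimal Python sketch of a free-space guard. It is not Kestra code and does not reflect any existing Kestra setting: the names `ensure_keep_free`, `LowDiskSpaceError` and `keep_free_bytes` are hypothetical, and the idea is only loosely modelled on journald's keep-free behaviour. The only real API used is `shutil.disk_usage` from the Python standard library.

```python
import shutil


class LowDiskSpaceError(RuntimeError):
    """Raised when free space falls below the configured reserve (hypothetical)."""


def free_bytes(path):
    """Return the free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def ensure_keep_free(path, keep_free_bytes, probe=free_bytes):
    """Fail fast if fewer than `keep_free_bytes` remain on `path`'s filesystem.

    `probe` is injectable so the check can be exercised without filling a disk.
    Returns the number of free bytes observed when the check passes.
    """
    free = probe(path)
    if free < keep_free_bytes:
        raise LowDiskSpaceError(
            f"only {free} bytes free on {path!r}, reserve is {keep_free_bytes}"
        )
    return free
```

A worker could call such a check before writing each batch of records to its temporary directory, so that a single execution fails with a clear error instead of the node running out of disk space.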

loicmathieu commented 4 months ago

Temporary files should be deleted after use. However, this can happen if a trigger writes its data to a file and, in a single trigger evaluation, reads more data than the local filesystem can hold.

Can you tell us more about your use case, the version of Kestra, its configuration, and the flow that causes the issue?

aku commented 4 months ago

In my case I have a simple workflow that reads data from a Kafka topic (which produces thousands of records per second), transforms the records via Jython FileTransform, and sends them to another Kafka instance. The problem happens even if I just want to log the messageCount property of the trigger. If I do not use the maxRecords property, the system gets overloaded easily.

I'm using Kestra 0.17.1 running on several VMs (webapp + scheduler running on 1 VM + 3 VMs with executor+worker processes)

triggers:
  - id: read_orders_from_kafka
    type: io.kestra.plugin.kafka.Trigger
    keyDeserializer: STRING
    valueDeserializer: JSON
    interval: PT5S
    topic: [REDACTED]
    groupId: [REDACTED]
    maxRecords: 5000
    properties:
      [REDACTED]