kestra-io / kestra

:zap: Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
https://kestra.io
Apache License 2.0
10.39k stars 854 forks

io.kestra.plugin.core.storage.Purge task does not clean executions data #4194

Closed: aku closed this issue 3 months ago

aku commented 3 months ago

Describe the issue

I've noticed that the Purge task does not clean up execution data in storage. As a consequence, I run out of S3 quota quite easily.

I'm running Kestra 0.17.8 with the MinIO storage plugin (storage-minio-0.17.1.jar) against Ceph S3 storage.

I've tried running the Purge task with different options, both clearing all executions and targeting a specific flow or namespace. The task output reports a number of cleaned executions, but the S3 storage still contains execution data even though those executions no longer appear in the UI.

Example of a flow:

id: clean_system
namespace: system

tasks:
  - id: clean_up_everything
    type: io.kestra.plugin.core.storage.Purge
    endDate: "{{ now() }}"
    purgeExecution: true
    purgeLog: true
    purgeMetric: true
    purgeStorage: true

S3 still has some executions:

s3cmd  -c s3.cfg ls s3://kestra/[REDACTED]/[REDACTED]/executions/
                          DIR  s3://kestra/[REDACTED]/[REDACTED]/executions/2OrCHByaFOXxSz4zS6POxM/
                          DIR  s3://kestra/[REDACTED]/[REDACTED]/executions/2pHzsgYsBQTSpVjPW3zmCf/
                          DIR  s3://kestra/[REDACTED]/[REDACTED]/executions/5lz7XhP13jW0gduC3cLRaE/
                          DIR  s3://kestra/[REDACTED]/[REDACTED]/executions/73QRICrm7l8YCpSGV8Ji9t/
2024-07-01 17:17       DIROBJ  s3://kestra/[REDACTED]/[REDACTED]/executions/
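
To quantify what is left behind, the listing above can be parsed for execution IDs and compared with the IDs still visible in the UI. The following is a minimal Python sketch, not part of Kestra: the function names `leftover_execution_ids` and `orphaned` are hypothetical, and it assumes the `s3cmd ls` output format shown above, with one `DIR` line per execution directory.

```python
import re

# Matches directory entries such as:
#   "    DIR  s3://bucket/prefix/executions/<execution-id>/"
# Assumption: this mirrors the `s3cmd ls` output format shown in this issue.
_EXECUTION_DIR = re.compile(r"^\s*DIR\s+s3://\S+/executions/([^/\s]+)/\s*$")


def leftover_execution_ids(listing: str) -> list[str]:
    """Return the execution IDs found in `s3cmd ls` output, in listing order."""
    ids = []
    for line in listing.splitlines():
        match = _EXECUTION_DIR.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def orphaned(listing: str, known_ids: set[str]) -> list[str]:
    """Return IDs present in storage but absent from `known_ids`.

    `known_ids` is a hypothetical input: the execution IDs that still
    exist on the Kestra side (for example, those visible in the UI).
    """
    return [i for i in leftover_execution_ids(listing) if i not in known_ids]
```

Passing the set of IDs that still exist in Kestra to `orphaned` narrows the result to stored data with no matching execution, which is the data the Purge task was expected to remove.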

Probably related to https://github.com/kestra-io/kestra/issues/3961

anna-geller commented 3 months ago

Thanks for the issue. You're totally right that the Purge task is not optimal.

@loicmathieu improved this a lot this week by adding new tasks. If you want to take the improved version for a spin, launch the Kestra develop image and try this flow:

id: purge
namespace: admin

tasks:
  - id: purge_executions
    type: io.kestra.plugin.core.execution.PurgeExecutions
    endDate: "{{ now() | dateAdd(-1, 'MONTHS') }}"
    purgeLog: false

  - id: purge_logs
    type: io.kestra.plugin.core.log.PurgeLogs
    endDate: "{{ now() | dateAdd(-1, 'MONTHS') }}"

triggers:
  - id: monthly
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 9 1 * *" # at 09:00 on the first day of every month

Purging logs in a dedicated task is more performant and reliable.

Those new tasks will be released in 0.18.0 on August 6th. Until then, you can try them on the develop image and give feedback.
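
The `endDate` expression in the flow above resolves to one month before the current time. As a rough illustration of that cutoff, here is a Python sketch with hypothetical helper names (`minus_one_month`, `purge_candidates`). It assumes that the day of month is clamped when the previous month is shorter, and that executions dated strictly before the cutoff are the ones selected; neither detail is confirmed in this thread.

```python
import calendar
from datetime import datetime


def minus_one_month(moment: datetime) -> datetime:
    """Shift `moment` back one calendar month, clamping the day if needed.

    Assumption: clamping (e.g. March 31 -> February 29 in a leap year)
    approximates what `dateAdd(-1, 'MONTHS')` does.
    """
    if moment.month > 1:
        year, month = moment.year, moment.month - 1
    else:
        year, month = moment.year - 1, 12
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def purge_candidates(executions: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of executions dated strictly before the one-month cutoff.

    `executions` is a hypothetical mapping of execution ID to its date.
    """
    cutoff = minus_one_month(now)
    return [eid for eid, date in executions.items() if date < cutoff]
```

With this cutoff, the most recent month of executions is always retained, regardless of how often the flow is triggered.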

aku commented 3 months ago

@anna-geller thanks for the update. I've seen that you have some purge-related commits in one of the recent releases, but I did not find a description of what exactly has changed. Could you shed some light on it? There were multiple issues.

What has changed in version 0.17.12? Do I need to wait for version 0.18, or are some of the issues already fixed in the 0.17.x branch? Btw, it would be nice to have a page with release notes.

loicmathieu commented 3 months ago

See https://github.com/kestra-io/kestra/releases/tag/v0.17.12

Purging of deleted executions was backported to 0.17; the new PurgeLogs task will only be available in 0.18.

aku commented 3 months ago

@loicmathieu thanks! Looking forward to getting my hands on version 0.18.