kestra-io / kestra

:zap: Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
https://kestra.io
Apache License 2.0
11.88k stars 1.01k forks

uploaded namespace files are empty when using s3 #4761

Closed MarthaScheffler closed 3 weeks ago

MarthaScheffler commented 2 months ago

Describe the issue

When using the io.kestra.plugin.core.namespace.UploadFiles plugin to upload the output of a task as a namespace file, the uploaded files are empty, both in the UI and on S3 itself.

What else was tested: creating a file in the UI (with content) and saving it works; using different storage (e.g. local Docker or MinIO) works. See the Slack thread https://kestra-io.slack.com/archives/C03FQKXRK3K/p1724405859008609

Reproducing this issue is non-trivial, because the Kestra UI apparently loads the file content from the browser history, so even a hard refresh of the page doesn't display the current file content.

Setup: Kestra OSS v0.17 & v0.18 on Kubernetes with external Postgres and external S3 (as described here: https://kestra.io/docs/installation/aws-ec2#step-5-use-aws-s3-for-storage)

Flow: https://kestra.io/plugins/core/tasks/namespace/io.kestra.plugin.core.namespace.uploadfiles#examples (Upload files generated by a previous task)
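For reference, a minimal flow following the same pattern as the linked example (the task ids and file name here are illustrative placeholders, not from the actual failing flow):

```yaml
id: upload_namespace_files
namespace: company.team

tasks:
  - id: generate
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - echo "hello" > report.txt
    outputFiles:
      - report.txt

  # Upload every output file of the previous task as a namespace file
  - id: upload
    type: io.kestra.plugin.core.namespace.UploadFiles
    filesMap: "{{ outputs.generate.outputFiles }}"
    namespace: "{{ flow.namespace }}"
```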

anna-geller commented 2 months ago

Interesting issue - we run our preview env on K8s GKE with GCS storage (almost the same API as S3), and I couldn't reproduce the issue there:

we'll investigate more and try to find a fix once we can reproduce.

Flow for folks who want to try to reproduce:

id: dbt_duckdb_repro
namespace: company.team

tasks:
  - id: dbt
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main

      - id: dbt_build
        type: io.kestra.plugin.dbt.cli.DbtCLI
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/dbt-duckdb:latest
        commands:
          - dbt deps
          - dbt build
        profiles: |
          my_dbt_project:
            outputs:
              dev:
                type: duckdb
                path: ":memory:"
                fixed_retries: 1
                threads: 16
                timeout_seconds: 300
            target: dev

      - id: upload
        type: io.kestra.plugin.core.namespace.UploadFiles
        filesMap: "{{ outputs.dbt_build.outputFiles }}"
        namespace: "{{ flow.namespace }}"

loicmathieu commented 2 months ago

Hi, which version exactly are you using? We fixed an S3 storage issue in 0.18.3, so if you didn't use this version, can you try it?

MarthaScheffler commented 2 months ago

Hi, which version exactly are you using? We fixed an S3 storage issue in 0.18.3, so if you didn't use this version, can you try it?

I tried with 0.18.2 and 0.18.3, then downgraded to 0.17 (not sure which patch version) - same empty files.

brian-mulier-p commented 2 months ago

Hello! I just tried on 0.18.4 with S3 storage (0.18.3 should behave the same, as there was no change to storage) and everything works. Did you change the namespace property in the UploadFiles task? I forgot to do so at first, so I thought I had reproduced the issue, but setting the proper namespace (or {{ flow.namespace }}) made it work. Screencast from 2024-08-30 09-49-21.webm

Ben8t commented 2 months ago

Trying to reproduce on my end 👍 will update here

Ben8t commented 3 weeks ago

I'm able to reproduce on 0.19.1 with S3 internal storage:

Everything seems to work, but in the end the namespace files are empty (0 bytes on S3, so logically empty in Kestra as well)

id: dbt_duckdb_repro
namespace: company.team

tasks:
  - id: dbt
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main

      - id: dbt_build
        type: io.kestra.plugin.dbt.cli.DbtCLI
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/dbt-duckdb:latest
        commands:
          - dbt deps
          - dbt build
        profiles: |
          my_dbt_project:
            outputs:
              dev:
                type: duckdb
                path: ":memory:"
                fixed_retries: 1
                threads: 16
                timeout_seconds: 300
            target: dev

      - id: upload
        type: io.kestra.plugin.core.namespace.UploadFiles
        filesMap: 
          manifest.json: '{{ outputs.dbt_build["outputFiles"]["manifest.json"]}}'
          run_results.json: '{{ outputs.dbt_build["outputFiles"]["run_results.json"] }}'
        namespace: "{{ flow.namespace }}"

@brian-mulier-p here is an even simpler reproducer (so it's not about dbt or workingDir):

id: dbt_duckdb_repro
namespace: company.team

tasks:
    - id: dbt_build
      type: io.kestra.plugin.scripts.shell.Commands
      commands:
        - echo "Test" > test.txt
      outputFiles:
        - test.txt

    - id: upload
      type: io.kestra.plugin.core.namespace.UploadFiles
      filesMap: 
        test.txt: '{{ outputs.dbt_build["outputFiles"]["test.txt"]}}'
      namespace: "{{ flow.namespace }}"
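For context on "empty": the echo command in the reproducer writes a 5-byte file, which is easy to confirm outside Kestra (plain shell, illustrative only), while the uploaded namespace file ends up as 0 bytes on S3:

```shell
# Reproduce the task's output file locally and check its size:
# echo appends a newline, so "Test" becomes 5 bytes.
echo "Test" > test.txt
wc -c < test.txt
```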

brian-mulier-p commented 3 weeks ago

I know it's been a long time, but I've finally found the fix @MarthaScheffler :partying_face: It will be part of the Kestra v0.19.3 bugfix release this Tuesday (or optionally here, if you can't wait and can add a plugin to your instance manually :P)

FYI, I struggled to reproduce it because it was an edge case: uploading data sourced from another existing storage file would lead to an empty file, and since all my tests were done with statically filled inputs, I wasn't hitting the issue. Luckily @Ben8t ran into this case :1st_place_medal:
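The root-cause description above matches a classic stream-handling pitfall (this is an illustration of the failure class only, not Kestra's actual code): if a stream backing an existing storage file is consumed once (e.g., by an intermediate read) and then reused for the upload, the second read returns nothing, which surfaces as a 0-byte object.

```python
import io

# Simulate a storage file's content behind a stream-like handle.
data = io.BytesIO(b"Test\n")

# A first consumer reads the stream to the end.
first_read = data.read()

# A second consumer reusing the same exhausted handle gets nothing,
# which, if it were the uploader, would produce a 0-byte object.
second_read = data.read()

print(len(first_read), len(second_read))  # → 5 0
```

Statically provided content (as in the maintainers' earlier tests) never goes through an already-consumed stream, which would explain why those tests passed.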

MarthaScheffler commented 2 weeks ago

Thank you! Will try this out soon!