kestra-io / plugin-scripts

https://kestra.io/plugins/
Apache License 2.0
9 stars 9 forks source link

`containerImage` doesn't work with `python.Commands` on 0.18 #151

Closed wrussell1999 closed 4 months ago

wrussell1999 commented 5 months ago

Expected Behavior

When executing a flow using io.kestra.plugin.scripts.python.Commands, and containerImage set to ghcr.io/kestra-io/pydata:latest, pandas should be included.

Actual Behaviour

When executing the flow, it errors as it can't find pandas. This suggests that containerImage is not working as pydata includes pandas. Screenshot 2024-06-07 at 17 53 31

This works if you explicitly set the task runner to docker despite being set to docker by default.

id: gitPython
namespace: blueprint

tasks:
  - id: pythonScripts
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
    - id: cloneRepository
      type: io.kestra.plugin.git.Clone
      url: https://github.com/kestra-io/scripts
      branch: main

    - id: python
      type: io.kestra.plugin.scripts.python.Commands
      warningOnStdErr: false
      taskRunner:
        type: io.kestra.plugin.scripts.runner.docker.Docker
      containerImage: ghcr.io/kestra-io/pydata:latest
      commands:
        - python etl/global_power_plant.py

If taskRunner needs to be set, an error should be specified / documentation updated to clear this up.

Steps To Reproduce

No response

Environment Information

Example flow

This works:

id: gitPython
namespace: blueprint

tasks:
  - id: pythonScripts
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
    - id: cloneRepository
      type: io.kestra.plugin.git.Clone
      url: https://github.com/kestra-io/scripts
      branch: main

    - id: python
      type: io.kestra.plugin.scripts.python.Commands
      warningOnStdErr: false
      docker: 
        image: ghcr.io/kestra-io/pydata:latest
      commands:
        - python etl/global_power_plant.py

This also works:

id: gitPython
namespace: blueprint

tasks:
  - id: pythonScripts
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
    - id: cloneRepository
      type: io.kestra.plugin.git.Clone
      url: https://github.com/kestra-io/scripts
      branch: main

    - id: python
      type: io.kestra.plugin.scripts.python.Commands
      warningOnStdErr: false
      taskRunner:
        type: io.kestra.plugin.scripts.runner.docker.Docker
      containerImage: ghcr.io/kestra-io/pydata:latest
      commands:
        - python etl/global_power_plant.py

This causes it to error becauses pandas can't be found (containerImage used instead):

id: gitPython
namespace: blueprint

tasks:
  - id: pythonScripts
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
    - id: cloneRepository
      type: io.kestra.plugin.git.Clone
      url: https://github.com/kestra-io/scripts
      branch: main

    - id: python
      type: io.kestra.plugin.scripts.python.Commands
      warningOnStdErr: false
      containerImage: ghcr.io/kestra-io/pydata:latest
      commands:
        - python etl/global_power_plant.py
loicmathieu commented 5 months ago

containerImage is only used by task runner, this is documented. If you want to change the container image without using task runner you should use the dockerOption.image parameter.

wrussell1999 commented 5 months ago

Thanks for clarifying. I think it's a bit confusing to a new user as there's multiple options to do the same sort of thing.

By default, runner is set to DOCKER. It can also be set to PROCESS. If you want to change the image, despite being set to Docker, you need to use the docker property. If you specify it as a taskRunner (which can also be io.kestra.plugin.scripts.runner.docker.Docker or io.kestra.plugin.core.runner.Process) rather than a runner, then you use containerImage. As far as I can tell, these are achieving the same thing (other than taskRunner has more than 2 runners)?

Seems like two ways to do the same thing, but maybe I'm misunderstanding how it works behind the scenes. Is the plan to deprecate runner once taskRunner comes out of beta? If there is a reason to use one over the other, I will clarify in the docs.

loicmathieu commented 5 months ago

Yes, there are two ways to do the same thing:

As multiple task runner will be container based, we decided to move the image configuration out of the task runner configuration so it's easy to switch for ex from the Docker task runner to the Kubernetes one.

Maybe the documentation can be improved.

loicmathieu commented 4 months ago

Task Runner is now used by default and the old runner property is now deprecated and null by default. This was done in https://github.com/kestra-io/plugin-scripts/pull/153 Closing this one.