kestra-io / plugin-azure

Azure Task Runner takes longer to run than expected (8 to 9 minutes for basic Python script) #118

Closed: wrussell1999 closed this issue 1 month ago

wrussell1999 commented 1 month ago

Expected Behavior

Should take at most a few minutes to execute this example:

id: azure_batch_runner
namespace: company.team

variables:
  poolId: "poolId"
  containerName: "containerName"

tasks:
  - id: get_env_info
    type: io.kestra.plugin.scripts.python.Commands
    containerImage: ghcr.io/kestra-io/pydata:latest
    taskRunner:
      type: io.kestra.plugin.azure.runner.Batch
      account: "{{ secret('AZURE_ACCOUNT') }}"
      accessKey: "{{ secret('AZURE_ACCESS_KEY') }}"
      endpoint: "{{ secret('AZURE_ENDPOINT') }}"
      poolId: "{{ vars.poolId }}"
      blobStorage:
        containerName: "{{ vars.containerName }}"
        connectionString: "{{ secret('AZURE_CONNECTION_STRING') }}"
    commands:
      - python {{ workingDir }}/main.py
    namespaceFiles:
      enabled: true
    outputFiles:
      - "environment_info.json"
    inputFiles:
      main.py: |
        import platform
        import socket
        import sys
        import json
        from kestra import Kestra

        print("Hello from Azure Batch and kestra!")

        def print_environment_info():
            print(f"Host's network name: {platform.node()}")
            print(f"Python version: {platform.python_version()}")
            print(f"Platform information (instance type): {platform.platform()}")
            print(f"OS/Arch: {sys.platform}/{platform.machine()}")

            env_info = {
                "host": platform.node(),
                "platform": platform.platform(),
                "OS": sys.platform,
                "python_version": platform.python_version(),
            }
            Kestra.outputs(env_info)

            filename = 'environment_info.json'
            with open(filename, 'w') as json_file:
                json.dump(env_info, json_file, indent=4)

        if __name__ == '__main__':
            print_environment_info()

Actual Behaviour

Each of the two runs took 8 to 9 minutes.

Steps To Reproduce

No response

Environment Information

Example flow

No response

loicmathieu commented 1 month ago

"Should take at most a few minutes to execute this example"

What is the rationale behind that? Did you try running it outside of Kestra, and did it take less time? Is the time reported in the Azure console much lower than the time reported in Kestra?
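
For the question above about running the script outside Kestra, a minimal sketch of a local timing check might look like the following. It assumes the script from the flow has been saved as main.py in the current directory and that the kestra Python package is installed; the helper name time_command is hypothetical and not part of Kestra or the Azure plugin.

```python
import subprocess
import sys
import time


def time_command(command):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    # check=False: a failing script is reported through the exit code,
    # not raised as an exception.
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Assumption: main.py from the flow above is in the current directory.
    code, seconds = time_command([sys.executable, "main.py"])
    print(f"exit code {code}, took {seconds:.2f} s")
```

This only measures the script itself on a local machine, so it gives a lower bound to compare against the duration Kestra reports, not a like-for-like Azure Batch number.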

wrussell1999 commented 1 month ago

The GCP and AWS task runners took at most half the time for the same simple example. I'll try to get some actual numbers next week.

loicmathieu commented 1 month ago

So, to understand what's going on, we would need to compare it across 2 clouds, and for each have:

The latter may be the issue; maybe Azure allocates fewer resources by default.

loicmathieu commented 1 month ago

This is the time Azure takes before the job is executed. As far as I know, there is nothing we can do on our side, so I'm closing this for now.
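
To see how much of a run is spent waiting before the job starts rather than executing the script, the total can be split at the start timestamp. The sketch below is only an illustration: the function name split_wall_time is hypothetical, and the three timestamps are made-up placeholders that would have to be replaced with values read from the Azure console or the Kestra execution logs.

```python
from datetime import datetime, timedelta


def split_wall_time(submitted, started, ended):
    """Split a run into time waiting before start and time executing."""
    if not (submitted <= started <= ended):
        raise ValueError("expected submitted <= started <= ended")
    return {"wait": started - submitted, "run": ended - started}


# Illustrative timestamps only, NOT measurements from this issue.
submitted = datetime(2024, 1, 1, 12, 0, 0)
started = datetime(2024, 1, 1, 12, 7, 30)
ended = datetime(2024, 1, 1, 12, 8, 0)

parts = split_wall_time(submitted, started, ended)
print(f"wait: {parts['wait']}, run: {parts['run']}")
```

If the wait portion dominates, the delay comes from the time before the job starts, which is consistent with the explanation above; if the run portion dominates, the script or its container would be the place to look.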