Logs showing out of order in the Logs page for remote task runners

japerry911 commented 4 days ago

Describe the issue

"::{"logs": [{"level": "WARN", "message": "2024-10-25T16:09:45.419Z - Failed to upload Tag: <HttpError 400 when requesting https://tagmanager.googleapis.com/tagmanager/v2/accounts/xxx/containers/167765859/workspaces/22/tags/292?alt=json returned \"Invalid identifier: Link Click - Finance Start - Regex URL\". Details: \"[{'message': 'Invalid identifier: Link Click - Finance Start - Regex URL', 'domain': 'global', 'reason': 'badRequest'}]\">"}]}::"

This log does not show up with Kestra Python logging library, it shows 100% errors on TailLogEntries as well for Batch Task Runner.

This is using Python Kestra Library latest version (logger within it).

I also notice in GCP UI that 100% of tail log fetches in the hour this process ran failed

I cannot find any GCP logs on the failed requests though.

Environment

Kestra Version: v0.19.2
Operating System (OS/Docker/Kubernetes): Kestra Cloud
Java Version (if you don't run kestra in Docker): n/a

anna-geller commented 2 days ago

@japerry911 do you happen to have a small reproducer?

japerry911 commented 2 days ago

Hey @anna-geller ,

Sorry I realized that my initial opening of this is super vague/not helpful.

In terms of where I noticed this - I just have a Batch Task Runner running Python in a container, that logged probably 50-100 Warnings over the course of 2 hours, and only one was caught within Kestra.

I can attempt to reproduce this with a smaller/more shareable example and can share that towards the end of day 🚀 .

japerry911 commented 13 hours ago

Hello @anna-geller ,

I am sorry, it's showing them now consistently when I am testing with my example below. Ill keep my eye out and see if it happens again/repetitively. Sorry about raising false alarm on that.

One quick observation before closing this issue - I did notice that logs are written out of order in the Kestra UI. Is this expected due to quickfire nature of this examples logs?

id: python_log_party
namespace: namespace_name
description: |
  This workflow throws a Log Warning party and logs a lot of warnings.

tasks:
  - id: python_log_party
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install kestra
    script: |
      from time import sleep

      from kestra import Kestra

      iterations = 30
      logger = Kestra.logger()

      logger.info("WARNING PARTY STARTING")

      for i in range(iterations):
        logger.warning(f"Warning - {i + 1}/{iterations}")
        sleep(0.5)

      logger.info("WARNING PARTY COMPLETED")
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      bucket: "{{ secret('GCS_LANDING_BUCKET') }}"
      projectId: "{{ secret('GCP_PROJECT_ID') }}"
      region: us-central1
      serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
      waitUntilCompletion: 86400
      machineType: "n2d-highcpu-4"
      computeResource:
        cpu: "4000"
        memory: "4096"

GCP UI:

Kestra UI:

Temporal Kestra UI

--

Let me know if there is anymore details I can provide, thanks team.

*I did only 30 iterations, maybe more would be a better test

anna-geller commented 7 hours ago

Thanks a lot for the careful and detailed analysis. I briefly checked on Docker task runner on both JDBC and Elastic on develop and there everything seems to be fine but on Cloud (using 0.19.2) there's indeed some latency in which UI catched all logs https://share.descript.com/view/OghAv3T3UAy

@MilosPaunovic can you investigate this a bit on Cloud on 0.19.2? This seems to happen regardless of task runner so you could reproduce already using this flow:

id: python_log_party
namespace: namespace_name
description: |
  This workflow throws a Log Warning party and logs a lot of warnings.

tasks:
  - id: python_log_party
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install kestra
    script: |
      from time import sleep

      from kestra import Kestra

      iterations = 30
      logger = Kestra.logger()

      logger.info("WARNING PARTY STARTING")

      for i in range(iterations):
        logger.warning(f"Warning - {i + 1}/{iterations}")
        sleep(0.5)

      logger.info("WARNING PARTY COMPLETED")

kestra-io / kestra

Logs showing out of order in the Logs page for remote task runners #5662

Describe the issue

Environment