microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps

Container App Jobs do not log to Console when triggered by queue, but do when run manually #1206

Open robrennie opened 1 week ago

robrennie commented 1 week ago

Issue description

When I run my Container App Job with a single console output, that output appears in the console log analytics when I run the job manually. When the same job runs triggered by a KEDA scaling rule against an Azure Queue, no console logs appear.

Steps to reproduce

  1. Create a simple Container App Job that prints to the console.
  2. Run manually, observe console output in analytics logs.
  3. Set up a simple KEDA scaler, as shown in Microsoft's guide to creating an Azure Container App Job, to scale the same container app.
  4. Add message to queue, see that system logs are generated for the container app, but there are never console logs.

Expected behavior

Console logs should appear regardless of how the Container App Job is started.

Actual behavior

No console logs are created.

Additional context

  1. The last Console logs from our jobs were on 6/12/2024.
  2. Deleting and recreating the Log Analytics Workspace associated with the Container App Environment results in the ContainerAppConsoleLogs_CL table not even being created now.

oligarchy commented 1 week ago

we're seeing something similar. seems to have started around tuesday. manual runs go correctly and produce logs, anything that is triggered from the queue reports as a failure and produces no logs for console.writelines or our custom logging client.

the queue-activated runs appear to stop emitting system logs somewhere around the time they should be trying to pull the container. the delivery count is not incremented on our queue message on the failed runs, so it's unclear at what point this is failing, but it seems to be pretty early on.

we tried recreating the app from the ground up, remade the queue, stripped out functionality and redeployed, but no luck. if i run the code directly on my local it functions as expected. we also downloaded the container image from our acr and ran it on a local docker and it also performed as expected.

robrennie commented 1 week ago

@oligarchy - June 11 2024 was the last time we saw Console messages.

I've also seen failures when starting "too many" (e.g. 40) replicas simultaneously. You'll see 401 errors in the container system log coming from the container registry. My guess is that it fails because the container registry can only serve a certain number of replicas starting at the same time.

We're compiling Rust code into an Alpine OS to create our container app btw.

ruvintri commented 1 week ago

@robrennie Do you see 'Created container .. ' and 'Started container ..' in your queue triggered job system logs?

We are seeing a similar issue in multiple environments including production and realizing that the containers are not actually starting, hence the empty console logs.

The only difference we can see in our logs is that the 'msi-transition' image that is pulled when starting the container has rolled back from tag 1.39.26-m to 1.0.8-m.

This rollback, which occurred in the last 24 hours, has caused all event-triggered jobs to fail to start.

robrennie commented 1 week ago

@ruvintri yes we do. The container is in fact running just fine - it interacts with Azure storage for example to store its results and we see that as expected. Just nothing in the Console logs since 6/12/2024. Very weird.

It seems almost like when you trigger them manually, there's some sort of context set that is not set when triggered by a queue.

vtrajan1962 commented 6 days ago

In our case, serverless container app jobs were neither writing logs nor doing any work, but the execution summary showed a running state. They ran for many hours with no progress. However, when I changed the queue name the event is tied to and started the job manually, everything worked fine. This is a major hurdle at this point.

Hope this gets patched over the weekend, as all our environments face this issue.

shirashka commented 4 days ago

Having the same issue. Deployed a simple container app job this morning and only the manually-triggered jobs show stdout or stderr logs.

My job doesn't (yet) contain logic. It has a few simple Python print() statements for testing purposes. It's triggering as expected based on the KEDA scaling rule for a Service Bus queue. It runs in the expected amount of time, but the output can't be found anywhere for the auto-triggered jobs.
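
For anyone trying to reproduce this, a test job along these lines could be sketched as follows. This is only an illustration under assumptions: the `JOB-LOG-TEST` marker, the `build_marker` helper, and the `trigger` argument are hypothetical names, not part of any Azure API. It writes one easily searchable line to both stdout and stderr and flushes immediately, so buffering inside the container can be ruled out when a line is missing from the console logs.

```python
import sys
from datetime import datetime, timezone
from typing import Optional


def build_marker(trigger: str, now: Optional[datetime] = None) -> str:
    """Return a single, easy-to-search log line (hypothetical format)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return f"JOB-LOG-TEST trigger={trigger} utc={now.isoformat()}"


def main(trigger: str = "unknown") -> None:
    line = build_marker(trigger)
    # Write to both streams and flush right away, so a missing line
    # cannot be blamed on output buffering in the process itself.
    print(line, flush=True)
    print(line, file=sys.stderr, flush=True)


if __name__ == "__main__":
    # The trigger label is passed by hand (e.g. "manual" or "queue");
    # the job has no built-in way of knowing how it was started.
    main(sys.argv[1] if len(sys.argv) > 1 else "unknown")
```

Searching the console log table for the marker text then shows whether a given execution produced any output at all.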

GuiUzeda commented 4 days ago

Same here. When I start the job manually it works perfectly and finishes successfully. All the log output works as intended. When the job is started by the Service Bus queue trigger, it runs until timeout and fails with no console log. Only the system logs are shown, and they report no unexpected errors except the timeout.

I am actually questioning if I can trust this kind of service to deploy an application.

robrennie commented 4 days ago

@GuiUzeda - questioning the same thing. Wondering if I should just go old school with a VM for these jobs. Ugh.

GuiUzeda commented 4 days ago

It is a little bit out there. There are a lot of unanswered issues here with the triage tag. I have noticed that your original report is 3 days old. I am deploying an app in development for tests, so no big issues for me, but if that were to happen in production we would be in big trouble.

vinisoto commented 4 days ago

Hi, we have root-caused the missing logs to a platform regression: https://github.com/microsoft/azure-container-apps/issues/1211

robrennie commented 4 days ago

@vinisoto - #1211 did not fix this. I just ran a container app job, system log says everything was fine, no Console Log. In fact, ContainerAppConsoleLogs_CL still isn't even created.

'where' operator: Failed to resolve table or column expression named 'ContainerAppConsoleLogs_CL' Request id: 14fab04f-1582-4890-8c99-93f0ef01319a

jasonrberk commented 4 days ago

The number of open issues, closed issues that never actually got fixed, and the overall horrible communication around Container Apps make me question how this made it out of preview.

The lack of communication and accountability is telling.

chinadragon0515 commented 4 days ago

We have fixed the issue and a hotfix is deploying. We expect it to reach all impacted environments within 1 day.

jasonrberk commented 4 days ago

As part of the retrospective, can anyone share where these changes are publicly discussed, so we can be better prepared and have a chance at correlating platform changes to regressions? My entire non-prod set of subscriptions has been dead in the water since last Thursday.

chinadragon0515 commented 4 days ago

I have patched all impacted environments. Let me know if anyone still has issues.

GuiUzeda commented 3 days ago

Thanks for your answer! Is any redeploy needed? I am still having issues with the container not starting:

[image]

vtrajan1962 commented 3 days ago

Works for me

vtrajan1962 commented 3 days ago

No

jason-berk-k1x commented 3 days ago

I get no system or console logs:

[Screenshots: 2024-06-25 at 9:57 AM, 9:59 AM, and 10:01 AM]

shirashka commented 3 days ago

It's been resolved for my jobs. Thank you!

robrennie commented 3 days ago

Looking good for me too now. I can see Console Logs as expected. Thank you.

robrennie commented 3 days ago

@jason-berk-k1x I do notice a significant delay between the replicas completing and the Console logs appearing - fooled me a couple times. I think it's due to the verbosity of my app though - and will fix that anyways. Also, I tend to remove the filter on the default Log Analytics query and rerun it and that's usually when everything (or something) appears.
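
Given that delay, one way to avoid concluding too early that logs are missing is to poll for a while before giving up. A rough sketch, assuming you supply your own `fetch` callable that runs whatever log query you use and returns the matching rows (the name `wait_for_rows` and the default timings are hypothetical, not taken from any Azure SDK):

```python
import time
from typing import Callable, List, Sequence


def wait_for_rows(
    fetch: Callable[[], Sequence[str]],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch() repeatedly until it returns rows or the timeout passes.

    Returns the rows found, or an empty list if nothing showed up in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        rows = fetch()
        if rows:
            return list(rows)
        if time.monotonic() >= deadline:
            # Still nothing after the full wait.
            return []
        sleep(interval_s)
```

The query itself is deliberately left to the caller, so the helper makes no assumption about how the workspace is queried.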

jason-berk-k1x commented 3 days ago

@robrennie did you have to recreate the job or anything?

robrennie commented 3 days ago

@jason-berk-k1x I did not, just ran it again and it worked.