microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License
372 stars, 29 forks

Some records from SystemLog are missing #1276

Closed koldyba closed 2 months ago

koldyba commented 2 months ago

Issue description

While trying to understand what was going on with my application, I noticed that some system logs are missing from the ContainerAppSystemLogs_CL table. It looks like not all events are always logged.

Expected behavior

All system logs can be found in the ContainerAppSystemLogs_CL table.

Actual behavior

Some system logs are missing. In the attached screenshots, the ReplicaUnhealthy event count jumps from 9 to 16 and then from 16 to 22, so the events in between are missing. The StoppingContainer event count likewise jumps from 3 to 8.
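To make the gaps above concrete, here is a minimal sketch that scans exported log rows for skipped event counts. The `reason` and `count` keys are hypothetical names for illustration, not the actual column names of ContainerAppSystemLogs_CL, and the sample rows only mirror the numbers quoted above. A gap found this way is a hint rather than proof that records were lost.

```python
from collections import defaultdict


def find_count_gaps(records):
    """Return {reason: [(first_missing, last_missing), ...]} for skipped counts.

    `records` is an iterable of dicts with hypothetical keys "reason" and
    "count"; adapt these to whatever your exported rows actually contain.
    """
    counts_by_reason = defaultdict(set)
    for rec in records:
        counts_by_reason[rec["reason"]].add(int(rec["count"]))

    gaps = {}
    for reason, counts in counts_by_reason.items():
        ordered = sorted(counts)
        # Compare each observed count with the next one; a difference
        # greater than 1 means the counts in between were never seen.
        missing = [
            (lo + 1, hi - 1)
            for lo, hi in zip(ordered, ordered[1:])
            if hi - lo > 1
        ]
        if missing:
            gaps[reason] = missing
    return gaps


# Sample rows mirroring the counts visible in the screenshots.
sample = [
    {"reason": "ReplicaUnhealthy", "count": 9},
    {"reason": "ReplicaUnhealthy", "count": 16},
    {"reason": "ReplicaUnhealthy", "count": 22},
    {"reason": "StoppingContainer", "count": 3},
    {"reason": "StoppingContainer", "count": 8},
]

print(find_count_gaps(sample))
# {'ReplicaUnhealthy': [(10, 15), (17, 21)], 'StoppingContainer': [(4, 7)]}
```

The sketch groups by event reason only. If counts restart per replica or per revision, grouping by that identifier as well would avoid reporting false gaps.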

Screenshots
image

simonjj commented 2 months ago

@koldyba thanks for reaching out. Are you suggesting that the count delta means that log messages are being lost? The sequencing of these messages is not always reliable or repeatable, so I'd hesitate to conclude that logs are being lost based on this alone. Are you experiencing any other symptoms or issues?

koldyba commented 2 months ago

@simonjj I assume so.

If you take my screenshot as an example, you'll see that the ContainerApp was rebooted twice in the 11:30 - 11:45 timeframe, and there are no logs explaining why. I can only assume that it was rebooted because of a failed liveness probe, since the gap in the sequence matches the possible events perfectly. If that doesn't mean the logs are missing, then what does it mean?

I haven't seen other "lost" logs yet.

If system logs are unreliable, do you have any suggestions for how to monitor and investigate events from the underlying infra (k8s, KEDA scaler, etc.)?

simonjj commented 2 months ago

We're adding this to our backlog to investigate more closely. Can you please share the details of this app with us via acasupport?

microsoft-github-policy-service[bot] commented 2 months ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.