mspnp / spark-monitoring

Monitoring Azure Databricks jobs
MIT License

Issue with logging Spark events to Azure Log Analytics after upgrading to Databricks 11.3 LTS #223

Closed: ravikumashi closed this issue 11 months ago

ravikumashi commented 1 year ago

We are in the process of upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been integrating the logging of Spark events to Log Analytics using the repository branch at https://github.com/mspnp/spark-monitoring/tree/l4jv2.

However, some information that was available in the Azure Log Analytics table SparkLoggingEvent_CL under 10.4 LTS appears to be missing under 11.3 LTS. One specific example is the "Message" field. In 10.4 LTS, the "Message" field contained driver log messages, but in our current setup with 11.3 LTS the field appears to be populated with spaces, and none of the driver log messages show up in it.

We are reaching out to kindly request your assistance in resolving this matter. It's crucial for us to accurately capture and analyze these driver log messages as they provide valuable insights into the behavior of our Spark applications. Could you please guide us on how to ensure that the driver log messages are correctly being logged and reflected in the "Message" field as they were in the previous version?

Any insights, recommendations, or troubleshooting steps you could provide would be greatly appreciated. We understand your expertise in this area and believe that your guidance will significantly help us in overcoming this challenge.

hallihan commented 11 months ago

Please reach out to the contact listed in the README if you still need assistance. https://github.com/mspnp/spark-monitoring/tree/main#monitoring-azure-databricks-in-an-azure-log-analytics-workspace

ravikumashi commented 11 months ago

Thank you for your response. We tried emailing the contact mentioned in the README but received no response. We debugged the log4jAppender code and can see that messages arrive in the same format as the sparkLayout.json defined in the spark-monitoring.sh script, but for some reason they are not making it to the Log Analytics table SparkLoggingEvent_CL. We also don't see any errors in the cluster driver logs.

ravikumashi commented 11 months ago

This is resolved. We were confused because clusterId and clusterName were coming in as null, and our usual practice when querying Log Analytics is to filter by clusterName/clusterId. After we fixed the clusterId and clusterName values, everything looks good now.
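
For anyone hitting the same symptom, here is a rough Python sketch of why the filter hid the data. It does not query Log Analytics; the rows, the cluster name "my-cluster", and both helper functions are hypothetical stand-ins for SparkLoggingEvent_CL records, and the real column names in the table may differ.

```python
# Hypothetical rows standing in for SparkLoggingEvent_CL records.
# Field names and values are assumptions made for illustration only.
rows = [
    {"clusterId": None, "clusterName": None, "Message": "Job 12 finished"},
    {"clusterId": None, "clusterName": None, "Message": "Stage 3 started"},
]


def filter_by_cluster(records, cluster_name):
    """Mimic the usual query: keep only rows whose clusterName matches."""
    return [r for r in records if r["clusterName"] == cluster_name]


def count_missing_cluster(records):
    """Count rows that arrived without a clusterId or clusterName."""
    return sum(
        1 for r in records
        if r["clusterId"] is None or r["clusterName"] is None
    )


# The usual filter matches nothing, so the messages look "missing" ...
matched = filter_by_cluster(rows, "my-cluster")
# ... even though the rows were ingested, just with null cluster fields.
missing = count_missing_cluster(rows)

print(f"rows matching filter: {len(matched)}, rows with null cluster fields: {missing}")
```

A quick check along these lines (looking at the table without the clusterName/clusterId filter) would have shown that the events were being ingested all along.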