microsoft / Purview-ADB-Lineage-Solution-Accelerator

A connector to ingest Azure Databricks lineage into Microsoft Purview
MIT License

Connector install - OpenLineage does not Initialize #218

Open yxu1183 opened 9 months ago

yxu1183 commented 9 months ago

Describe the bug

When the Databricks compute cluster starts, OpenLineage doesn't initialize as expected, and no events are produced.

To Reproduce

Steps to reproduce the behavior:

  1. Follow the connector installation and post-installation instructions.
  2. Upload the OpenLineage JAR to DBFS.
  3. Configure the all-purpose compute cluster (single user mode) with the Spark configuration.
  4. Upload the OpenLineage init script to the user workspace directory.
  5. Update the init script path on the cluster, with Workspace as the source and the script's file path.
  6. Start the compute cluster.

Expected behavior

According to the instructions at https://github.com/OpenLineage/OpenLineage/tree/main/integration/spark/databricks#initialization-logs, we should see 3 log entries for initialization:

  1. "Registered listener io.openlineage.." (this one appears)
  2. "OpenLineageContext: Init OpenLineageContext:" (this one is missing)
  3. "AsyncEventQueue: Process of event SparkListenerApplicationStart" (this one appears)
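A quick way to confirm which of the three entries is absent is to scan the driver log text for each marker. The following is only a minimal sketch: the marker substrings are copied from the entries quoted in this issue, while the function name `find_missing_markers` and the short labels are hypothetical, and the exact wording of the log lines may differ between OpenLineage versions.

```python
# Substrings taken from the three log entries quoted in this issue.
# The dictionary keys are illustrative labels, not OpenLineage names.
EXPECTED_MARKERS = {
    "listener_registered": "Registered listener io.openlineage",
    "context_initialized": "OpenLineageContext: Init OpenLineageContext",
    "application_start": "AsyncEventQueue: Process of event SparkListenerApplicationStart",
}


def find_missing_markers(log_text: str) -> list:
    """Return the labels of expected markers that do not occur in log_text."""
    return [
        label
        for label, marker in EXPECTED_MARKERS.items()
        if marker not in log_text
    ]


if __name__ == "__main__":
    # Made-up sample mirroring the situation described in this issue.
    sample_log = (
        "INFO SparkContext: Registered listener io.openlineage.spark.agent\n"
        "INFO AsyncEventQueue: Process of event SparkListenerApplicationStart\n"
    )
    print(find_missing_markers(sample_log))
```

Pasting the driver log4j output into `log_text` should show whether only the `OpenLineageContext` entry is missing, as reported here, or whether the listener itself was never registered.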

Logs

  1. Please include any Spark code being run that generates this error:

     spark.openlineage.version v1
     spark.openlineage.namespace <adb_workspace_id>#<cluster_id>
     spark.openlineage.host https://<functionapp_name>.azurewebsites.net/
     spark.openlineage.url.param.code {{secrets/<scope>/<function_default_key>}}

  2. Init script path configured on the cluster:

     Type: Workspace
     File path: /Users/user_name/init_scripts/open-lineage-init-script.sh

Open Lineage script's absolute path - /Users/user_name/init_scripts/open-lineage-init-script.sh
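Before starting the cluster, it may help to check that the Spark configuration contains all four `spark.openlineage.*` keys listed above and that no `<placeholder>` was left unreplaced. The sketch below assumes the configuration is available as plain "key value" lines, as entered in the cluster's Spark config box; the helper names `parse_spark_conf` and `check_conf` are hypothetical and not part of the connector or of OpenLineage.

```python
import re

# The four keys shown in the Spark configuration in this issue.
REQUIRED_KEYS = (
    "spark.openlineage.version",
    "spark.openlineage.namespace",
    "spark.openlineage.host",
    "spark.openlineage.url.param.code",
)

# Matches template placeholders such as <functionapp_name> left in a value.
PLACEHOLDER = re.compile(r"<[^<>\s]+>")


def parse_spark_conf(text: str) -> dict:
    """Parse 'key value' lines into a dict, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf


def check_conf(conf: dict) -> list:
    """Return one human-readable problem per missing or placeholder value."""
    problems = []
    for key in REQUIRED_KEYS:
        value = conf.get(key, "")
        if not value:
            problems.append(f"{key}: missing")
        elif PLACEHOLDER.search(value):
            problems.append(f"{key}: unresolved placeholder")
    return problems


if __name__ == "__main__":
    # Illustrative input: host still has a placeholder, the code key is absent.
    example = """
    spark.openlineage.version v1
    spark.openlineage.namespace 1234567890#0101-abcdef
    spark.openlineage.host https://<functionapp_name>.azurewebsites.net/
    """
    for problem in check_conf(parse_spark_conf(example)):
        print(problem)
```

This only validates the shape of the settings. It cannot confirm that the secret scope resolves or that the function app is reachable.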


wjohnson commented 6 months ago

@yxu1183 did you attempt to run a notebook and it failed to produce lineage?

If you're seeing the "AsyncEventQueue: Process of event SparkListenerApplicationStart" entry, you should be able to receive lineage events!