microsoft / Purview-ADB-Lineage-Solution-Accelerator

A connector to ingest Azure Databricks lineage into Microsoft Purview
MIT License

Demo Install - DB compute setup requires additional steps #206

Open batemansogq opened 1 year ago

batemansogq commented 1 year ago

Describe the bug

The demo install process requires additional steps to get running.

To Reproduce

Steps to reproduce the behavior:

  1. Update the Settings.sh to include a unique Purview name (or else hit the already exists issue)
  2. Run the Demo installation sh
  3. The DB cluster fails to start with an init script error - not found

Steps to fix:

  1. Update Settings.sh to include a unique Purview name (otherwise the install hits the "already exists" error)
  2. Run the demo installation sh
  3. Using the DB CLI, upload the init script and jar (as per the connector instructions - https://github.com/microsoft/Purview-ADB-Lineage-Solution-Accelerator/blob/main/deploy-base.md#install-openlineage-on-your-databricks-cluster)
  4. Update the DB cluster Libraries to include the Maven mssql-connector
  5. Update the storageServiceName & storageContainerName values in the DB "abfss-in-abfss-out-sample" notebook, as the references don't work within the current set
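For anyone repeating step 3, here is a rough Python sketch that only builds the `databricks fs cp` commands so they can be reviewed before running. The local file names and the `dbfs:/databricks/openlineage` target folder are assumptions for illustration; take the real values from the linked connector instructions and from the init script path configured on your cluster.

```python
from pathlib import PurePosixPath


def build_upload_commands(local_files, dbfs_dir="dbfs:/databricks/openlineage"):
    """Return one `databricks fs cp` argument list per local file.

    The default dbfs_dir is an assumed location, not taken from this issue.
    """
    commands = []
    for local in local_files:
        # Keep the original file name under the target DBFS folder.
        name = PurePosixPath(local).name
        target = f"{dbfs_dir.rstrip('/')}/{name}"
        commands.append(["databricks", "fs", "cp", "--overwrite", local, target])
    return commands


if __name__ == "__main__":
    # Hypothetical local file names; substitute the files you actually have.
    files = ["./open-lineage-init-script.sh", "./openlineage-spark.jar"]
    for cmd in build_upload_commands(files):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell where the Databricks CLI is already configured, or passed to `subprocess.run` once the paths have been checked.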

Expected behavior

The Demo installation should work as per the instructions.

Logs

  1. Please include any Spark code being run that generates this error

https://gist.github.com/batemansogq/c29c2bcfb04b3e966fad5ac4648feb1d = Spark config

  1. Please include a gist to the OpenLineageIn and PurviewOut logs
  2. See how to stream Azure Function Logs

Screenshots

demo install - DB failure image

demo install - settings update image

demo install - spark config image

Desktop (please complete the following information): I have the standard MS build

  OS: [e.g. Windows, Mac]
  OpenLineage Version: [e.g. name of jar]
  Databricks Runtime Version: [e.g. 9.1, 10.1, 11.3]
  Cluster Type: [e.g. Job, Interactive]
  Cluster Mode: [e.g. Standard, High Concurrency, Single]
  Using Credential Passthrough: [e.g. Yes, No]

Additional context

This work was completed in the MS non-prod tenancy. Reach out to me via email for access.

wjohnson commented 1 year ago

Hi, @batemansogq - Thank you for using the solution accelerator! Would you help me understand what you believe fixed your cluster start issue?

It looks like you took three steps:

  1. Added purviewName to the settings.sh
  2. Update DB cluster Libraries to include the Maven - mssql-connector
  3. Update the DB "abfss-in-abfss-out-sample" - storageServiceName & storageContainerName values as the references don't work within the current set

However, I'm not certain how these would have affected your cluster initialization.

Thank you for any additional feedback.

batemansogq commented 1 year ago

Hey Will,

This covers all of the steps I needed to complete the install and get the cluster and the DB notebook running, aside from uploading the files to DBFS. But as you say, these aren't directly related to getting the DB cluster running.

  1. Added purviewName to the settings.sh = This was needed to get Purview installed; otherwise it kept failing with "name exists" errors no matter what region I tried
  2. Update DB cluster Libraries to include the Maven - mssql-connector = needed to match the install script (cluster ran without this)
  3. Update the DB "abfss-in-abfss-out-sample" = needed to run the notebook (cluster ran without this)
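To illustrate item 3: the two notebook values feed an ABFS path, so both have to match the storage account and container the demo actually created. A minimal sketch, assuming the notebook combines them in the standard `abfss://<container>@<account>.dfs.core.windows.net/<path>` form; the helper name and the example values are hypothetical and not taken from the notebook.

```python
def abfss_uri(storage_service_name, storage_container_name, path=""):
    """Build an ABFS URI from a storage account name and a container name."""
    if not storage_service_name or not storage_container_name:
        raise ValueError("storage account and container names must both be set")
    base = f"abfss://{storage_container_name}@{storage_service_name}.dfs.core.windows.net"
    # Normalise the path so the URI never contains a double slash after the host.
    return f"{base}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical values; use the account and container from your own deployment.
    print(abfss_uri("mystorageacct", "rawdata", "examples/data.csv"))
```

If either value is left at a reference that does not resolve, the resulting path points at a location that does not exist, which is consistent with the notebook failing while the cluster itself runs fine.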

Apologies, I was trying to get a single issue to cover all of the instruction updates required, rather than raising 4 separate issues (=lazy).

Regards,
Scott

