mrchristine / db-migration

Databricks Migration Tools

Import metastore extract onto external SQL server. #13

Closed raknaik closed 4 years ago

raknaik commented 4 years ago

Hi Team,

Kindly let us know if there is a way to dump the extracted metadata of a Databricks workspace into a SQL database.

Is there a utility for this, similar to the Python script we used to import and export the metastore?

mrchristine commented 4 years ago

What specific metadata of a Databricks workspace are you looking for? There are many areas of a Databricks workspace that this tool exports via supported APIs.

raknaik commented 4 years ago

Hi Christine,

Previously, using the utility from this Git repository, I was able to import the metastore information, which consisted of table DDLs.

Now I am looking for a way to dump this metastore into our SQL DB.

Is there a way to achieve this?

Regards,

Rakesh.N


mrchristine commented 4 years ago

Are you looking to recreate a metastore backed by SQL DB, or are you looking to build a metadata service that you can query via SQL DB?

The tool uses the SparkSQL client to pull metadata info from the HiveMetastoreClient class, and is meant to interface via Spark. There's no direct way to import the metastore into another backing database. The best way to do that would be to set up another Spark cluster attached to that metastore, and import the tables specifically to that cluster.
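For the second case (a queryable metadata store rather than a new backing metastore), one approach is to load the exported DDL strings into a SQL table. The sketch below is an illustration only, not part of this repo: `sqlite3` stands in for the external SQL server, and the `extracted_ddls` layout, the `metastore_ddl` table, and the `dump_metadata` helper are assumed names for the example.

```python
import sqlite3

# Example extracted DDL strings keyed by (database, table); the real export
# tool writes DDLs to per-database log files, which you would parse first.
extracted_ddls = {
    ("sales", "orders"): "CREATE TABLE orders (id INT, amount DOUBLE)",
    ("sales", "customers"): "CREATE TABLE customers (id INT, name STRING)",
}

def dump_metadata(conn, ddls):
    """Store each table's DDL in a metadata table so it can be queried via SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metastore_ddl "
        "(db_name TEXT, table_name TEXT, ddl TEXT)"
    )
    conn.executemany(
        "INSERT INTO metastore_ddl VALUES (?, ?, ?)",
        [(db, tbl, ddl) for (db, tbl), ddl in ddls.items()],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # swap in a driver for your SQL server
dump_metadata(conn, extracted_ddls)
rows = conn.execute(
    "SELECT db_name, table_name FROM metastore_ddl ORDER BY table_name"
).fetchall()
```

Note this only gives you searchable metadata; it does not produce a Hive-compatible metastore schema, which is why re-importing through a Spark cluster is the recommended path.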

I could add an option to the import phase to attach to a specific cluster name if provided, but you would need to ensure the metastore is provisioned correctly.

raknaik commented 4 years ago

Hi Christine,

> I could add an option to the import phase to attach to a specific cluster name if provided, but you would need to ensure the metastore is provisioned correctly.

Please add this feature to the script.

We already have a cluster instance configured with the necessary Spark configurations and a connection to the SQL DB into which we want to dump the metastore copy.

Regards,

Rakesh.N


mrchristine commented 4 years ago

I'll take some time to add it next week. Thanks for the feedback.

mrchristine commented 4 years ago

@raknaik I've added support for --cluster-name as an import option so you can connect to a separate cluster and run the import.

Please test it out and let me know how it works for you. I'll close this issue once you report back.

arjun-hareendran commented 4 years ago

@mrchristine: Can we use the same --cluster-name parameter when exporting the metadata (DDLs)? As of now, a cluster named API_Metastore_Work_Leave_Me_Alone gets created to do the work. If we could use --cluster-name there too, I would prefer to use an existing interactive cluster for the job.

mrchristine commented 4 years ago

@arjun-hareendran the --cluster-name option now works for metadata export as well.