open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Dremio datalake engine support #15644

Open capoolebugchat opened 8 months ago

capoolebugchat commented 8 months ago

I'm looking for a way to have the OMD (OpenMetadata) project manage the metadata of every tool in a very compact data platform and monitor that platform. The platform uses Dremio as its query execution engine, both for scalability with bigger datasets and for ease of use (it connects well to a lot of data sources). However, OMD has no Dremio connector for metadata extraction and monitoring.

Solution: A Dremio connector for OMD that can be configured with a minimal set of variables such as host:port and username:password. SSL would be a nice add-on but is not essential for now.
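A minimal sketch of the configuration surface described above, in Python. The names `DremioConnectionConfig` and `parse_connection` are hypothetical and are not part of OpenMetadata or Dremio; the host, port, and credentials are illustrative values only. It only shows how the `host:port` and `username:password` pairs could be parsed, with SSL left as an optional flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DremioConnectionConfig:
    """Hypothetical connector settings; not an OpenMetadata or Dremio class."""
    host: str
    port: int
    username: str
    password: str
    use_ssl: bool = False  # optional, as suggested in the request


def parse_connection(host_port: str, credentials: str,
                     use_ssl: bool = False) -> DremioConnectionConfig:
    """Build a config from 'host:port' and 'username:password' strings."""
    # Split on the last colon so the port is always the final component.
    host, sep, port = host_port.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {host_port!r}")

    # Split on the first colon so the password itself may contain colons.
    username, sep, password = credentials.partition(":")
    if not sep or not username:
        raise ValueError("expected 'username:password'")

    return DremioConnectionConfig(host, int(port), username, password, use_ssl)


# Illustrative values only (assumed, not taken from the thread).
config = parse_connection("dremio.example.internal:9047", "alice:s3cret")
```

Keeping the parsed values in a frozen dataclass means the settings cannot be changed by accident once they are validated.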

Alternative: An external data cataloguing service like HiveDC, DynamoDB, Nessie (recommended by Dremio), etc. Both OMD and Dremio would use it as the metadata monitoring and tracking tool. However, this excludes Dremio from OMD and bloats the infrastructure a bit (one more data solution to take care of).

I'm new to this Cloud Data Engineering thing and a bit surprised about how limited Dremio is, though the engine is still quite powerful.

TeddyCr commented 8 months ago

@capoolebugchat are you interested in picking up this issue?

rogercezidio commented 7 months ago

@TeddyCr I would love to help.

capoolebugchat commented 7 months ago

@TeddyCr yes, sorry about the extra late reply

TeddyCr commented 7 months ago

Thanks @capoolebugchat, I'll assign it to you then. We have some information about how to build a new connector here. Make sure to join our Slack and the #contributor channel if you need any help.

@rogercezidio please check the other connectors here if you'd like to contribute; we have many. 😊

wobu commented 3 months ago

Hi folks,

We did a simple implementation of a custom Dremio connector here: https://github.com/TIKI-Institut/openmetadata-dremio-connector. It can only scrape metadata. It has no support for query usage, profiling, etc. We didn't find a way to implement those in a custom connector. For lineage we are simply using dbt at the moment.

We would appreciate any feedback. Is it possible that this will be integrated into OpenMetadata?

TeddyCr commented 2 months ago

Hey @wobu, we would recommend that you contribute the connector directly to the community. This will allow you to leverage the existing code base to implement support for usage, profiling, etc.

Here is a link with more information -> https://docs.open-metadata.org/latest/developers/contribute/developing-a-new-connector

wobu commented 2 months ago

We thought about it and also tried it, but unfortunately setting up the OpenMetadata project on Windows wasn't easy :/ (WSL might be an option). So we decided to just start with a custom connector.

The custom connector is currently sufficient for us, so we won't provide a direct community integration in the near future, at least not until our investment in OpenMetadata and Dremio increases.