open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0
5.54k stars 1.05k forks source link

Power BI Datamart Connect #9772

Closed matheusmaltap closed 1 year ago

matheusmaltap commented 1 year ago

Is your feature request related to a problem? Please describe. https://openmetadata.slack.com/team/U04FS3RM88G

Describe the solution you'd like We would like to be able to connect to our datasets and datamarts in Power BI

Describe alternatives you've considered develop something through the Api's

Additional context

pmbrull commented 1 year ago

related to https://github.com/open-metadata/OpenMetadata/issues/5711

pmbrull commented 1 year ago

From support:

 In Power BI a Dashboard is a set of visual elements that are assembled from Reports. Reports are packaged into 
.PBIX files. Some people build their Dataset - data model (Tabular Object Model) - in the report .PBIX file, 
best practice is to create a separate report .PBIX file and data model .PBIX file. Reports in Power BI can be 
used to create dashboards with interactive visual elements similar to a Dashboard in Tableau. Most people would be 
building Reports, not Dashboards.

Other entities of interest are Dataflows which are cloud-based ETL pipelines, Datamarts, Paginated Reports 
(which are like Sequel Server Reporting Services [SSRS]), and Workspaces.

Workspace --> Workspace
Dashboard --> Report
Report --> Dataset
Dataset --> Dataset (i.e. composite models)
Dataset --> Dataflow
Dataflow --> Dataflow (chaining a sequence of transformations)
Dataset --> 10+ data sources
Dataflow --> 10+ data sources

These are the types of relationships I have seen after building data lineage from Power BI REST-APIs called the Scanner APIs.
Daandamhuis commented 1 year ago

Is there some think I can maybe do to enable this? We use the API for Dataset and Report Lineage, I don't know if you can see the lineage of a Power BI dataset for example.

pmbrull commented 1 year ago

Hi @Daandamhuis I am not sure I followed your comment. Could you please give some more details on what you are currently using and what you'd like to see in OM? Thanks

Daandamhuis commented 1 year ago

Hi,

Sorry, maybe I wasn't clear enough. We are currently expirementing with OM and I'm really like the SQL Server connector (better then Purview).

The next step for us is to also add in the Power BI and Analysis Services (Tabular) models.

I'm currently working on PyTabular, an package in Python that can read all types of metadata form the dataset and generate docs from these files (Markdown currently). This could be extended so we add that information to OM and track the lineage from the Database to the Dataset and hopefully to the report and dashboard.

Also Power BI just enable the "Scanner API", which allows for more details about models store in the Power BI workspaces.

Things to implement.

salilcbi commented 1 year ago

I was the person who posted the note in #support about the Power BI metadata spec. How can we help develop a connector that supports all the entities described in the support note? I'd like to have the ability for an ingestion connector that can work with JSON or CSV metadata. This is because the Power BI connector that is available requires Service Principal access, and most companies will not provide Service Principal access to anyone who is not a Power BI Admin. However, Power BI does allow us to use the Scanner API via Powershell cmdlets using our own credentials. In this case the user will only be able to extract metadata about workspaces and related entities that they own. It is easy to save such output as CSV or JSON and construct a JSON schema. If we have a connector that can take in a JSON file that way many more people will be able to ingest Power BI metadata into Open Metadata even if they are not Power BI admins.

pmbrull commented 1 year ago

Thanks for the details @Daandamhuis. @nahuelverdugo this might be something we could use for the PBI Models.

@Daandamhuis to give some details, we're working on adding more details on the models created in dashboard services, and how they relate to source databases and the dashboards they create.

@salilcbi, we'll be exploring how to extract PBI metadata without admin access cc @OnkarVO7

As per your example on how to extract custom PBI metadata in a csv/json and then push it to OM, creating a Custom Connector might be the easiest way to go for this https://www.youtube.com/watch?v=fDUj30Ub9VE&t=5s&ab_channel=OpenMetadata. In any case, this is something we'll be analyzing to make the current connector more flexible.

Daandamhuis commented 1 year ago

@pmbrull I've managed to extract the info from a Power BI dataset and push it OMD via the Python SDK and a Custom connector.

I waiting on the Data Model entity to implement it further because currently it's a Database Entity.

FYI, with PyTabular you can extract data without admin access, but then you can't run it unattended. We use Service Principals.

I also listed the options here, maybe it easier to close that one. #10689

salilcbi commented 1 year ago

@Daandamhuis - would you be able to share any details about your custom connector to help others jumpstart their connectors and so perhaps we could collaborate to create a single connector. I'm just getting started and using OMD via Docker for local development. If you could share details about where you chose to upload the data files from the tenant scanner api output, any transformations you had to do and any gotchas it would be very much appreciated. Code fragments would be outstanding as well.

Daandamhuis commented 1 year ago

Hi,

Sorry for the delay, but here is the gist. What it does;

  1. Connect to OMD
  2. Connect to the Power BI Workspace and the Dataset via Azure Service Principals.
  3. Register a "database" service, because I don't have access to the dataset entity.
  4. Extract Meta Data from the Model and register the tables in OMD
  5. Create Lineage to (in our case) the datamart in the datawarehouse.

Measures can be extract similarly from Tables and Columns.

With 'PyTabular' it's also possible to extract data for profiling purposes, but that takes a bit more time to extract, because then your are actually querying the model.

https://gist.github.com/Daandamhuis/0f437db170f57d0396ca042a186888aa

pmbrull commented 1 year ago

Closing this as duplicate of https://github.com/open-metadata/OpenMetadata/issues/10689. Thanks