Closed matheusmaltap closed 1 year ago
From support:
In Power BI a Dashboard is a set of visual elements that are assembled from Reports. Reports are packaged into
.PBIX files. Some people build their Dataset - data model (Tabular Object Model) - in the report .PBIX file,
best practice is to create a separate report .PBIX file and data model .PBIX file. Reports in Power BI can be
used to create dashboards with interactive visual elements similar to a Dashboard in Tableau. Most people would be
building Reports, not Dashboards.
Other entities of interest are Dataflows which are cloud-based ETL pipelines, Datamarts, Paginated Reports
(which are like Sequel Server Reporting Services [SSRS]), and Workspaces.
Workspace --> Workspace
Dashboard --> Report
Report --> Dataset
Dataset --> Dataset (i.e. composite models)
Dataset --> Dataflow
Dataflow --> Dataflow (chaining a sequence of transformations)
Dataset --> 10+ data sources
Dataflow --> 10+ data sources
These are the types of relationships I have seen after building data lineage from Power BI REST-APIs called the Scanner APIs.
Is there some think I can maybe do to enable this? We use the API for Dataset and Report Lineage, I don't know if you can see the lineage of a Power BI dataset for example.
Hi @Daandamhuis I am not sure I followed your comment. Could you please give some more details on what you are currently using and what you'd like to see in OM? Thanks
Hi,
Sorry, maybe I wasn't clear enough. We are currently expirementing with OM and I'm really like the SQL Server connector (better then Purview).
The next step for us is to also add in the Power BI and Analysis Services (Tabular) models.
I'm currently working on PyTabular, an package in Python that can read all types of metadata form the dataset and generate docs from these files (Markdown currently). This could be extended so we add that information to OM and track the lineage from the Database to the Dataset and hopefully to the report and dashboard.
Also Power BI just enable the "Scanner API", which allows for more details about models store in the Power BI workspaces.
Things to implement.
I was the person who posted the note in #support about the Power BI metadata spec. How can we help develop a connector that supports all the entities described in the support note? I'd like to have the ability for an ingestion connector that can work with JSON or CSV metadata. This is because the Power BI connector that is available requires Service Principal access, and most companies will not provide Service Principal access to anyone who is not a Power BI Admin. However, Power BI does allow us to use the Scanner API via Powershell cmdlets using our own credentials. In this case the user will only be able to extract metadata about workspaces and related entities that they own. It is easy to save such output as CSV or JSON and construct a JSON schema. If we have a connector that can take in a JSON file that way many more people will be able to ingest Power BI metadata into Open Metadata even if they are not Power BI admins.
Thanks for the details @Daandamhuis. @nahuelverdugo this might be something we could use for the PBI Models.
@Daandamhuis to give some details, we're working on adding more details on the models created in dashboard services, and how they relate to source databases and the dashboards they create.
@salilcbi, we'll be exploring how to extract PBI metadata without admin access cc @OnkarVO7
As per your example on how to extract custom PBI metadata in a csv/json and then push it to OM, creating a Custom Connector might be the easiest way to go for this https://www.youtube.com/watch?v=fDUj30Ub9VE&t=5s&ab_channel=OpenMetadata. In any case, this is something we'll be analyzing to make the current connector more flexible.
@pmbrull I've managed to extract the info from a Power BI dataset and push it OMD via the Python SDK and a Custom connector.
I waiting on the Data Model entity to implement it further because currently it's a Database Entity.
FYI, with PyTabular you can extract data without admin access, but then you can't run it unattended. We use Service Principals.
I also listed the options here, maybe it easier to close that one. #10689
@Daandamhuis - would you be able to share any details about your custom connector to help others jumpstart their connectors and so perhaps we could collaborate to create a single connector. I'm just getting started and using OMD via Docker for local development. If you could share details about where you chose to upload the data files from the tenant scanner api output, any transformations you had to do and any gotchas it would be very much appreciated. Code fragments would be outstanding as well.
Hi,
Sorry for the delay, but here is the gist. What it does;
Measures can be extract similarly from Tables and Columns.
With 'PyTabular' it's also possible to extract data for profiling purposes, but that takes a bit more time to extract, because then your are actually querying the model.
https://gist.github.com/Daandamhuis/0f437db170f57d0396ca042a186888aa
Closing this as duplicate of https://github.com/open-metadata/OpenMetadata/issues/10689. Thanks
Is your feature request related to a problem? Please describe. https://openmetadata.slack.com/team/U04FS3RM88G
Describe the solution you'd like We would like to be able to connect to our datasets and datamarts in Power BI
Describe alternatives you've considered develop something through the Api's
Additional context