Semantic Link - Labs - get_lakehouse_tables extended and count_rows not working when defining the lakehouse manually

MASFelixPBI commented 2 months ago

Hello to all,

I'm trying to collect the Direct Lake guardrail information for a specific set of lakehouses. The idea is to build a report that lets us track the guardrails each time the data gets refreshed.

I attached two lakehouses from the same workspace to the notebook and used the following code to get the lakehouse information:

labs.lakehouse.get_lakehouse_tables(lakehouse=None, workspace=None, extended=True, count_rows=True)

This gets me the result for the default lakehouse of that notebook and everything works. However, when I change the code to target the other lakehouse using the lakehouse= parameter, I get an error saying that a table is not present in the lakehouse (see code below).

labs.lakehouse.get_lakehouse_tables(lakehouse="YellowCabCompany", workspace="Direct Lake 2024", extended=True, count_rows=True)

I looked at the error, and the tables being returned are the ones from the default lakehouse, so I understand why it fails. However, since I specified the lakehouse and workspace, I believe the error should not appear.

However, if I remove the guardrail options (extended=True, count_rows=True), I'm able to get the table information for the second lakehouse even though it's not the default one.

It seems to me that extended and count_rows do not pick up the specified lakehouse and always read from the notebook's default lakehouse.

The main idea is to get the guardrails for each lakehouse by passing it as a parameter, store the results, and then create the report.

m-kovalsky commented 2 months ago

This is because you cannot run cross-workspace Spark queries. You must set the other lakehouse as the default lakehouse and then run the function again. I will update this function to give an instructive error for this situation.
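Until the function reports this itself, the constraint can be made explicit on the caller's side. The following is only a minimal sketch: the wrapper name list_lakehouse_tables is hypothetical, and get_tables is assumed to be labs.lakehouse.get_lakehouse_tables passed in as an argument. It conservatively treats any explicitly named lakehouse as non-default and requests only the basic table listing for it.

```python
from typing import Any, Callable, Optional


def list_lakehouse_tables(
    get_tables: Callable[..., Any],
    lakehouse: Optional[str] = None,
    workspace: Optional[str] = None,
) -> Any:
    """Request guardrail details only for the notebook's default lakehouse.

    `get_tables` is expected to be labs.lakehouse.get_lakehouse_tables,
    passed in so this sketch does not depend on how the library is imported.
    """
    targets_default = lakehouse is None and workspace is None
    if targets_default:
        # Extended info and row counts rely on Spark queries, which only
        # reach the default lakehouse attached to the notebook.
        return get_tables(
            lakehouse=None, workspace=None, extended=True, count_rows=True
        )
    # Assumption: any explicitly named lakehouse is treated as non-default,
    # so only the basic table listing is requested.
    return get_tables(lakehouse=lakehouse, workspace=workspace)
```

In a notebook this might be called as list_lakehouse_tables(labs.lakehouse.get_lakehouse_tables, lakehouse="YellowCabCompany", workspace="Direct Lake 2024"), which returns the table list without the guardrail columns.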

MASFelixPBI commented 2 months ago

Follow up question:

How can I change the default lakehouse programmatically, for example to use it in a pipeline?

I have tried the following code:

%%configure -f { "defaultLakehouse": { "name": { "parameterName": "LakehouseNaming", "defaultValue": "YellowCabCompany" } } }

This works correctly when I run it in the notebook and I can change the default lakehouse, but in the pipeline it gives an error of Line magic function %%configure .

Do you know of any workaround that does not involve creating a notebook for each lakehouse or running it manually?
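For readability, the same %%configure payload can be laid out over several lines. Below is a small sketch in Python that builds the cell text from the values used in this post. The helper name build_configure_cell is hypothetical; it only formats text and does not address the pipeline error.

```python
import json


def build_configure_cell(parameter_name: str, default_lakehouse: str) -> str:
    """Return the text of a %%configure cell with a parameterized default lakehouse."""
    payload = {
        "defaultLakehouse": {
            "name": {
                "parameterName": parameter_name,
                "defaultValue": default_lakehouse,
            }
        }
    }
    # The magic goes on the first line, followed by the JSON body.
    return "%%configure -f\n" + json.dumps(payload, indent=4)


print(build_configure_cell("LakehouseNaming", "YellowCabCompany"))
```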

Thank you.

m-kovalsky commented 2 months ago

I have to learn more about this but I believe you would need to update the notebook definition (including the default lakehouse).

https://learn.microsoft.com/en-us/fabric/data-engineering/notebook-source-control-deployment

MASFelixPBI commented 2 months ago

Thank you,

I was also looking into that, but I appreciate your answer.

Amazing job on this feature, kudos to you and the rest of the team.

m-kovalsky commented 1 month ago

Resolved in 0.7.0.

microsoft / semantic-link-labs

Semantic Link - Labs - get_lakehouse_tables extended and count_rows not working when defining the lakehouse manually #51