microsoft / dbt-fabricspark

MIT License

Documentation for Livy REST API #18

[Open] nixent opened this issue 1 month ago

nixent commented 1 month ago

Is there any documentation for the Livy endpoint, similar to the one for Synapse?

The Fabric REST API documentation describes other elements of Fabric, but not Livy.

jcvdodson commented 4 weeks ago

There's not yet full public documentation for the Livy endpoint, but much of the functionality is the same as what is documented for Synapse. The endpoints differ, but the Synapse documentation gives a good idea of Livy's functionality.

Is there a particular thing you want to do?

nixent commented 3 weeks ago

@jcvdodson I'm looking for details about the endpoints. lakehouse_endpoint suggests that lakehouseid is part of the endpoint base URL, which is then used by the sessions endpoint, for instance. My understanding is that a session belongs to a workspace rather than to a lakehouse, since a lakehouse is equivalent to a database: Spark has a two-tier namespace, database_name.table_name.

    def lakehouse_endpoint(self) -> str:
        # TODO: Construct Endpoint of the lakehouse from the 
        return f'{self.endpoint}/workspaces/{self.workspaceid}/lakehouses/{self.lakehouseid}/livyapi/versions/2023-12-01'
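A minimal standalone sketch of how the URLs nest under the hosting lakehouse. It assumes the public Fabric API base `https://api.fabric.microsoft.com/v1` and that sessions and statements follow the Apache Livy path layout (`/sessions`, `/sessions/{id}/statements`). The `LivyEndpoints` class and its method names are hypothetical illustrations, not the adapter's API.

```python
from dataclasses import dataclass

# Assumption: the public Fabric REST base URL; the adapter reads its endpoint from the profile.
DEFAULT_ENDPOINT = "https://api.fabric.microsoft.com/v1"
# Version string taken from lakehouse_endpoint above.
LIVY_API_VERSION = "2023-12-01"


@dataclass(frozen=True)
class LivyEndpoints:
    """Hypothetical helper mirroring lakehouse_endpoint; not part of dbt-fabricspark."""

    workspaceid: str
    lakehouseid: str
    endpoint: str = DEFAULT_ENDPOINT

    def lakehouse_endpoint(self) -> str:
        # Same shape as the adapter: both the workspace and the hosting
        # lakehouse appear in the base URL.
        return (
            f"{self.endpoint}/workspaces/{self.workspaceid}"
            f"/lakehouses/{self.lakehouseid}/livyapi/versions/{LIVY_API_VERSION}"
        )

    def sessions_url(self) -> str:
        # Assumption: Apache Livy style collection path for sessions.
        return f"{self.lakehouse_endpoint()}/sessions"

    def statements_url(self, session_id: str) -> str:
        # Assumption: statements are nested under a specific session.
        return f"{self.sessions_url()}/{session_id}/statements"


eps = LivyEndpoints(workspaceid="ws-123", lakehouseid="lh-456")
print(eps.sessions_url())
# prints https://api.fabric.microsoft.com/v1/workspaces/ws-123/lakehouses/lh-456/livyapi/versions/2023-12-01/sessions
```

The lakehouse ID only shapes the URL prefix; everything below `/sessions` is addressed by session ID.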

I'd like to understand how access to tables in different lakehouses is managed via the Livy endpoint.

jcvdodson commented 3 weeks ago

You are correct that a Livy session belongs to a particular workspace. Livy itself doesn't really manage access to tables; it just executes the SQL statements. A Livy session is considered "hosted" by a specific Lakehouse within the workspace, but, as you can see in the submitLivyCode method, we don't submit statements against that Lakehouse, only to the Livy session.
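To make "submitted to the session, not the Lakehouse" concrete, here is a sketch that builds (but does not send) a statement request. It assumes the Apache Livy statement body with `code` and `kind` fields and a bearer token obtained elsewhere; `build_statement_request` is a hypothetical name and is not how `submitLivyCode` is actually written.

```python
import json
import urllib.request


def build_statement_request(statements_url: str, sql: str, token: str) -> urllib.request.Request:
    """Build (without sending) a POST that submits SQL to an existing Livy session.

    Assumptions: the Apache Livy body shape {"code": ..., "kind": "sql"} and an
    access token acquired separately. The body names no lakehouse; the only
    lakehouse reference is the hosting one embedded in the URL prefix.
    """
    body = json.dumps({"code": sql, "kind": "sql"}).encode("utf-8")
    return urllib.request.Request(
        statements_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_statement_request(
    "https://api.fabric.microsoft.com/v1/workspaces/ws-123/lakehouses/lh-456"
    "/livyapi/versions/2023-12-01/sessions/7/statements",
    "SELECT 1",
    token="<access-token>",
)
# urllib.request.urlopen(req) would submit it; omitted so the sketch runs offline.
```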

You can execute statements against tables in a different Lakehouse by prefixing the table name with the Lakehouse name in the statement, as you described: [LH_name].[table_name].
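A small sketch, using hypothetical lakehouse and table names, of composing one statement that joins tables from two lakehouses. It backtick-quotes each part, which is how Spark SQL quotes identifiers (the square brackets above are placeholder notation). That Fabric resolves every lakehouse in the workspace this way from a single session is taken from the comment above, not verified here.

```python
def qualified_name(lakehouse: str, table: str) -> str:
    """Return a Spark SQL two-part name of the form `lakehouse`.`table`.

    Backticks quote identifiers in Spark SQL; a literal backtick inside a
    name is escaped by doubling it.
    """

    def quote(identifier: str) -> str:
        return "`" + identifier.replace("`", "``") + "`"

    return f"{quote(lakehouse)}.{quote(table)}"


# Hypothetical names: one session reads from both lakehouses because each
# table reference carries its own lakehouse prefix.
sql = (
    "SELECT o.order_id, c.name "
    f"FROM {qualified_name('sales_lh', 'orders')} AS o "
    f"JOIN {qualified_name('crm_lh', 'customers')} AS c "
    "ON o.customer_id = c.customer_id"
)
print(sql)
# prints SELECT o.order_id, c.name FROM `sales_lh`.`orders` AS o JOIN `crm_lh`.`customers` AS c ON o.customer_id = c.customer_id
```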

nixent commented 3 weeks ago

Thank you for the clarification. If I have multiple lakehouses, which one would you recommend setting in the dbt profile?

jcvdodson commented 3 weeks ago

I think they would all be equivalent. To be transparent, though, I haven't tested on large lakehouses. My inclination is to say that the profiles.yml lakehouse should be the largest lakehouse or the one you expect to interact with most, but I'm not sure that this really has any practical effect.

nixent commented 3 weeks ago

Could you provide clarification on why the Livy endpoint is integrated within the lakehouse, whereas other endpoints are located under the workspace? Are there specific design aspects or considerations that users should be aware of in this context?

jcvdodson commented 3 weeks ago

The hosting artifact (the Lakehouse) is there for internal reasons related to authentication flows. In practical effect (i.e. statement execution), Livy is tied to the workspace. The hosting artifact has no impact on statement execution -- so, there aren't any special considerations you need to be aware of when using Livy across Lakehouses within a Workspace.