Open nixent opened 5 months ago
There's not yet full public documentation for the Livy endpoint, but much of the functionality is the same as what is documented for Synapse. The endpoints are different, but they give a good idea of what Livy's functionality is.
Is there a particular thing you want to do?
@jcvdodson I'm looking for details about the endpoints. lakehouse_endpoint suggests that lakehouseid is part of the endpoint base URL, which is then used for the sessions endpoint, for instance. My understanding is that a session belongs to a workspace and not to a lakehouse, since a lakehouse is equivalent to a database: Spark has a two-tier namespace, database_name.table_name.
def lakehouse_endpoint(self) -> str:
    # TODO: Construct Endpoint of the lakehouse from the
    return f'{self.endpoint}/workspaces/{self.workspaceid}/lakehouses/{self.lakehouseid}/livyapi/versions/2023-12-01'
I'd like to understand how access to tables in different lakehouses is managed via the Livy endpoint.
You are correct that a Livy session belongs to a particular workspace. Livy itself doesn't really manage access to tables -- it just executes the SQL statements. A Livy session is considered "hosted" by a specific Lakehouse within the workspace, but as you can see in the submitLivyCode method, we don't actually submit statements against that Lakehouse, just to the Livy session.
You can execute statements against tables in different lakehouses by prefixing the table name with the lakehouse name, as you have it ([LH_name].[table_name]), in the statement.
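As a rough illustration of the point above, here is a minimal sketch that builds (but does not send) a statement request touching two lakehouses through one session. It assumes the Fabric endpoint follows the Apache Livy convention of posting a JSON body with `code` and `kind` to `/sessions/{id}/statements`; the helper names, the example host, and the IDs and table names are all hypothetical, and the dotted `lakehouse.table` form is an assumed reading of the [LH_name].[table_name] placeholder.

```python
import json

# API version copied from the lakehouse_endpoint snippet earlier in this thread.
API_VERSION = "2023-12-01"


def livy_base_url(endpoint: str, workspace_id: str, lakehouse_id: str) -> str:
    """Standalone version of lakehouse_endpoint; lakehouse_id is the hosting lakehouse."""
    return (
        f"{endpoint}/workspaces/{workspace_id}"
        f"/lakehouses/{lakehouse_id}/livyapi/versions/{API_VERSION}"
    )


def qualify(lakehouse: str, table: str) -> str:
    """Hypothetical helper: prefix a table name with its lakehouse name."""
    return f"{lakehouse}.{table}"


def build_statement_request(base_url: str, session_id: str, sql: str):
    """Return (url, json_body) for a statement, assuming Apache Livy's request shape."""
    url = f"{base_url}/sessions/{session_id}/statements"
    body = json.dumps({"code": sql, "kind": "sql"})
    return url, body


# Placeholder values; none of these are real endpoints or IDs.
base = livy_base_url("https://example.invalid/v1", "ws-123", "lh-host")

# The statement names tables in two lakehouses, neither of which is the host.
sql = (
    f"SELECT * FROM {qualify('sales_lh', 'orders')} o "
    f"JOIN {qualify('ref_lh', 'customers')} c ON o.customer_id = c.id"
)

url, body = build_statement_request(base, "0", sql)
print(url)
print(body)
```

Note that the hosting lakehouse ID appears only in the URL, while the lakehouses actually queried appear only in the SQL text, which matches the description that statements are submitted to the session rather than against the hosting Lakehouse.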
Thank you for the clarification. If I have multiple lakehouses, which one would you recommend setting in the dbt profile?
I think they would all be equivalent. To be transparent, though, I haven't tested on large lakehouses. My inclination is to say that the profiles.yml lakehouse should be the largest lakehouse or the one you expect to interact with most, but I'm not sure that this really has any practical effect.
Could you provide clarification on why the Livy endpoint is integrated within the lakehouse, whereas other endpoints are located under the workspace? Are there specific design aspects or considerations that users should be aware of in this context?
The hosting artifact (the Lakehouse) is there for internal reasons related to authentication flows. In practical effect (i.e. statement execution), Livy is tied to the workspace. The hosting artifact has no impact on statement execution -- so, there aren't any special considerations you need to be aware of when using Livy across Lakehouses within a Workspace.
Is there any documentation for the Livy endpoint, similar to the one for Synapse?
The Fabric REST API documentation describes other elements of Fabric, but not Livy.