unitycatalog / unitycatalog-rs

Open, Multi-modal Catalog for Data & AI, written in Rust
https://unitycatalog.io/
Apache License 2.0

[RFC]: Server architecture (gRPC?) #10

Open abrassel opened 1 week ago

abrassel commented 1 week ago

TL;DR

As we begin work on the server, we will need to settle on a high level architecture. What will the server do? What framework(s) will we use? What is the entity source of truth? ... and more.

This RFC will dedicate one section to each of these questions, and we can expand the scope as necessary.

External Requirements

MUST

SHOULD

gRPC

In addition to these external requirements, I am proposing that our implementation also expose a mirror set of gRPC endpoints. This, combined with a protobuf spec, will have a number of performance benefits.
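To make the "mirror set" idea concrete, here is a minimal, std-only Rust sketch (not the proposed implementation). The business logic lives behind one transport-agnostic trait, and two thin adapters expose it as a REST route and as a gRPC-style method. All names (`CatalogService`, `InMemoryCatalogs`, `rest_get_catalog`, `grpc_get_catalog`, and the simplified `/catalogs/{name}` path) are hypothetical. In the real server the adapters would presumably be axum handlers and a tonic service rather than plain functions.

```rust
use std::collections::HashMap;

/// Hypothetical domain type. In the proposal this would be generated
/// (e.g. from protobuf definitions) rather than written by hand.
#[derive(Debug, Clone, PartialEq)]
pub struct CatalogInfo {
    pub name: String,
    pub comment: Option<String>,
}

/// Transport-agnostic business logic shared by both front-ends.
pub trait CatalogService {
    fn get_catalog(&self, name: &str) -> Result<CatalogInfo, String>;
}

/// Toy in-memory backend, for illustration only.
pub struct InMemoryCatalogs {
    catalogs: HashMap<String, CatalogInfo>,
}

impl InMemoryCatalogs {
    pub fn new(names: &[&str]) -> Self {
        let catalogs = names
            .iter()
            .map(|n| (n.to_string(), CatalogInfo { name: n.to_string(), comment: None }))
            .collect();
        Self { catalogs }
    }
}

impl CatalogService for InMemoryCatalogs {
    fn get_catalog(&self, name: &str) -> Result<CatalogInfo, String> {
        self.catalogs
            .get(name)
            .cloned()
            .ok_or_else(|| format!("catalog '{name}' not found"))
    }
}

/// Hypothetical REST adapter: `GET /catalogs/{name}` -> HTTP status + body.
pub fn rest_get_catalog(svc: &impl CatalogService, path: &str) -> (u16, Option<CatalogInfo>) {
    match path.strip_prefix("/catalogs/") {
        Some(name) if !name.is_empty() && !name.contains('/') => match svc.get_catalog(name) {
            Ok(info) => (200, Some(info)),
            Err(_) => (404, None),
        },
        _ => (400, None),
    }
}

/// Hypothetical gRPC request message for the mirrored endpoint.
pub struct GetCatalogRequest {
    pub name: String,
}

/// Hypothetical gRPC adapter: request message -> gRPC status code + reply
/// (0 = OK, 5 = NOT_FOUND in the gRPC status code table).
pub fn grpc_get_catalog(svc: &impl CatalogService, req: GetCatalogRequest) -> (u32, Option<CatalogInfo>) {
    match svc.get_catalog(&req.name) {
        Ok(info) => (0, Some(info)),
        Err(_) => (5, None),
    }
}

fn main() {
    let svc = InMemoryCatalogs::new(&["main"]);
    let (http_status, _) = rest_get_catalog(&svc, "/catalogs/main");
    let (grpc_code, _) = grpc_get_catalog(&svc, GetCatalogRequest { name: "main".into() });
    println!("REST status {http_status}, gRPC code {grpc_code}");
}
```

Because both adapters only translate requests and status codes, adding (or retrofitting) the gRPC side should not require touching the catalog logic itself.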

Object Model

Currently, we have code-generated a set of Rust objects to represent the Unity Catalog object model. If we adopt protobuf, I am proposing that we instead generate our Rust object model from the protobuf definitions. This comes with a number of benefits:

Server architecture

If we pursue a dual gRPC and HTTP server, this guide seems like a decent model to follow.

TL;DR: it proposes using axum, hyper, tonic, and tower.
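Serving both protocols on one port usually comes down to inspecting each request's `content-type`: gRPC requests carry `application/grpc` (optionally with a suffix such as `+proto`), and everything else can go to the REST router. The std-only sketch below shows only that decision. The `Backend` enum and `route_by_content_type` are hypothetical names; in practice this check would sit inside a tower `Service` that forwards to either the tonic routes or the axum `Router`.

```rust
/// Which backend should handle a request arriving on the shared port.
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    Grpc,
    Rest,
}

/// Hypothetical dispatch rule based on the `content-type` header value.
/// gRPC uses `application/grpc`, optionally followed by `+proto`, `+json`,
/// etc.; any other (or missing) content type is treated as REST.
pub fn route_by_content_type(content_type: Option<&str>) -> Backend {
    match content_type {
        Some(ct) => {
            // Drop media-type parameters such as "; charset=utf-8" and
            // compare case-insensitively.
            let essence = ct.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
            if essence == "application/grpc" || essence.starts_with("application/grpc+") {
                Backend::Grpc
            } else {
                // Note: "application/grpc-web" also lands here; gRPC-Web
                // would need separate handling anyway.
                Backend::Rest
            }
        }
        None => Backend::Rest,
    }
}

fn main() {
    for ct in [Some("application/grpc+proto"), Some("application/json"), None] {
        println!("{:?} -> {:?}", ct, route_by_content_type(ct));
    }
}
```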

We can also use prost to generate our Rust types and utoipa to generate our OpenAPI and Swagger spec. Note that we will need to do additional work, and possibly contribute upstream, to properly support the application/x-ndjson content type. See this issue for some context.
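For reference, the `application/x-ndjson` framing itself is simple: one compact JSON document per line, each terminated by `\n`. The std-only sketch below uses hypothetical helpers `to_ndjson` and `from_ndjson` to frame and split bodies made of already-serialized JSON strings. A real handler would presumably serialize with serde and stream the lines instead of building one `String`.

```rust
/// Frame already-serialized JSON documents as an NDJSON body:
/// one document per line, each followed by '\n'.
pub fn to_ndjson<'a, I>(docs: I) -> Result<String, String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut body = String::new();
    for doc in docs {
        // Compact JSON escapes newlines inside strings, so a raw newline
        // here would split one document across two records.
        if doc.contains('\n') || doc.contains('\r') {
            return Err(format!("document contains a raw newline: {doc:?}"));
        }
        body.push_str(doc);
        body.push('\n');
    }
    Ok(body)
}

/// Split an NDJSON body back into its records, ignoring blank lines.
pub fn from_ndjson(body: &str) -> Vec<&str> {
    body.lines().map(str::trim).filter(|line| !line.is_empty()).collect()
}

fn main() {
    let body = to_ndjson([r#"{"name":"main"}"#, r#"{"name":"sandbox"}"#])
        .expect("documents contain no raw newlines");
    print!("{body}");
    for record in from_ndjson(&body) {
        println!("record: {record}");
    }
}
```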

abhiaagarwal commented 1 week ago

This sounds like a great plan @abrassel! A couple questions I have:

In addition, I've used that guide before for personal use and it's a bit out of date for modern axum + tonic versions, but I think there's been a good amount of progress on that end. https://github.com/tokio-rs/axum/issues/2736

ognis1205 commented 1 week ago

@abrassel @abhiaagarwal

Sorry for jumping into the conversation. Regarding application/x-ndjson, if I understand correctly, Unity Catalog will eventually support the Delta Sharing protocol as well. At that point, application/x-ndjson will be relevant when implementing the following protocol specification:

abhiaagarwal commented 1 week ago

Hey @ognis1205,

First of all, don't apologize! This is a "Request for Comments", every and all comments are appreciated :D

Second of all, I just looked at the Delta Sharing protocol and it looks relatively trivial to implement (in fact, I see on your profile that you have a delta-sharing-rs server implementation; we can likely just leverage that directly and nest it under the main router). My only question is: where did you get the information that Unity will support Delta Sharing? I was under the impression that DBX provides its own proprietary server implementation that interfaces with Unity, but that it's not necessarily a built-in feature of the catalog itself. I'm not aware of the internals.
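If a Delta Sharing server is nested under the main router, requests under its prefix are handed to the sub-router with that prefix stripped. axum offers `Router::nest` for this. The std-only sketch below only illustrates the dispatch rule; `PrefixRouter`, the `/delta-sharing` mount point, and both handlers are hypothetical.

```rust
/// A handler receives the path relative to its mount point.
type Handler = fn(&str) -> String;

/// Hypothetical minimal router: the first matching mount prefix wins,
/// otherwise the request goes to the fallback (the main API).
pub struct PrefixRouter {
    mounts: Vec<(String, Handler)>,
    fallback: Handler,
}

impl PrefixRouter {
    pub fn new(fallback: Handler) -> Self {
        Self { mounts: Vec::new(), fallback }
    }

    /// Mount `handler` under `prefix` (similar in spirit to nesting routers).
    pub fn nest(mut self, prefix: &str, handler: Handler) -> Self {
        self.mounts.push((prefix.trim_end_matches('/').to_string(), handler));
        self
    }

    pub fn dispatch(&self, path: &str) -> String {
        for (prefix, handler) in &self.mounts {
            if let Some(rest) = path.strip_prefix(prefix.as_str()) {
                // Match whole path segments only, so "/delta-sharingx"
                // is not treated as being under "/delta-sharing".
                if rest.is_empty() {
                    return handler("/");
                }
                if rest.starts_with('/') {
                    return handler(rest);
                }
            }
        }
        (self.fallback)(path)
    }
}

// Hypothetical stand-ins for the two APIs.
fn unity_catalog_api(path: &str) -> String {
    format!("unity-catalog handled {path}")
}

fn delta_sharing_api(path: &str) -> String {
    format!("delta-sharing handled {path}")
}

fn main() {
    let router = PrefixRouter::new(unity_catalog_api).nest("/delta-sharing", delta_sharing_api);
    println!("{}", router.dispatch("/delta-sharing/shares"));
}
```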

ognis1205 commented 1 week ago

@abhiaagarwal

Thank you for the reply and your understanding. Regarding the main router, yes, I thought the same way as you did. The reason I believe Unity will support Delta Sharing is due to the following comment and the resource:

As you mentioned, just from the roadmap and her statement, it might still be unclear how they plan to support the Delta Sharing protocol.

abhiaagarwal commented 1 week ago

@ognis1205 ty so much for the links! You're indeed right. At the end of the day, the "unity catalog protocol" is basically an access-control server for assets scoped like a database (all it does is hand out leases to assets living in cloud storage), delta sharing is basically the same thing without the multimodality and some parquet-specific optimizations (like data skipping). I guess we can say that the unity catalog is meant to represent an evolution of delta-sharing (while delta-sharing is a bit more stateful, unity catalog is theoretically agnostic to the underlying data asset).

That is to say, if the unity catalog is a more generalized form of delta-sharing (which I currently believe it to be), then nesting a router under the main unity catalog router is probably trivial depending on the backend.

I don't know, in all honesty. Anyway, I just discovered that axum-extra supports ndjson (JSON Lines), so it's kind of a moot point :)
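The "access-control server that hands out leases to assets living in cloud storage" description above can be sketched in a few lines. This is a hypothetical, std-only model, not Unity Catalog's actual credential-vending API: a principal receives a time-limited `Lease` for a storage location only if it holds a grant on it, and the lease covers that location and anything beneath it until it expires. A real server would vend cloud-provider credentials (for example, temporary tokens or pre-signed URLs) instead of this struct.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical time-limited grant for one storage location.
#[derive(Debug, Clone)]
pub struct Lease {
    pub location: String,
    pub expires_at: SystemTime,
}

impl Lease {
    /// True if `path` is the leased location or lies beneath it, and the
    /// lease has not yet expired at `now`.
    pub fn permits(&self, path: &str, now: SystemTime) -> bool {
        let base = self.location.trim_end_matches('/');
        // Compare whole path segments so ".../t1" does not cover ".../t10".
        let covered = path == base
            || path.strip_prefix(base).map_or(false, |rest| rest.starts_with('/'));
        covered && now < self.expires_at
    }
}

/// Hypothetical access check: issue a lease only if `principal` has an
/// explicit grant on exactly `location`.
pub fn request_lease(
    grants: &HashMap<String, Vec<String>>,
    principal: &str,
    location: &str,
    ttl: Duration,
    now: SystemTime,
) -> Option<Lease> {
    let granted = grants.get(principal)?.iter().any(|g| g == location);
    granted.then(|| Lease { location: location.to_string(), expires_at: now + ttl })
}

fn main() {
    let mut grants = HashMap::new();
    grants.insert("alice".to_string(), vec!["s3://bucket/tables/t1".to_string()]);
    let now = SystemTime::now();
    let lease = request_lease(&grants, "alice", "s3://bucket/tables/t1", Duration::from_secs(900), now);
    println!("{lease:?}");
}
```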

amogh-jahagirdar commented 5 days ago

One requirement I'd like to advocate for is support for the Iceberg REST catalog, like what's being worked on in https://github.com/unitycatalog/unitycatalog ! I'd be happy to help with any efforts in that area.

abrassel commented 4 days ago

Thanks @amogh-jahagirdar ! That's a great suggestion. I agree that we should definitely prioritize that super useful feature.

It may be slightly outside the scope of this RFC, since here we're focused on the broad capabilities and architecture, i.e. whether we expose gRPC endpoints at all, rather than which API endpoints we expose.

That being said, it would be great if you could submit an RFC explicitly asking for Iceberg support! I don't think it'll be controversial :)

abhiaagarwal commented 2 days ago

Final thoughts from anyone in this thread?

Personally, I am inclined towards gRPC and Protobuf definitions, but at least for the time being, I want to focus on the REST implementation first and retrofit gRPC later. I've spent some time trying to get the new axum and tonic versions working together, and it's quite challenging; I worry that gRPC will block us.

abrassel commented 6 hours ago

Sounds great to me! I think let's consider this RFC closed. I'll be approaching this from the client side, and we can use Swagger to generate a Rust client from the OpenAPI spec.