projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics
https://projectnessie.org
Apache License 2.0
1.04k stars, 129 forks

Multiple repositories on the same Nessie instance #5342

Open tmnd1991 opened 2 years ago

tmnd1991 commented 2 years ago

Description: as of now, a Nessie instance/server can handle only one repository, which forces you to manage the data lake as a monorepo. I see use cases (data mesh) where this is not desirable, and I would like to be able to manage groups of tables as separate repositories so that their commit logs do not intertwine. Right now I am forced to spin up multiple Nessie servers, but I think this could be supported natively by Nessie.

Requested Changes in public API:

Expected Use Cases: data mesh, data domains
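
The workaround described above (one Nessie server per group of tables) can be sketched as a small routing table from data domain to server endpoint. This is only an illustration: the domain names, hostnames, and the `NessieEndpoint`, `DOMAIN_ENDPOINTS`, and `endpoint_for` names are hypothetical and not part of Nessie. The port 19120 and the `/api/v1/trees` path are assumed from Nessie's default server setup and v1 REST API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NessieEndpoint:
    """Base URI of one Nessie server, standing in for one 'repository'."""

    base_uri: str

    def trees_url(self) -> str:
        # Assumption: references (branches and tags) are listed under
        # /trees in Nessie's v1 REST API.
        return f"{self.base_uri.rstrip('/')}/trees"


# Hypothetical mapping: one separately deployed Nessie server per data domain,
# so each domain keeps its own commit log.
DOMAIN_ENDPOINTS = {
    "sales": NessieEndpoint("http://nessie-sales.example.internal:19120/api/v1"),
    "logistics": NessieEndpoint("http://nessie-logistics.example.internal:19120/api/v1"),
}


def endpoint_for(domain: str) -> NessieEndpoint:
    """Return the Nessie server that owns the given data domain."""
    try:
        return DOMAIN_ENDPOINTS[domain]
    except KeyError:
        raise ValueError(f"no Nessie server configured for domain {domain!r}") from None
```

Every new domain in this setup means another server to deploy and operate, which is the overhead that native multi-repository support would remove.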

keithgchapman commented 2 years ago

@tmnd1991 Thanks for raising this issue. At this point we don't have any plans to build multi-tenancy into Nessie. We would welcome your ideas around this, though, and would be more than happy to accept code contributions in this regard.

Having said that, Dremio's Arctic service has the ability to support multiple catalogs within a single organization. It is powered by a managed version of Nessie.

tmnd1991 commented 2 years ago

Interesting, so the feature is there in the managed version, am I right? That's good to hear.

I will try to think about a model for multi-tenancy and maybe provide a PR in that regard :)

dimas-b commented 2 years ago

@tmnd1991 : It may manifest differently at the API level. I'm interested in how you envision that.

tmnd1991 commented 1 year ago

I might finally have time to work on this. I would like to start from the current OpenAPI spec of Nessie. Should I build the project in order to obtain it, or is there an easier way (like a link)?

tmnd1991 commented 1 year ago

Found the link in the release page, sorry for the spam.