mirage / irmin

Irmin is a distributed database that follows the same design principles as Git
https://irmin.org
ISC License
1.83k stars 154 forks source link

irmin-(client|server): remove server-side state, simplify batch api, and cleanups #2272

Closed metanivek closed 11 months ago

metanivek commented 1 year ago

This is a follow up PR from the initial irmin-server integration PR.

The big changes:

  1. Remove most server-side state. Previously, there was an API to manipulate trees on the server, and also "current branch/store" tracking per client. The server-side tree manipulation APIs have been removed and branch/store state is now kept on the client and passed when needed. We can add these sorts of client/server interactions in the future, but I think it is generally better to keep as little state on the server as possible.
  2. Simplify batch API. The batching API is now completely "build on the client and then send to the server for application". With the exception of (sometimes) adding local trees to a batch, creating batches will not need to call the server. The API is smaller then previously but we can grow it as needed based on use cases.

There are several other cleanups: adopting irmin's [%log.* ...] ppx since this is now an official irmin sub-project, adding a full example that demonstrates batching, and adding a mutex for requests so that the server is only processing one client request at a time.

It's also worth noting that I discovered a bug in irmin-git/ocaml-git while testing the batch apply API. I opened PR for ocaml-git that fixes the core issue, but until that is released and then irmin-git is patched, we cannot use the batch api's add_value with irmin-git since it results in a decoding issue on the server.

samoht commented 1 year ago

Thanks. Simplifying this API to store less state on the server is a good idea.

There is still something off in that API that we could improve. For instance, we have:

If you blink hard enough, both types should be Merkle proofs (i.e. a tree with some trees hidden behind a hash). I wonder if we could instead expose a client-side API to manipulate Merkle proofs and base all client/server messages on this instead of custom, non-optimised approximations.

I don't have more concrete suggestions apart that I believe you are going in the right direction :-) Also, another thing to consider for the client-side API is how it will look when using other languages (e.g. Python or C). So, we need to check what bindings would look like.

metanivek commented 1 year ago

If you blink hard enough, both types should be Merkle proofs (i.e. a tree with some trees hidden behind a hash). I wonder if we could instead expose a client-side API to manipulate Merkle proofs and base all client/server messages on this instead of custom, non-optimised approximations.

The tree part of the batch api definitely could use more exploration (this PR only removes server-side id trees but keeps the rest mostly untouched). If you are adding a pre-existing tree, I think that the api works decently well (just using the tree's key), but if the tree doesn't have a key it falls back to concrete trees, which doesn't seem that great. I'll think about how to unify this using proofs but am open to more thoughts about how it might work if you have them to share!

I think the contents part of the batch works okay. It is either a new value or a hash for an existing value (although maybe it should be contents_key based instead).

Also, another thing to consider for the client-side API is how it will look when using other languages (e.g. Python or C). So, we need to check what bindings would look like.

Good point. I've exercised it lightly for OCaml through the example, but I agree looking at foreign bindings is a good exercise. I guess a first step would be adding irmin-client bindings to libirmin.

samoht commented 1 year ago

I think the initial goal we had with Zach was:

  1. the OCaml client bindings to be just an ordinary Irmin.S implementation, and so could just pass irmin-test
  2. tree operations have to be done on the client side.
  3. the client/server interface should be higher than the low-level store to allow some sort of batching

I think 1. and 2. are working for irmin-client, but using the low-level store API calls (so defeating 3.). The batch API was a middle-ground solution to fix 3. but I am not super happy with it either.

Maybe the right way to address this (not in this PR, which is already a reasonable simplification of the batch API) is to look at all the API calls that involve trees and see what kind of client/server interactions we want from those. I suspect we could re-implement a different tree.ml for clients, where reads will be on-demand (like they are today) but where we have a way to batch writes. Not totally sure how this would work without looking more in-depth tbh 😅

metanivek commented 11 months ago

@art-w thanks for the review! I've updated both the things you called out.