transparency-dev / trillian-tessera

Go library for building tile-based transparency logs (tlogs)
Apache License 2.0
11 stars 11 forks source link

Make `Fetcher` client function more flexible #278

Open cmurphy opened 1 month ago

cmurphy commented 1 month ago

In integrating Tessera into the Rekor log personality, we would like to have the Rekor server be able to return an inclusion proof to a client. In Tessera's model, this effectively makes Rekor its own client, since it needs to query its internal storage to fetch the tiles needed for the proof.

Tessera's client.NewProofBuilder constructor requires a function of type Fetcher as input, which takes a path string as a parameter and returns a tile. This makes sense when the tile is stored in a posix-like file storage organized by directory paths. It makes less sense when using MySQL as the storage backend, and is awkward to use when the server has direct access to the MySQL storage provider instance.

Consider the tile fetcher constructor:

https://github.com/transparency-dev/trillian-tessera/blob/0c3daf45ad69eecc6e9cd548897df65a6b08a7a5/client/client.go#L272-L286

It takes the log size, tile level, and entry index to construct the tile path and then calls the fetcher function on that path to retrieve the tile from storage.

When using the MySQL storage instance directly, the path has no meaning. So to construct a proof builder, we have to deconstruct the path, reversing the work of the newTileFetcher:

tileOnlyFetcher := func(ctx context.Context, path string) ([]byte, error) {
        pathParts := strings.SplitN(path, "/", 3)
        level, index, width, err := layout.ParseTileLevelIndexWidth(pathParts[1], pathParts[2])
        if err != nil {
                return nil, err
        }
        return tesseraStorage.ReadTile(ctx, level, index, width)
}
proofBuilder, err := client.NewProofBuilder(ctx, checkpoint, tileOnlyFetcher)

This works, it's just a bit strange and roundabout. It would be nice to be able to construct a proof builder that can work natively with any of the storage providers, or at least doesn't assume that the tile is indexed by its path.

AlCutter commented 4 weeks ago

Hi @cmurphy,

Thanks for taking the time to write this up!

I think what we'll do near term is have each storage impl also provide an implementation of client.Fetcher for use-cases like this, where the application layer needs to read log data directly. It can do that in whatever is the most idiomatic fashion for the infra.

I understand your point about using the string containing the resource path being used here, and another way to think about it is that this string is essentially the Tessera public read API. The MySQL storage is unusual/unique in that it is the one storage infra which doesn't directly expose a filesystem-like view of that API based on resource path.

What you're suggesting is adding a second semi-public/private read API for application developers because one of the storage types is a bit unusual - all others absolutely do use that resource path as their primary key equivalent. In that framing, it does feel reasonable to me that the burden of complying with that API should fall on the MySQL storage impl.

Of course, we could think about having the client package define more tightly scoped funcs which need to be passed in to the various methods/c'tors there (e.g. going from just client.Fetcher to having client.CheckpointFetcherFn, client.TileFetcherFn, client.EntryBundleFetcherFn, etc.) - that would enable those individual storage fetcher implementations to be defined in terms of a different read API to the public one, but I wonder whether that is worth an increase in "apparent complexity" for client package users in general (I don't really know at the moment).

I did hack up a quick experiment of what such a change might look like here: https://github.com/transparency-dev/trillian-tessera/compare/main...AlCutter:trillian-tessera:client_more_specific_fetch_interfaces_playground?expand=1

Separately, though, I think there would be value in taking a step back to think about what an "ideal" client package API might look like, both from Tessera log application developers and developers of clients of those log applications.

AlCutter commented 2 weeks ago

Hi again,

Thanks for your patience, I'm back after dealing with summit stuff and taking some vacation.

I've been playing around with this quite a bit recently, trying a few things out to see how they feel - I'm warming up to your idea and I've merged #281, and that plus #292 adds add a more fine-grained set of read primitives to the storage contract and updates the client code so that it can directly use them.

How do those changes look from your PoV?

cmurphy commented 1 day ago

Thanks for this! Using the new ReadTile method instead of implementing my own Fetcher feels much more natural.

I noticed a curiosity, which is that with ReadEntryBundle becoming a shared interface method, an additional parameter for the tree size was added to the mysql implementation that I'm sure is needed for the POSIX and GCP providers, but isn't used at all for mysql. It's just another thing that makes the mysql backend seem like the odd stepchild. I don't have a good suggestion for how to change the interface, but a doc comment explaining that the treeSize parameter isn't used for that provider would be helpful.