storj / roadmap

Storj Public Roadmap

Lexicographically ordered listing #62

Closed kaloyan-raev closed 1 year ago

kaloyan-raev commented 1 year ago

Background

What is the problem/pain point?

When listing objects, the object keys in the result are not sorted in lexicographic order.

This is because end-to-end encryption is applied not only to object content but also to object keys. We don't use an order-preserving encryption scheme yet, so it is not always possible to list a bucket in lexicographic order (as per the S3 specification). For requests with a forward-slash-terminated prefix and/or a forward-slash delimiter, we list in the fastest way we can. That lists the bucket in lexicographic order of the encrypted paths, which is often very different from the expected order of the decrypted paths. Ideally, clients shouldn't care about ordering in those cases. For requests with a non-forward-slash-terminated prefix and/or a non-forward-slash delimiter, we perform exhaustive listing, which filters paths gateway-side. In this case, gateways return the listing in lexicographic order. Forcing exhaustive listing for every request is not possible on Storj production deployments of Gateway-MT; on Gateway-ST, for example, it can be enabled with the --s3.fully-compatible-listing flag.
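To illustrate why the order of encrypted keys differs from the order of decrypted keys, here is a minimal sketch. It uses HMAC-SHA256 purely as a deterministic, non-order-preserving stand-in for path encryption; this is an assumption for illustration and not the scheme Storj actually uses. The names `SECRET`, `pseudo_encrypt`, `list_in_storage_order`, and `list_exhaustively` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; stands in for whatever key protects object paths.
SECRET = b"illustrative-secret"


def pseudo_encrypt(key: str) -> str:
    """Deterministic, non-order-preserving stand-in for path encryption."""
    return hmac.new(SECRET, key.encode(), hashlib.sha256).hexdigest()


def list_in_storage_order(keys):
    """Return plaintext keys ordered by their encrypted form (the fast path)."""
    return sorted(keys, key=pseudo_encrypt)


def list_exhaustively(keys):
    """Collect every key first, then sort the decrypted keys gateway-side."""
    return sorted(list_in_storage_order(keys))


keys = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
storage_order = list_in_storage_order(keys)  # same keys, encrypted-path order
sorted_order = list_exhaustively(keys)       # lexicographic order of plaintext
print(storage_order)
print(sorted_order)
```

The fast path returns the same set of keys, but in an order the client cannot predict from the plaintext names. Only the exhaustive variant, which has to see every key before answering, can guarantee lexicographic order.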

Who is impacted?

What is the impact?

Some S3 apps malfunction because they expect object listings to be sorted in lexicographic order.

Requirements

User Story

As a Storj user, I want to have the object listing sorted in lexicographic order so I can integrate existing applications with Storj that expect such functionality.

Acceptance Criteria

Success Metrics

Open Discussion/Questions

Possible Design

Currently, we have a workaround for creating access grants and S3 credentials that do not encrypt object keys: https://github.com/storj/storj/issues/5564#issuecomment-1424569200. However, it involves modifying libuplink and recompiling Uplink CLI, which requires significant technical skills.

We should expose the required API in libuplink and use it in Uplink CLI to enable users to create such access grants and S3 credentials without doing any code modifications themselves. This is Milestone 1.

Milestone(s)

ferristocrat commented 1 year ago

With the completion of Milestone #1, users are able to get lexicographical ordering for use with S3 tooling. See the docs for more details: https://docs.storj.io/dcs/lexicographically-sorted-object-listings

amwolff commented 1 year ago

I'd like to document some performance-related caveats around this feature that would be discovered later on anyways.

Even though using an access grant that doesn't encrypt paths enables Uplink CLI and S3 compatibility layer to always output lexicographically-ordered results, it will only optimally work in these cases:

1) no prefix, no delimiter
2) prefix terminated with a forward slash, no delimiter
3) no prefix, the delimiter is a forward slash
4) prefix terminated with a forward slash, the delimiter is a forward slash

Because the separator for paths in metadata storage is the forward slash, supplying a different combination than the above is unsupported there, so both Uplink CLI and the S3 compatibility layer will list the bucket exhaustively when the prefix is something else. The S3 compatibility layer will also do that when the delimiter is something else; the CLI doesn't support setting the delimiter to anything but a forward slash.
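The four fast cases listed above can be expressed as a small predicate. This is a sketch of the rule as described in this comment, not the actual gateway or Uplink code; `needs_exhaustive_listing` and `SEPARATOR` are hypothetical names.

```python
from typing import Optional

# The path separator used in metadata storage, per the comment above.
SEPARATOR = "/"


def needs_exhaustive_listing(prefix: str = "", delimiter: Optional[str] = None) -> bool:
    """Return True when a request falls outside the four fast cases."""
    # Fast only if the prefix is empty or ends with a forward slash ...
    prefix_ok = prefix == "" or prefix.endswith(SEPARATOR)
    # ... and the delimiter is absent or is a forward slash.
    delimiter_ok = delimiter in (None, "", SEPARATOR)
    return not (prefix_ok and delimiter_ok)


# Cases 1-4 stay on the fast path.
print(needs_exhaustive_listing())                              # case 1
print(needs_exhaustive_listing(prefix="photos/"))              # case 2
print(needs_exhaustive_listing(delimiter="/"))                 # case 3
print(needs_exhaustive_listing("photos/", "/"))                # case 4
# Anything else falls back to exhaustive listing.
print(needs_exhaustive_listing(prefix="photos/2023"))
print(needs_exhaustive_listing(prefix="photos/", delimiter="-"))
```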

So even though completing this item will allow us to go faster in cases 1-4 when lexicographically-ordered results are needed, the architectural design of how paths are stored will equalize how fast we can go in other cases.
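To make the cost of the fallback concrete, here is a rough sketch of what an exhaustive listing has to do for an arbitrary prefix and delimiter: visit every key in the bucket, filter, roll matching keys up into common prefixes, and sort. It is an illustration under simplifying assumptions (an in-memory list of keys, no pagination), not the gateway implementation; `exhaustive_list` is a hypothetical name.

```python
def exhaustive_list(all_keys, prefix="", delimiter=""):
    """Scan every key, filter by prefix, roll up by delimiter, then sort."""
    objects = []
    common_prefixes = set()
    scanned = 0
    for key in all_keys:
        scanned += 1  # every key is visited, whether or not it matches
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut >= 0:
            # Keys sharing the text up to the delimiter collapse into one entry.
            common_prefixes.add(prefix + rest[:cut + len(delimiter)])
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes), scanned


bucket = ["logs-2023-01", "logs-2023-02", "logs-2024-01", "img/a.png", "notes.txt"]
objects, prefixes, scanned = exhaustive_list(bucket, prefix="logs-", delimiter="-")
print(objects, prefixes, scanned)
```

The number of keys scanned equals the bucket size regardless of how few keys match, which is why these requests cost about the same whether or not paths are encrypted.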

iglesiasbrandon commented 1 year ago

closing this item because we have basic support now