versity / versitygw

versity s3 gateway
https://www.versity.com/products/versitygw/
Apache License 2.0

Object Versioning for posix/scoutfs #678

Open benmcclelland opened 1 month ago

benmcclelland commented 1 month ago

Describe the solution you'd like

We would like to optionally support object versioning compatible with AWS S3. The following requirements/behaviors are expected:

Objectives

Versioning behavior compatible with AWS S3 when enabled. The AWS documentation can be found here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html

Design

To enable versioning, a directory should be configured for storing the non-current object versions. The older object versions should not be stored within the gateway root namespace, to prevent confusion when the namespace is accessed outside of S3. When an existing object is deleted or overwritten by an upload, the older version can be moved to the version directory. If the version directory is within the same filesystem, the move will likely be fast because the file data does not need to be rewritten. If it is not within the same filesystem, the move will have to copy all file data to the new location. This is handled automatically in file renaming.

Version Namespace

The directory structure for the older object versions does not need to be compatible with posix filenames the way the primary namespace does. The easiest namespace for these would be based on a sha256 hash of the object name, with a small directory structure created from that hash. The top-level directory will still need to be the bucket to prevent collisions across buckets. To be nicer to posix filesystems and avoid having all objects in the same directory, we can split the object name hash into directories based on the first few bytes of the hash. This is a common tactic in other projects. For example:

bucket: mybucket
object: dir1/dir2/myobject
sha256("dir1/dir2/myobject") = cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100

location of version "1":

<version directory>/mybucket/ce/fc/88/cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100/1

Version IDs

Each object version in the version namespace has an ID associated with it in AWS that uniquely identifies that object version. We can explore a few options here:

Delete Markers

When an object is deleted, the current object is moved to the version directory and a new empty object is placed in the primary namespace with a delete marker attribute indicating that this object shouldn't be listed or retrieved (as it was deleted). Older versions can still be restored to replace the delete marker object. We will likely just add a new xattr to signify that the file is a delete marker, and handle this accordingly in the listing walks.

list-object-versions

We need to enable listing of the object versions as well as the objects when list-object-versions is called. This can be handled in the listing walk function by looking into the version namespace for each object visited. The walk function results may need to be modified to handle versioning.

RFC

This is intended to be an RFC-style proposal, open to comments. Any requirement changes or design change proposals can be discussed in the issue comments.

benmcclelland commented 1 month ago

We probably need to consider the best naming conventions for the versions. In the above example I listed the version file name as "1", but maybe a timestamp, uuid, or something else should also be considered. I assume it is expected that list versions would list the versions in mtime order? So we may want something sortable in the way the API response expects.

benmcclelland commented 1 month ago

I think it only makes sense to version file-objects, not directory-objects, since versioning directory contents wouldn't really be possible.

jonaustin09 commented 1 month ago

We probably need to prevent the versioning directory from being the same as, or located inside, the root directory of the posix/scoutfs backend, as there would otherwise be bucket/object collisions.

jonaustin09 commented 1 month ago

As object versions are ordered by the modification date in ListObjectVersions, a reasonable solution would be to choose version names as the Unix nanoseconds of the last modification date.

For example:

bucket: mybucket
object: dir1/dir2/myobject
sha256("dir1/dir2/myobject") = cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100

// Object versions location
<version directory>/mybucket/ce/fc/88/cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100/<version_1_creation_nano_seconds>
<version directory>/mybucket/ce/fc/88/cefc8816ed641f7323d2f51e534a48c623364803fa1e7b3227c892eb80b4b100/<version_2_creation_nano_seconds>
...

However, the downside of this solution is that the GET by VersionID operation becomes expensive, because the exact location of the VersionID can only be determined after listing all the versions (in the worst case).

jonaustin09 commented 1 month ago

The VersionID could simply be a UUID, as it doesn't affect the sorting of object versions. The only requirement is that it must be unique.

The important thing to note is that the VersionID uniquely identifies a version of an object within a bucket. VersionIDs may vary in length but generally look like the following:

3/L4kqtJlcpXroDTDmJ+rmspuAmTeQhF3

benmcclelland commented 1 month ago

I think there are time-sortable UUIDs:
https://pkg.go.dev/github.com/google/uuid#UUID.Time
https://github.com/segmentio/ksuid

jonaustin09 commented 1 month ago

Another solution, which would best match our use case, is to use lexicographically sortable, timestamp-dependent unique identifiers (ULIDs): https://github.com/oklog/ulid