owncloud / ocis

ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

Add metadata cache and propagation strategy on s3 #24

Closed: butonic closed this issue 3 years ago

butonic commented 5 years ago

To quickly answer which files changed we need to have an mtime and etag for directories. For s3 we cannot store metadata for keys that represent directories, because that metadata gets lost when adding a key to the prefix ... at least with minio that is the case. For local storage that supports extended attributes we can store the etag as an extended attribute. For local and s3 we need to do directory size accounting.

To enable stateless sync, mtime, etag and size need to be propagated up the tree. The data needs to be stored in the storage for persistence. A cache on top can then be used to improve query speed.
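
For the local storage case, a minimal sketch of what this propagation could look like on Linux, assuming extended attribute support; the attribute name user.ocis.etag, the paths and the helper propagateEtag are made up for illustration and are not the actual reva implementation:

    // Illustrative sketch only: propagate a fresh etag up to the storage root
    // by writing it as an extended attribute on every ancestor directory.
    package main

    import (
        "fmt"
        "path/filepath"
        "time"

        "golang.org/x/sys/unix"
    )

    func propagateEtag(root, dir string) error {
        for {
            // Derive a new etag from the current time; a real implementation
            // would also update mtime and the accounted directory size.
            etag := fmt.Sprintf("\"%d\"", time.Now().UnixNano())
            if err := unix.Setxattr(dir, "user.ocis.etag", []byte(etag), 0); err != nil {
                return err
            }
            if dir == root || dir == "/" {
                return nil
            }
            dir = filepath.Dir(dir)
        }
    }

    func main() {
        if err := propagateEtag("/var/ocis/data", "/var/ocis/data/alice/photos"); err != nil {
            fmt.Println("propagation failed:", err)
        }
    }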

This is related to being able to set arbitrary properties: https://github.com/owncloud/nexus/issues/28. Not all s3 implementations allow metadata (minio does not).

So a storage needs a metadata persistence strategy / implementation? Hm, what is the cs3 api for this? AFAICT it is implicit: when executing PROPFINDs, sync with the desktop clients will work if the etag changes ...

What about a propagation strategy? sync? async?

Tagging is modeled as a different service in cs3: https://github.com/cernbox/cs3apis/blob/master/cs3/tag/v0alpha/tag.proto AFAICT it needs an update to use CS3 References instead of filename strings.

As a cache, a k/v store like https://github.com/dgraph-io/badger makes sense. Can we split the actual storage metadata from the blob storage? That is kind of what would be necessary for s3 if we were to use it exclusively, anyway. For now, implement it for local and s3, then extract the common pieces?
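
A rough sketch of how a directory listing could then be answered from the cache using badger's prefix scans (assuming badger v1.6+ and a made-up key layout meta/<path>):

    // Illustrative sketch: list cached metadata entries below one directory
    // by iterating over all keys that share the assumed meta/<path> prefix.
    package main

    import (
        "fmt"
        "log"

        "github.com/dgraph-io/badger"
    )

    func main() {
        db, err := badger.Open(badger.DefaultOptions("/tmp/ocis-md-cache"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        prefix := []byte("meta/home/alice/photos/")
        err = db.View(func(txn *badger.Txn) error {
            it := txn.NewIterator(badger.DefaultIteratorOptions)
            defer it.Close()
            for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
                item := it.Item()
                if err := item.Value(func(v []byte) error {
                    fmt.Printf("%s -> %s\n", item.Key(), v)
                    return nil
                }); err != nil {
                    return err
                }
            }
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }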

butonic commented 5 years ago
// what is cached
// for localfs the acls / sharing permissions:
// - what did I share with whom
// - who shared what with me
// -> but this is for the share provider

// how often do we update the cache?

// what is the key?
// - the file id?
// - the path?

// do we need a fast fileid to path lookup?
// - for s3 only if we store the blobs by the fileid
// - for s3 how do we implement a tree in a kv store?
// - badger supports key iteration with prefix https://github.com/dgraph-io/badger#prefix-scans

// how can we make reva update metadata for a certain path?
// eos handles metadata itself, maybe ... what if we want to force an update?
// local/posix can use fsnotify
// s3 implementations vary:
// - minio has https://docs.min.io/docs/minio-bucket-notification-guide.html
// - aws has https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
// - ceph has http://docs.ceph.com/docs/master/radosgw/s3-notification-compatibility/

// in any case how does this affect the cache?
// - do we get all metadata to properly update the entry?
// - is it only an event that allows us to update the cache?
// -> AFAICT this is implementation specific:
//   - local only needs fsnotify to propagate the etag.
//     the fs dir entries can hold etag itself
//     (in contrast to s3 where we would have to introduce a dedicated namespace)
//     - etag as ext attr? or only for files? for folders in cache to prevent hot spot on disk?
//     - dirsum as ext attr? or only in cache?
//     - mtime for folders in cache?
//     - booting requires rebuilding cache? add a reva command for it?
//     - shares in cache? is a different service?
//     - tags as extended attributes?
//       - user defined tags vs system tags? system tags in kv store? but is a different service anyway
//     - comments? extended attributes too small
//       -> separate app that stores comments for a fileid
//       - everything is a file, store comments on filesystem so it can be eg geo distributed by eos or cephfs
//
//   - s3 is a different beast
//     - needs cache to list folders efficiently
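
To illustrate the local/posix notes above, a small sketch of how fsnotify events could trigger the cache/etag update for the affected directory; the watched path is a placeholder and nothing here is reva code:

    // Illustrative sketch: react to filesystem events and report which
    // directory would need etag/mtime propagation and a cache refresh.
    package main

    import (
        "log"
        "path/filepath"

        "github.com/fsnotify/fsnotify"
    )

    func main() {
        watcher, err := fsnotify.NewWatcher()
        if err != nil {
            log.Fatal(err)
        }
        defer watcher.Close()

        // fsnotify is not recursive; a real storage driver would have to add
        // a watch per directory.
        if err := watcher.Add("/var/ocis/data/alice"); err != nil {
            log.Fatal(err)
        }

        for {
            select {
            case event, ok := <-watcher.Events:
                if !ok {
                    return
                }
                if event.Op&(fsnotify.Create|fsnotify.Write|fsnotify.Remove|fsnotify.Rename) != 0 {
                    log.Println("would propagate change for", filepath.Dir(event.Name))
                }
            case err, ok := <-watcher.Errors:
                if !ok {
                    return
                }
                log.Println("watch error:", err)
            }
        }
    }
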
butonic commented 5 years ago

Should we add the cache to the storageprovidersvc, or would that limit the integration possibilities with the actual storage implementation too much? Or would it make sense to configure the kv store as a standalone service and give storages access to it via an api, so that the actual kv store used can be changed, e.g. from an embedded kv to redis or quarkdb?

For now: a kv cache api can be added after we implement the cache for s3. That will tell us what calls we need in the first iteration.
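
A hypothetical sketch of what such a kv cache api could look like from the storage driver's point of view, so the backing store stays swappable; none of these names exist in reva:

    // Illustrative interface sketch for a swappable metadata cache backend.
    package metadata

    import (
        "context"
        "time"
    )

    // Entry holds the per-node metadata that has to be propagated for
    // stateless sync.
    type Entry struct {
        Etag  string
        Mtime time.Time
        Size  uint64
    }

    // Cache could be backed by an embedded kv store (badger), redis,
    // quarkdb, ...
    type Cache interface {
        Get(ctx context.Context, path string) (*Entry, error)
        Set(ctx context.Context, path string, e *Entry) error
        // Delete removes a single entry, e.g. after a key was removed in s3.
        Delete(ctx context.Context, path string) error
        // InvalidatePrefix marks a whole subtree as dirty so it is re-read
        // from the storage on the next access.
        InvalidatePrefix(ctx context.Context, prefix string) error
    }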

DeepDiver1975 commented 5 years ago

Will the kv store be persistent, or will it act as a cache only for faster access? Just asking for clearer understanding.

butonic commented 5 years ago

@DeepDiver1975 short answer: it depends.

Long answer: this is storage implementation dependent. The current s3 implementation for reva assumes the data in s3 adheres to a folder structure. I am planning to implement a persistent kv based cache for the metadata to get rid of constant metadata lookups. The current s3 implementation uses no cache. It has to invent mtimes and etags for folders and defaults to the 0 timestamp. This prevents the desktop client from constantly syncing the whole tree, while at the same time it allows using the web interface to navigate the s3 storage and to upload and download files. This is the basic storage functionality.
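
Roughly what inventing mtimes and etags for folders means when listing a prefix; a sketch assuming minio-go v7, with endpoint, bucket and credentials as placeholders:

    // Illustrative sketch: non-recursive listing of one "directory" prefix.
    // Common prefixes carry no metadata of their own, so the storage has to
    // fall back to an empty etag and the zero timestamp for folders.
    package main

    import (
        "context"
        "fmt"
        "log"
        "strings"
        "time"

        "github.com/minio/minio-go/v7"
        "github.com/minio/minio-go/v7/pkg/credentials"
    )

    func main() {
        client, err := minio.New("s3.example.com", &minio.Options{
            Creds:  credentials.NewStaticV4("ACCESSKEY", "SECRETKEY", ""),
            Secure: true,
        })
        if err != nil {
            log.Fatal(err)
        }

        objects := client.ListObjects(context.Background(), "ocis-bucket", minio.ListObjectsOptions{
            Prefix:    "home/alice/photos/",
            Recursive: false,
        })
        for obj := range objects {
            if obj.Err != nil {
                log.Fatal(obj.Err)
            }
            if strings.HasSuffix(obj.Key, "/") {
                fmt.Printf("dir  %s etag=%q mtime=%s\n", obj.Key, "", time.Time{})
                continue
            }
            fmt.Printf("file %s etag=%q mtime=%s\n", obj.Key, obj.ETag, obj.LastModified)
        }
    }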

The next level requires adding a cache to store the metadata and a way to update the cache. It is a real cache and can always be rebuilt from the s3 metadata. If the s3 product supports notifications we can update the cache and sync starts working. But that already is s3 product specific. A fallback might be a periodic scan, if the admin configures it and can afford the traffic (or does not have to pay for it).
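
For minio that could look roughly like the following, assuming minio-go v7's bucket notification listener; bucket, prefix and event names are placeholders:

    // Illustrative sketch: keep the metadata cache fresh from bucket
    // notifications instead of rescanning the whole bucket.
    package main

    import (
        "context"
        "log"

        "github.com/minio/minio-go/v7"
        "github.com/minio/minio-go/v7/pkg/credentials"
    )

    func main() {
        client, err := minio.New("s3.example.com", &minio.Options{
            Creds:  credentials.NewStaticV4("ACCESSKEY", "SECRETKEY", ""),
            Secure: true,
        })
        if err != nil {
            log.Fatal(err)
        }

        events := []string{"s3:ObjectCreated:*", "s3:ObjectRemoved:*"}
        for info := range client.ListenBucketNotification(context.Background(), "ocis-bucket", "home/alice/", "", events) {
            if info.Err != nil {
                log.Fatal(info.Err)
            }
            for _, record := range info.Records {
                // Here the cache entry for the changed object and its parent
                // directory would be invalidated or re-propagated.
                log.Println("cache update needed for", record.S3.Object.Key)
            }
        }
    }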

The next level would be an exclusive s3 storage where we only store the blobs in s3. Then the kv store would be the only place containing metadata. That would be the fastest solution, but now metadata and blob storage are kept separate. An option would be to store the metadata as objects in s3. That might be necessary for some s3 products to implement all capabilities; minio, e.g., does not support tagging.

Some more notes from my current dev branch:

    // first try the cache?
    // what to put there?
    // - metadata we need for propfind
    // - all we can reconstruct from the s3
    // when do we refresh?
    // - if the browser is used to get the files
    // - when the desktop polls we only use the cache
    // - when the browser checks, we go to the storage, and update the cache
    // - this needs the user agent from the original http request to be copied to the grpc request.
    // how can we manually update?
    // - a cli tool can stat a key / path in s3
    //  - if the etag is different than our cache we can propagate the change
    // - periodically scan all files?
    // should we respect cache-control headers?
    // - no ... how do we prevent requests from spamming the s3 api if someone scripts the requests and
    //   tries to ddos the service. -> rate limiting?
    // what about cache invalidation?
    // - 0 = unlimited, the default: we don't want automatic invalidation. it might cost money
    // - a day / week / month? configurable
    // - manual invalidation, so either admins or users can request a scan.
    //   - hm, that would lead to full scans, because we cannot mark a subtree as dirty ...
    // - it is rather how often do we want to update the metadata
    //   - a ttl, or
    //   - a manual update with a prefix that scans all keys with the prefix
    //     - this would allow subtrees to be updated.
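
If the answer is a ttl, badger can expire cache entries natively; a small sketch (badger v1.6+ assumed, key layout again made up):

    // Illustrative sketch: store a cache entry that expires on its own,
    // forcing a re-read from s3 on the next access. Omitting the ttl keeps
    // the entry forever, matching the "unlimited by default" idea above.
    package main

    import (
        "log"
        "time"

        "github.com/dgraph-io/badger"
    )

    func main() {
        db, err := badger.Open(badger.DefaultOptions("/tmp/ocis-md-cache"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        err = db.Update(func(txn *badger.Txn) error {
            e := badger.NewEntry([]byte("meta/home/alice/photos/cat.jpg"), []byte(`{"etag":"abc123"}`)).WithTTL(7 * 24 * time.Hour)
            return txn.SetEntry(e)
        })
        if err != nil {
            log.Fatal(err)
        }
    }
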
butonic commented 5 years ago

Some thoughts after initial implementation work:

butonic commented 5 years ago

minio recommends the AssumeRole API (or the relevant aws docs and the AWS access control overview) instead of object acls: https://github.com/minio/minio/issues/4496#issuecomment-417874753. Object acls seem to be a legacy way to specify permissions even on aws.
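
For reference, obtaining temporary credentials via the STS AssumeRole API with minio-go v7 looks roughly like this; endpoint and keys are placeholders, and whether this fits the reva storage driver is an open question:

    // Illustrative sketch: exchange static keys for temporary credentials via
    // STS AssumeRole instead of relying on object acls.
    package main

    import (
        "log"

        "github.com/minio/minio-go/v7"
        "github.com/minio/minio-go/v7/pkg/credentials"
    )

    func main() {
        creds, err := credentials.NewSTSAssumeRole("https://s3.example.com", credentials.STSAssumeRoleOptions{
            AccessKey:       "ACCESSKEY",
            SecretKey:       "SECRETKEY",
            DurationSeconds: 3600,
        })
        if err != nil {
            log.Fatal(err)
        }

        client, err := minio.New("s3.example.com", &minio.Options{
            Creds:  creds,
            Secure: true,
        })
        if err != nil {
            log.Fatal(err)
        }
        _ = client // requests made with this client use the temporary credentials
    }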

refs commented 3 years ago

@butonic can this be closed? Or should it be moved elsewhere? Is it still relevant?

dragotin commented 3 years ago

I think it can be closed as it is implemented by @butonic and @aduffeck. Please reopen if I am wrong.