i2mint / py2store

Tools to create simple and consistent interfaces to complicated and varied data sources.
MIT License

Metadata support #58

Open thorwhalen opened 4 years ago

thorwhalen commented 4 years ago

How do we add metadata support to various py2store constructs? The only ones I have in mind now are stores and values.

Sometimes it's useful (or necessary) to distinguish two types of metadata: system metadata and user-settable metadata.

For files we have os.stat(path) that gives us an object where we can see size and creation/modification/access timestamps and other useful goodies.
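As a minimal sketch of what reading this kind of system metadata could look like, the snippet below wraps `os.stat` in a small helper. The helper name `system_metadata` and the choice of fields are assumptions made for illustration, not part of py2store.

```python
import os
import tempfile
from datetime import datetime, timezone


def system_metadata(path):
    """Return a few os.stat fields as a plain dict (hypothetical helper)."""
    st = os.stat(path)
    return {
        "size": st.st_size,  # size in bytes
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
        # Note: st_ctime is the creation time on Windows, but the time of the
        # last metadata change on Unix, so it is left out of this sketch.
    }


# Write a small temporary file and inspect its metadata
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    tmp_path = f.name

meta = system_metadata(tmp_path)
os.remove(tmp_path)
```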

For user-settable metadata, a.k.a. "extended attributes", the standard lib has only Linux support (through os.*xattr, where * is get, list, remove, or set), but there's support for other systems outside of the standard lib, such as xattr. See also the technosophos article, and biplist.
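A rough sketch of the standard-library route, assuming a Linux system and a filesystem that allows user extended attributes. The helper name `roundtrip_user_xattr` and the attribute name `user.comment` are made up for the example; the function returns `None` wherever the `os.*xattr` calls are missing or refused.

```python
import os
import tempfile


def roundtrip_user_xattr(path, name, value):
    """Set a user extended attribute and read it back.

    Returns None if os.*xattr is unavailable (non-Linux platforms) or the
    filesystem refuses user attributes.
    """
    if not hasattr(os, "setxattr"):
        return None
    try:
        os.setxattr(path, name, value)  # value must be bytes
        return os.getxattr(path, name)
    except OSError:
        # e.g. the filesystem does not support user xattrs
        return None


with tempfile.NamedTemporaryFile() as f:
    result = roundtrip_user_xattr(f.name, "user.comment", b"draft")
```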

That's for file systems. But we'd like to support metadata with a consistent interface, whenever possible and wherever it makes sense.

Some systems provide explicit metadata access. Some (e.g. S3, Dropbox, Google Drive, GitHub) even seem to give access to content through metadata objects -- that is, you need to first list or get the metadata information, and from it get the "keys" you need to ask for the content.

But in other systems (e.g. MongoDB as far as I know) metadata is either not easily accessible, or doesn't come out of the box as tightly coupled with content.

Ideally, we will give access to "system metadata" when available, but also provide a way to specify user metadata. One pattern we can use for this is similar to caching patterns: We intercept specific store operations so as to maintain a ledger of metadata information in a separate (or not) store.

In this way, user metadata of some DB could be maintained in local files, or user metadata of local files could be maintained in a DB, seamlessly and flexibly.
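The interception pattern described above could be sketched roughly as follows. This is an illustration under assumptions, not py2store code: `MetaLedgerStore`, `make_meta`, and the two plain dicts standing in for the content store and the metadata store are all hypothetical.

```python
import time
from collections.abc import MutableMapping


class MetaLedgerStore(MutableMapping):
    """Wrap a store; record metadata in a separate mapping on every write."""

    def __init__(self, store, meta_store, make_meta=None):
        self.store = store      # where the content lives
        self.meta = meta_store  # where the metadata ledger lives
        # Default metadata: just the write timestamp (an arbitrary choice)
        self._make_meta = make_meta or (lambda k, v: {"written_at": time.time()})

    def __getitem__(self, k):
        return self.store[k]

    def __setitem__(self, k, v):
        self.store[k] = v
        self.meta[k] = self._make_meta(k, v)  # side effect: update the ledger

    def __delitem__(self, k):
        del self.store[k]
        self.meta.pop(k, None)  # keep the ledger in sync with deletions

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


# Any two MutableMappings would do here (local files, a DB collection, ...)
content, ledger = {}, {}
s = MetaLedgerStore(content, ledger, make_meta=lambda k, v: {"size": len(v)})
s["a.txt"] = b"hello"
```

Because the content store and the ledger are only required to be mappings, either one could be backed by files or by a database without changing the wrapper.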

In fact, as we move up towards more abstraction, the metadata case will merge with all use cases where we modify or enhance store operations with various side effects that create linked information.

Caching, logging, metadata maintenance, flat vs nested implementations. They all share common patterns.

thorwhalen commented 1 year ago

The issue of metadata comes up again quite often (or, more generally, key-linked mappings, i.e. two or more mappings that are linked in some way).

Recently, I solved the problem by allowing values to have attributes. In StringWhereYouCanAddAttrs I extended str so that I could add the metadata I wanted directly on the values (which were strings).

Doesn't seem like a good way to solve the problem.
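For illustration, the attributes-on-values idea can be reproduced with a bare `str` subclass. The class below is a stand-in named `AttrStr`, not the actual StringWhereYouCanAddAttrs implementation. It also shows one possible reason the approach is fragile: string operations return plain `str` objects, so the attached attributes are lost.

```python
class AttrStr(str):
    """A str subclass; unlike plain str, its instances accept new attributes."""


v = AttrStr("some content")
v.source = "notes.txt"  # metadata attached directly to the value

# Still behaves like a string...
same_text = v == "some content"

# ...but derived strings are plain str and carry no metadata
upper = v.upper()
kept_meta = hasattr(upper, "source")
```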

Instead -- and this would only work for the actual "metadata" case, not the general "key-linked mappings" case -- I would imagine that a better direction would be to add a .meta MutableMapping attribute to a store, with some mechanism to keep it linked to the store (perhaps through a descriptor?). In order to keep things sane, we'd have to make sure that any key transformations that are layered on the store are also applied to the keys of .meta. Optionally, we could restrict .meta to keys that are also keys of the store, possibly with a default mechanism so that metadata doesn't need to be specified explicitly (values of .meta would default to an empty dict, for example).
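One way this could look, as a sketch only: the names `StoreWithMeta`, `_MetaView`, `id_of_key`, and `key_of_id` are hypothetical, the link is a plain attribute rather than a descriptor, and both backing mappings are ordinary dicts. The same key transformation is applied to content and to `.meta`, `.meta` accepts only keys present in the store, and missing metadata defaults to an empty dict.

```python
from collections.abc import MutableMapping


class _MetaView(MutableMapping):
    """Metadata view that shares its owner's keys and key transformation."""

    def __init__(self, owner):
        self._owner = owner

    def _check(self, k):
        if k not in self._owner:
            raise KeyError(f"no such key in store: {k!r}")

    def __getitem__(self, k):
        self._check(k)
        # Default to an empty dict (a fresh one; mutating it is not persisted)
        return self._owner._meta.get(self._owner._id_of_key(k), {})

    def __setitem__(self, k, v):
        self._check(k)
        self._owner._meta[self._owner._id_of_key(k)] = v

    def __delitem__(self, k):
        self._check(k)
        self._owner._meta.pop(self._owner._id_of_key(k), None)  # back to default

    def __iter__(self):
        return iter(self._owner)

    def __len__(self):
        return len(self._owner)


class StoreWithMeta(MutableMapping):
    """Store with a linked .meta mapping (illustrative, not py2store API)."""

    def __init__(self, data, meta, id_of_key=lambda k: k, key_of_id=lambda i: i):
        self._data, self._meta = data, meta
        self._id_of_key, self._key_of_id = id_of_key, key_of_id
        self.meta = _MetaView(self)

    def __getitem__(self, k):
        return self._data[self._id_of_key(k)]

    def __setitem__(self, k, v):
        self._data[self._id_of_key(k)] = v

    def __delitem__(self, k):
        _id = self._id_of_key(k)
        del self._data[_id]
        self._meta.pop(_id, None)  # drop metadata along with the content

    def __iter__(self):
        return (self._key_of_id(_id) for _id in self._data)

    def __len__(self):
        return len(self._data)


data, meta = {}, {}
prefix = "root/"
s = StoreWithMeta(
    data,
    meta,
    id_of_key=lambda k: prefix + k,
    key_of_id=lambda _id: _id[len(prefix):],
)
s["a"] = "content"
s.meta["a"] = {"author": "me"}
s["b"] = "other"
```

In this sketch the user only ever sees the outer keys (`"a"`, `"b"`), while both backing mappings are keyed by the transformed ids (`"root/a"`, `"root/b"`).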