zopefoundation / ZODB

Python object-oriented database
https://zodb-docs.readthedocs.io/

Multiple indexed views for object collection? #44

Open drmalex07 opened 8 years ago

drmalex07 commented 8 years ago

Hello ZODB team

My use case is rather simple: I have some BTree collections of objects keyed by a unique integer key (i.e., IOBTree collections of Persistent objects). My application is not really search-centric, but in the few cases (few, but frequent) where a lookup has to be carried out by another (non-key) field, I have to iterate over the entire container sequentially.

Is it possible to maintain a secondary index on a certain BTree collection? I do not mean directly (i.e., on the same tree), but maybe by maintaining an auxiliary value->{set-of-keys} mapping?

I have implemented a Collection class on top of BTree, something like:

tree = root['users'] = IOBTree()
ix1 = root['users.by_name'] = OOBTree()
users = Collection(tree, name=ix1) # declare primary tree and indices

# The collection should follow underlying BTree api as much as possible
u1 = User(id=12, name='Foo')
users.insert(u1) # also inserts to name-based index
...
# The collection should return proxy objects on search operations 
p1 = users.get(12) # returns a proxy (to u1) that intercepts __setattr__, __delattr__
p1.name #  delegate to u1.name
p1.name = 'Baz' # delegate to u1, but also update name-based index

Of course, my implementation has several limitations and is only a workaround for my use case. Maybe some more mature solutions already exist? Or maybe this is a reason to migrate to a relational model?
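
The auxiliary value->{set-of-keys} mapping asked about above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual Collection class: `IndexedCollection`, `set_field`, and `find` are hypothetical names, and plain `dict`/`set` objects stand in for the persistent containers (in ZODB one would presumably use an IOBTree for the primary mapping and an OOBTree of tree sets for the index).

```python
class User:
    """Hypothetical record type; a real one would subclass persistent.Persistent."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


class IndexedCollection:
    """Primary mapping plus one secondary index (field value -> set of keys)."""

    def __init__(self, field):
        self.field = field
        self.primary = {}  # stand-in for an IOBTree: key -> object
        self.index = {}    # stand-in for an OOBTree: field value -> set of keys

    def insert(self, obj):
        if obj.id in self.primary:
            self.remove(obj.id)  # drop the stale index entry first
        self.primary[obj.id] = obj
        self.index.setdefault(getattr(obj, self.field), set()).add(obj.id)

    def remove(self, key):
        obj = self.primary.pop(key)
        value = getattr(obj, self.field)
        keys = self.index[value]
        keys.discard(key)
        if not keys:
            del self.index[value]  # keep the index free of empty sets
        return obj

    def set_field(self, key, value):
        # Changing the indexed field must go through the collection,
        # otherwise the secondary index silently goes out of date.
        obj = self.remove(key)
        setattr(obj, self.field, value)
        self.insert(obj)

    def find(self, value):
        return [self.primary[k] for k in sorted(self.index.get(value, ()))]


users = IndexedCollection("name")
users.insert(User(12, "Foo"))
users.insert(User(13, "Foo"))
users.set_field(12, "Baz")  # moves key 12 from the "Foo" bucket to "Baz"
```

Routing updates through `set_field` plays the role of the proxy's `__setattr__` interception in the snippet above: either way, the index has to be told whenever an indexed attribute changes.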

mgedmin commented 8 years ago

"Catalog" is the usual term in ZODB-land (because a catalog contains one or more indexes). These aren't provided by ZODB itself, but there are packages that provide them on top of ZODB. E.g. one of those is repoze.catalog.

I haven't used catalogs much personally, so I cannot give recommendations.

jimfulton commented 8 years ago

Good answer. Thanks Marius.

To elaborate a bit, catalogs are objects that maintain multiple indexes on a collection of objects. They provide for querying on one or more of those indexes. When querying multiple indexes, they rely on some low-level machinery provided by BTrees for doing set operations on the results from each index.

Jim


Jim Fulton http://jimfulton.info
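
The idea of combining per-index results with set operations, as described in this comment, can be sketched as follows. This is only an illustration under stated assumptions: the `query` helper and the sample data are hypothetical, and plain Python sets stand in for the BTrees set types (the BTrees package offers its own set-operation functions, such as intersection and union, for its integer sets).

```python
def query(indexes, **criteria):
    """Return the keys that match every field=value criterion.

    `indexes` maps a field name to an index, and each index maps a
    field value to the set of keys of the objects carrying that value.
    """
    matches = None
    for field, value in criteria.items():
        keys = indexes[field].get(value, set())
        # The first criterion seeds the result; later ones narrow it down.
        matches = set(keys) if matches is None else matches & keys
        if not matches:
            break  # an empty intersection cannot grow again
    return matches or set()


# Hypothetical sample data: two indexes over the same integer keys.
indexes = {
    "name": {"Foo": {12, 13}, "Baz": {14}},
    "city": {"Athens": {12, 14}, "Vilnius": {13}},
}

both = query(indexes, name="Foo", city="Athens")
```

Here `both` holds only the keys present in the "Foo" set and in the "Athens" set. Because the query yields keys rather than objects, a final lookup in the primary collection is still needed to obtain the objects themselves.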

drmalex07 commented 8 years ago

@jimfulton, @mgedmin Thank you for your quick answers.

I didn't know about repoze.catalog (in fact, I thought that repoze only had to do with authn/authz middleware).

I played with it a bit, and it seems that Catalog objects can be persisted in a ZODB database. It also seems that the catalog doesn't own the indexed objects (and so cannot retrieve them by docid), so an external mapping (i.e., a DocumentMap) must be maintained side by side and also be persisted in the database.

Am I correct? Is the following example valid?

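import transaction
from persistent import Persistent
from ZODB import DB
from ZODB.FileStorage import FileStorage
from repoze.catalog.catalog import Catalog
from repoze.catalog.document import DocumentMap
from repoze.catalog.indexes.field import CatalogFieldIndex

class User(Persistent):
    # Carry a `name` attribute; also implement __cmp__
    def __init__(self, name):
        self.name = name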

# Populate our containers

user_catalog = Catalog()
user_catalog['name'] = CatalogFieldIndex('name')
user_map = DocumentMap()

u1 = User('Totos')
u2 = User('Foo')

user_catalog.index_doc(1, u1)
user_map.add(u1, 1)
user_catalog.index_doc(2, u2)
user_map.add(u2, 2)

# Commit to database

db = DB(FileStorage('users.zodb'))
conn = db.open()
root = conn.root()

root['user_map'] = user_map
root['user_catalog'] = user_catalog
transaction.commit()

conn.close()

My main concern is this warning: http://docs.repoze.org/catalog/usage.html#restrictions. But maybe this is resolved by deriving my classes directly from object?
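
The division of labour described in this comment (an index that only knows docids, plus a separate map that turns docids back into objects) can be sketched in plain Python. `TinyCatalog` and `TinyDocumentMap` are hypothetical stand-ins written for illustration. They are not the repoze.catalog classes and do not reproduce its API.

```python
from types import SimpleNamespace


class TinyCatalog:
    """Stand-in for a one-index catalog: field value -> set of docids."""

    def __init__(self, field):
        self.field = field
        self.by_value = {}

    def index_doc(self, docid, obj):
        # Only the docid is stored; the catalog never keeps the object.
        self.by_value.setdefault(getattr(obj, self.field), set()).add(docid)

    def search(self, value):
        return sorted(self.by_value.get(value, ()))


class TinyDocumentMap:
    """Stand-in for the side-by-side mapping: docid -> object."""

    def __init__(self):
        self.docid_to_obj = {}

    def add(self, obj, docid):
        self.docid_to_obj[docid] = obj

    def resolve(self, docids):
        return [self.docid_to_obj[d] for d in docids]


catalog = TinyCatalog("name")
doc_map = TinyDocumentMap()
for docid, name in [(1, "Totos"), (2, "Foo")]:
    user = SimpleNamespace(name=name)  # hypothetical user record
    catalog.index_doc(docid, user)
    doc_map.add(user, docid)

found = doc_map.resolve(catalog.search("Foo"))
```

Both structures have to be stored and updated together: a docid present in one but missing from the other makes search results impossible to resolve.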

djay commented 8 years ago

You might want to look at souper. It's a single data structure that keeps internal indexes so you can search it. https://pypi.python.org/pypi/souper
