monetizeio / sqlalchemy-orm-tree

An implementation for SQLAlchemy-based applications of the nested-sets/modified-pre-order-tree-traversal technique for storing hierarchical data in a relational database.
Other
53 stars 10 forks source link

Use farey fractions instead of integer intervals #4

Open maaku opened 11 years ago

maaku commented 11 years ago

Using Farey fractions instead of integer intervals should result in a signficiant performance advantage, as updates would be limited to touching its own part of the tree. For more information, see:

http://arxiv.org/html/cs.DB/0401014

maaku commented 10 years ago

BTW, @tony, I don't know if you've seen this issue. SQLAlchemy-ORM-Tree is the most scalable and performant general framework I know of for storing arbitrary trees in SQL from Python. But it could be much better.

In particular, insertions, moves, and deletions to a specific tree require on average O(0.5n) update operations when using nested sets of integer intervals, where n is the size of the tree. This is obviously non-optimal.

If my understanding of the linked article(s) is correct, by switching to Farey fractions instead of integer intervals we can restrict the performance hit to just the size of the sub-tree under the node being updated. That means O(1) operations when you only touch leaf nodes!

Unfortunately I only found out about this research after we stopped using sqlalchemy-orm-tree in production. But anyone making an investment in this package should consider replacing the current implementation with Farey fractions. It should be a rather surgical change to sqlalchemy_tree.orm plus a few of the query methods, with the unit tests providing assurance that nothing's broken in the process.

tony commented 10 years ago

@maaku:

  1. Do you need a maintainer? I am happy to assist with this package at the python level (2/3 compatibility, sqlalchemy compatibility since my projects need this at the particular time).
  2. Frankly, I'm still wrapping my brain around how the current codebase works at the theory behind nested sets, adjacency lists, and materialized paths at a deep level.

It would be pretty god tier if you could help make a branch for this.

tony commented 10 years ago

SQLAlchemy-ORM-Tree is the most scalable and performant general framework I know of for storing arbitrary trees in SQL from Python.

Yes, so far here is a frameworks I've been looking for (sqlalchemy-tree-traversal and sqlamp): https://github.com/tony/.dot-config/blob/7f82cbd4255ed18daf8208a2b3557a217848ff3b/.vcspull.yaml#L8 (not including django-mptt). This particular module incorporates sqlamp also, which is awesome. So there is a need for the community to have this maintained and have good documentation for it.

FYI: I had to dig around a bit to find this, I am happy to get us on Readthedocs with #7 and help write docs as I study the code more intimately. How's that sound?

tony commented 10 years ago

It should be a rather surgical change to sqlalchemy_tree.orm plus a few of the query methods, with the unit tests providing assurance that nothing's broken in the process.

The more I think about this, the math involved in Farey Fractions escapes me and adds sophistication to something that already is overwhelming (at least to my brain). If we do implement this, it may be helpful to preserve the current implementation as well.

@maaku : Think you'd be able to trying to in a branch? I can spend the time preserving an aesthetic way to retain the both implementations in a good API / naming conventions.

maaku commented 10 years ago

Well it is an almost purely internal change. You would replace left/right with numerator/denominator of the Farey fraction, and change some of the most basic ORM and query methods. Everything else builds on that with the same API.

It's been on my TODO list for over a year however, so I can't guarantee getting around to it (although now that others are using sqlalchemy-orm-tree, I'm feeling more motivated).