Open maaku opened 11 years ago
BTW, @tony, I don't know if you've seen this issue. SQLAlchemy-ORM-Tree is the most scalable and performant general framework I know of for storing arbitrary trees in SQL from Python. But it could be much better.
In particular, insertions, moves, and deletions to a specific tree require on average O(0.5n) update operations when using nested sets of integer intervals, where n
is the size of the tree. This is obviously non-optimal.
If my understanding of the linked article(s) is correct, by switching to Farey fractions instead of integer intervals we can restrict the performance hit to just the size of the sub-tree under the node being updated. That means O(1) operations when you only touch leaf nodes!
Unfortunately I only found out about this research after we stopped using sqlalchemy-orm-tree in production. But anyone making an investment in this package should consider replacing the current implementation with Farey fractions. It should be a rather surgical change to sqlalchemy_tree.orm plus a few of the query methods, with the unit tests providing assurance that nothing's broken in the process.
@maaku:
It would be pretty god tier if you could help make a branch for this.
SQLAlchemy-ORM-Tree is the most scalable and performant general framework I know of for storing arbitrary trees in SQL from Python.
Yes, so far here is a frameworks I've been looking for (sqlalchemy-tree-traversal and sqlamp): https://github.com/tony/.dot-config/blob/7f82cbd4255ed18daf8208a2b3557a217848ff3b/.vcspull.yaml#L8 (not including django-mptt). This particular module incorporates sqlamp also, which is awesome. So there is a need for the community to have this maintained and have good documentation for it.
FYI: I had to dig around a bit to find this, I am happy to get us on Readthedocs with #7 and help write docs as I study the code more intimately. How's that sound?
It should be a rather surgical change to sqlalchemy_tree.orm plus a few of the query methods, with the unit tests providing assurance that nothing's broken in the process.
The more I think about this, the math involved in Farey Fractions escapes me and adds sophistication to something that already is overwhelming (at least to my brain). If we do implement this, it may be helpful to preserve the current implementation as well.
@maaku : Think you'd be able to trying to in a branch? I can spend the time preserving an aesthetic way to retain the both implementations in a good API / naming conventions.
Well it is an almost purely internal change. You would replace left/right with numerator/denominator of the Farey fraction, and change some of the most basic ORM and query methods. Everything else builds on that with the same API.
It's been on my TODO list for over a year however, so I can't guarantee getting around to it (although now that others are using sqlalchemy-orm-tree, I'm feeling more motivated).
Using Farey fractions instead of integer intervals should result in a signficiant performance advantage, as updates would be limited to touching its own part of the tree. For more information, see:
http://arxiv.org/html/cs.DB/0401014