Migrates tree_id from an `Integer` to `Unicode(32)`

markroper commented 8 years ago

The tree_id column is currently an Integer thats managed in application code. New trees are created by querying for the max value in the table and incrementing it by one, which isn't thread safe. An an example, a multithreaded application may try to create two different new trees and assign them both the same tree_id. I've made the tree_id a Unicode(32) so that tree IDs can be generated as GUIDs in application code without the risk of ID collisions.

I like the lib a lot!

coveralls commented 8 years ago

Coverage increased (+0.04%) to 99.559% when pulling 953560b31d73381d608ee18f9dd5ecdd99ef1f89 on markroper:master into e351c9c4645367edfc08144d95c4553e5d468807 on uralbash:master.

uralbash commented 8 years ago

@markroper Thank you for your interest in the library. When I studied this theme for writing libraries I have not found a mathematical basis for mptt together with RDBMS. So I did it as I understand and focusing on the structure of the models in django-mptt. dj-mptt uses integer tree_id field, sa-mptt too, it is necessary not only to separate tree but also for sorting and order trees. Therefore, when the manipulation of top nodes it change tree_id too, see:

If we choose the UUID4 it will take up more space (32-4=28bytes per row) in the database and not allow trees to keep order. I think that is wrong. I think it is possible figure out how to make this field is re-defined for more flixibility of this lib.

On the other hand, but is it necessary? DB request in the different thread sends in the separate transaction and ACID must ensure the integrity of data. Could you give an example of how to reproduce this problem.

markroper commented 8 years ago

@uralbash Thanks for the reply! In the use case I have, I do not care about the ordering of the different trees and so maintaining an integer tree id is not a requirement. I understand that you've supported it here as part of the port. Your point about the additional space required for a UUID is a good one, but for my use case, the additional size is not a concern.

The issue I see is on https://github.com/uralbash/sqlalchemy_mptt/blob/master/sqlalchemy_mptt/events.py#L77 Even though the RDBMS has ACID transactions, two threads concurrently querying for [func.max(table.c.tree_id) + 1] to define a new tree ID will resolve the same value and assign that value to the row that each thread goes to insert. One way to prevent this occurrence would be to use a UUID or GUID (at the expense of ordering) as I've shown here. Another would be to create a separate table to maintain a thread safe auto-incrementing value. Something like: https://www.percona.com/blog/2010/10/04/sharing-an-auto_increment-value-across-multiple-mysql-tables/ and to use an insert into this table to resolve new tree_ids.

uralbash commented 8 years ago

I would like to save the sort feature because I use it in pyramid_pages module. But the problem with multi threading is also necessary to solve. Do you think if I add table lock on time whether when insert/update top node(tree) it will solve the problem? I mean SELECT FOR UPDATE query when select max tree_id http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=lock#sqlalchemy.orm.query.Query.with_for_update

markroper commented 8 years ago

The select for update query will lock for the duration of that select query, but will not lock after that, so I don't think it would eliminate this vulnerability. I think if you want to support this you'll need to create a table in the database that has an auto-incrementing PK. Instead of generating the new PK with [func.max(table.c.tree_id) + 1], you'll have to insert into that table and then select SELECT LAST_INSERT_ID(). http://stackoverflow.com/questions/17112852/get-the-new-record-primary-key-id-from-mysql-insert-query

If you did this, you could still maintain integer-based tree_ids and eliminate the multithreading bug.

uralbash commented 8 years ago

Items that must be completed:

[ ] write test for multithreding
[ ] add a tree_id type selection option: Integer, ForeignKey, UUID (It's will be more flexible)
[ ] write docs
[ ] make 0.2.2 release

uralbash / sqlalchemy_mptt

Migrates tree_id from an `Integer` to `Unicode(32)` #53