Open markroper opened 8 years ago
@markroper Thank you for your interest in the library.
When I studied this theme for writing libraries I have not found a mathematical basis for mptt together with RDBMS. So I did it as I understand and focusing on the structure of the models in django-mptt
. dj-mptt
uses integer tree_id
field, sa-mptt
too, it is necessary not only to separate tree but also for sorting and order trees. Therefore, when the manipulation of top nodes it change tree_id too, see:
If we choose the UUID4 it will take up more space (32-4=28bytes per row) in the database and not allow trees to keep order. I think that is wrong. I think it is possible figure out how to make this field is re-defined for more flixibility of this lib.
On the other hand, but is it necessary? DB request in the different thread sends in the separate transaction and ACID must ensure the integrity of data. Could you give an example of how to reproduce this problem.
@uralbash Thanks for the reply! In the use case I have, I do not care about the ordering of the different trees and so maintaining an integer tree id is not a requirement. I understand that you've supported it here as part of the port. Your point about the additional space required for a UUID is a good one, but for my use case, the additional size is not a concern.
The issue I see is on https://github.com/uralbash/sqlalchemy_mptt/blob/master/sqlalchemy_mptt/events.py#L77
Even though the RDBMS has ACID transactions, two threads concurrently querying for [func.max(table.c.tree_id) + 1]
to define a new tree ID will resolve the same value and assign that value to the row that each thread goes to insert. One way to prevent this occurrence would be to use a UUID or GUID (at the expense of ordering) as I've shown here. Another would be to create a separate table to maintain a thread safe auto-incrementing value. Something like: https://www.percona.com/blog/2010/10/04/sharing-an-auto_increment-value-across-multiple-mysql-tables/ and to use an insert into this table to resolve new tree_ids.
I would like to save the sort feature because I use it in pyramid_pages
module. But the problem with multi threading is also necessary to solve. Do you think if I add table lock on time whether when insert/update top node(tree) it will solve the problem? I mean SELECT FOR UPDATE
query when select max tree_id http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=lock#sqlalchemy.orm.query.Query.with_for_update
The select for update query will lock for the duration of that select query, but will not lock after that, so I don't think it would eliminate this vulnerability. I think if you want to support this you'll need to create a table in the database that has an auto-incrementing PK. Instead of generating the new PK with [func.max(table.c.tree_id) + 1]
, you'll have to insert into that table and then select SELECT LAST_INSERT_ID()
. http://stackoverflow.com/questions/17112852/get-the-new-record-primary-key-id-from-mysql-insert-query
If you did this, you could still maintain integer-based tree_ids and eliminate the multithreading bug.
Items that must be completed:
tree_id
type selection option: Integer, ForeignKey, UUID (It's will be more flexible)0.2.2
release
The
tree_id
column is currently an Integer thats managed in application code. New trees are created by querying for the max value in the table and incrementing it by one, which isn't thread safe. An an example, a multithreaded application may try to create two different new trees and assign them both the same tree_id. I've made the tree_id aUnicode(32)
so that tree IDs can be generated as GUIDs in application code without the risk of ID collisions.I like the lib a lot!