Go back and add optimized model methods

rcxwhiz commented 4 months ago

There are some models I added which could have further optimized methods. I am not planning on going back and adding those for 1.0. I would like to go ahead and get that out, and then start going back and fixing this stuff.

rcxwhiz commented 4 months ago

After doing some initial benchmarks, there are some performance issues that need to be addressed.

First, NestedSetModel performance is much, much worse than expected across the board. At a minimum it needs some indexing. Querying is relatively worse than editing, which is unexpected. Even the test that just adds a bunch of nodes gets hung up on NSM, which must be from searching every single instance to find the highest _right value over and over again.
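To make the indexing idea concrete, here is a minimal sketch using the standard-library sqlite3 module instead of the Django ORM. The table name `node`, the index names, and the `append_root` helper are all hypothetical and not taken from the library. The assumption is that with an index on _right, finding the highest value becomes a lookup at one end of the index rather than a scan over every row.

```python
import sqlite3


def make_db():
    # Hypothetical nested-set table; only the _left/_right columns mirror the issue.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "_left INTEGER NOT NULL, _right INTEGER NOT NULL)"
    )
    # Indexes on both bounds, so MAX(_right) and range queries need no full scan.
    conn.execute("CREATE INDEX node_left_idx ON node (_left)")
    conn.execute("CREATE INDEX node_right_idx ON node (_right)")
    return conn


def append_root(conn, name):
    # One aggregate query replaces iterating over every instance in Python.
    (highest,) = conn.execute(
        "SELECT COALESCE(MAX(_right), 0) FROM node"
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO node (name, _left, _right) VALUES (?, ?, ?)",
        (name, highest + 1, highest + 2),
    )
    return cur.lastrowid


conn = make_db()
for name in ("a", "b", "c"):
    append_root(conn, name)

bounds = conn.execute("SELECT _left, _right FROM node ORDER BY _left").fetchall()
```

In Django this would presumably map to `db_index=True` on the two fields and a `Max` aggregate, but whether that removes the slowdown seen in the benchmarks is untested here.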

The children query on the PathEnumerationModel looks a tad slow. I think there might be something to be done there where you can get all the children, then order them by depth. Delete parent looks like it's going really well.

The only issue with AdjacencyListModel is that set parent looks a little slow. I expected it to be a little slow, but this is slower than that. I don't know what the issue would be there off the top of my head.

rcxwhiz commented 4 months ago

The methods that still need improvement:

PathEnumerationModel

children() function needs to be customized to not repeatedly call direct_children(). I think that once there is the initial query for all children, there should not be any more queries if possible. I don't think there will be any database optimizations that will help out more than just sorting and manipulating the children manually. In fact doing it manually should be faster I think. Along with this there may need to be some sort of sorting by level, but that can be resolved later. I think it might be hard to support all the arguments currently on the children() method this way.
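A rough sketch of what "one query, then manipulate the children manually" could look like, in plain Python. It assumes each descendant row comes back as `(node_id, path)`, where `path` is a tuple of ids from the root down to the node itself. That row shape and the helper names `group_children` and `build_tree` are made up for illustration and are not the library's API.

```python
from collections import defaultdict


def group_children(rows):
    """Group descendant rows by direct parent, shallowest level first.

    rows: iterable of (node_id, path), with path a tuple of ids from the
    root to the node, inclusive (an assumed representation).
    """
    by_parent = defaultdict(list)
    # Sorting by path length is the "sort by level" step; the sort is stable.
    for node_id, path in sorted(rows, key=lambda row: len(row[1])):
        parent_id = path[-2] if len(path) > 1 else None
        by_parent[parent_id].append(node_id)
    return by_parent


def build_tree(node_id, by_parent):
    # Pure in-memory recursion: no further queries per level.
    return {
        child: build_tree(child, by_parent)
        for child in by_parent.get(node_id, [])
    }


# Pretend this list is the result of the single "all descendants of 1" query.
rows = [(4, (1, 2, 4)), (2, (1, 2)), (3, (1, 3)), (5, (1, 2, 5))]
by_parent = group_children(rows)
tree = build_tree(1, by_parent)
```

The grouping pass is linear in the number of descendants after the sort, so the only database cost is the initial fetch. Supporting every argument currently on children() this way is not addressed by the sketch.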

NestedSetModel

children() function needs to be similarly customized. In the case of the nested set model, since there should be indexes on _left and _right, depending on the number of children a node has it is possible it might be worth it to query the database again rather than iterating directly through all the children.

There are a lot of methods that I consider fully optimized: ancestors(), is_child_of(), parent(), direct_children() - but they are much slower than in the other models. I could still figure out how to do an optimized children() method that reduces database queries, but based on what I've seen with direct_children() I don't have a ton of hope it would help that much. It doesn't seem like any of the methods are close to the performance of the other models. There is a possibility that with a large enough number of instances, the indexing could eventually outperform the manual searching the other models have to do for children. How many instances that would take has yet to be proven.
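For reference, a children() that avoids extra queries on a nested set could be sketched like this: fetch every descendant once, sort by _left, and rebuild the nesting with a stack of open ancestors. The `(node_id, left, right)` row shape and the `nest_by_bounds` name are assumptions for illustration, not the library's implementation.

```python
def nest_by_bounds(rows):
    """Rebuild nesting from nested-set bounds without further queries.

    rows: iterable of (node_id, left, right) covering all descendants of
    one node, in any order (an assumed representation).
    """
    tree = {}
    stack = []  # (right bound, children dict) for each ancestor still open
    for node_id, left, right in sorted(rows, key=lambda row: row[1]):
        # An ancestor whose right bound is below this left bound is closed.
        while stack and stack[-1][0] < left:
            stack.pop()
        siblings = stack[-1][1] if stack else tree
        siblings[node_id] = {}
        stack.append((right, siblings[node_id]))
    return tree


# Descendants of a root spanning (1, 10): 2 contains 4, and 3 contains 5.
rows = [(5, 7, 8), (2, 2, 5), (4, 3, 4), (3, 6, 9)]
tree = nest_by_bounds(rows)
```

Each row is pushed and popped at most once, so after the sort the pass is linear. Whether this beats re-querying on indexed _left and _right for nodes with many children is exactly the open question above.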

My prediction is that in the end only three models will make sense for various reasons: AdjacencyListModel for when you need fast edit performance (with unchecked functions), PathEnumerationModel for when you need better performance at higher numbers of instances (especially once I get that improved children() method going), and some sort of string-based path enumeration model to use as a fallback with Oracle and SQLite databases. In the end, if there isn't a significant difference between them, it could all come down to one model, but that is yet to be seen.

rcxwhiz commented 4 months ago

After doing more benchmarking and more testing, it would appear that I have been pretty wrong about what my bottlenecks are. It turns out that calls to refresh_from_db are very costly. By cutting them out I was able to shorten most ALM calls by 97%. I can do this with ALM because that stuff is automatically handled by the database, so I don't have to worry about synchronizing stuff between instances. It is for this reason that I don't think the other models are ever going to be as fast as this one. In theory they are just as good, if not better, but the number of database queries introduced by the constant refresh_from_db calls cannot be overcome.
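A small sketch of why an adjacency list needs no synchronizing between instances, again with standard-library sqlite3 rather than Django. The `node` table and the `set_parent` and `ancestors` helpers are hypothetical. The point it illustrates is that moving a subtree writes exactly one row, and every descendant sees the new ancestry on its next read because nothing about the tree is stored on the descendants themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES node(id))"
)
# Tree: 1 is the root, 2 and 3 are its children, 4 is a child of 2.
conn.executemany(
    "INSERT INTO node (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],
)


def set_parent(conn, node_id, parent_id):
    # Unchecked move: a single-row UPDATE with no cycle detection.
    cur = conn.execute(
        "UPDATE node SET parent_id = ? WHERE id = ?", (parent_id, node_id)
    )
    return cur.rowcount


def ancestors(conn, node_id):
    # Walk parent pointers upward, nearest ancestor first.
    result = []
    query = "SELECT parent_id FROM node WHERE id = ?"
    row = conn.execute(query, (node_id,)).fetchone()
    while row is not None and row[0] is not None:
        result.append(row[0])
        row = conn.execute(query, (row[0],)).fetchone()
    return result


before = ancestors(conn, 4)
rows_written = set_parent(conn, 2, 3)  # move node 2 (and its subtree) under 3
after = ancestors(conn, 4)
```

Node 4 was never touched, yet its ancestor chain changes from 2, 1 to 2, 3, 1. A nested set or path enumeration would have to rewrite bounds or paths on the moved descendants, which is where the in-memory instances go stale.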

I think what this is going to call for is a 2.0 release that no longer has multiple implementations and is no longer database-dependent in any way. Just one really good ALM implementation. As far as doing this at the ORM level goes, you can't beat the fact that ALM only needs the built-in functionality.

rcxwhiz commented 4 months ago

Time to merge this as soon as tests are passing, release 1.2.0, then move on to 2.0.

rcxwhiz / django-hierarchical-models

Go back and add optimized model methods #25