Closed: portnov closed this issue 5 years ago.
`bmesh.ops.boolean(bm_a, bm_b, operation='UNION', material=False)`
implemented, to get away from using Python for that process. The downside to moving the essential nodes to a compiled language is that it reduces the number of people who can easily modify them (for their own use, or for common use).
we have a few script nodes that do cool stuff but are not fast, like 'random points on polygons' / 'best_candidate_rect_random_points', 'topo sort for profile', '3d voronoi' (some SNmk1, some SNLite)
i think topo sort would be an interesting example to write / call in C++.
Some ideas:

1. Add a "draft mode" flag on the tree, so that complex nodes can check `if self.tree.draft:`. In draft mode, complex nodes execute some simplified version of their algorithm.
2. Add a `need_process(self)` method. The default implementation does `return any(socket.is_required for socket in self.outputs)`, using a socket property `.is_required`, which returns `self.is_linked and self.other.need_process()`. Call each node's `.need_process()` before calling its `.process()` method.
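A minimal sketch of that `need_process()` idea. The `Node` and `Socket` classes here are hypothetical stand-ins, not Sverchok's actual API; the point is only to show how the "is anything downstream asking for my output?" check propagates through links.

```python
# Hypothetical sketch of the proposed need_process() mechanism.
# Node, Socket and the link layout are illustrative, not Sverchok's real classes.

class Socket:
    def __init__(self):
        self.is_linked = False
        self.other = None  # the node on the other end of the link

    @property
    def is_required(self):
        # a linked output is required only if the downstream node needs processing
        return self.is_linked and self.other.need_process()

class Node:
    def __init__(self, name, needs_output=False):
        self.name = name
        self.outputs = []
        self._needs_output = needs_output  # e.g. a viewer node always needs output

    def need_process(self):
        # default implementation from the proposal above
        if self._needs_output:
            return True
        return any(socket.is_required for socket in self.outputs)

    def process(self):
        pass  # the node's actual (possibly expensive) work would go here

def link(src, dst):
    s = Socket()
    s.is_linked = True
    s.other = dst
    src.outputs.append(s)
    return s

# a -> b; b acts like a viewer, so both nodes need processing
a, b = Node("a"), Node("b", needs_output=True)
link(a, b)
print(a.need_process())  # True: b downstream requires output

c = Node("c")  # unlinked node: nothing downstream requires it
print(c.need_process())  # False
```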
@zeffii I did some investigations about topo_sort:
I implemented Tarjan's algorithm for topological sort (see https://github.com/portnov/python_cpp_excersises/blob/master/benchmark.py#L31) for comparison. This implementation is limited by the maximum recursion depth, but it is O(N).
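For illustration, a compact DFS-based (Tarjan-style) topological sort in Python looks roughly like this; as noted above, any recursive version is limited by Python's recursion depth. This is a generic sketch, not the benchmark code linked above.

```python
# Minimal recursive DFS topological sort (reverse postorder).
# Recursive, so limited by Python's recursion limit on deep graphs.

def topo_sort(vertices, edges):
    """vertices: iterable of hashable ids; edges: dict vertex -> list of successors."""
    visited = set()
    order = []

    def visit(v):
        if v in visited:
            return
        visited.add(v)
        for w in edges.get(v, []):
            visit(w)
        order.append(v)  # appended only after all descendants

    for v in vertices:
        visit(v)
    order.reverse()  # sources first
    return order

# simple diamond: a -> b, a -> c, b -> d, c -> d
print(topo_sort(["a", "b", "c", "d"],
                {"a": ["b", "c"], "b": ["d"], "c": ["d"]}))
# "a" comes first, "d" last
```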
There is the `boost::graph` C++ library, and it has `topological_sort` (https://github.com/portnov/python_cpp_excersises/blob/master/greet_binding.cpp#L251). This method sorts 100000 vertices in about 0.15 seconds on my machine; I did not wait long enough for our old algorithm to finish on that amount of data. Tarjan's algorithm on 1000 vertices is about 3 times slower in Python than in C++/boost.
The "mesh separate loose" node appears to be relatively well written. On the same graph of 20000 vertices, its Python code takes 1.8 seconds, while the C++ code from `boost::graph` takes 1.1 seconds.
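"Separate loose parts" reduces to finding connected components on the vertex/edge graph. A compact union-find sketch of that idea (illustrative only, not the node's actual code):

```python
# Connected components ("loose parts") via union-find with path halving.
# Illustrative sketch, not the mesh separate loose node's real implementation.

def loose_parts(n_verts, edges):
    parent = list(range(n_verts))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    parts = {}
    for v in range(n_verts):
        parts.setdefault(find(v), []).append(v)
    return list(parts.values())

# two loose parts: a triangle 0-1-2 and a lone edge 3-4
print(loose_parts(5, [(0, 1), (1, 2), (2, 0), (3, 4)]))
# [[0, 1, 2], [3, 4]]
```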
Btw, I don't think nodes cache their calculations, do they? I.e., if the inputs and the parameters of a node do not change, should it recompute the outputs constantly during animation? Maybe there is a tradeoff between figuring out whether the inputs are the same vs recomputing the outputs, but for some nodes caching may bring a real performance benefit.
oh. portnov. i would love a C++ (or much faster python) implementation of Intersect Edges :) k thx. haha
@DolphinDream in most cases no, they don't. The main problem with adding "universal caching" now is that most nodes work on the whole set of data, though internally in many cases they process data item-by-item. For example, let's consider the Math node with the Add operation, where some source of numbers is connected to the X input, and Y is set to just 1. Imagine we have `[1, 2, 3, 4]` on the X input. Then the node calculates `result = [2, 3, 4, 5]`. At first glance, it looks like we can have a `cache = dict()` and do `cache[(1, 2, 3, 4)] = [2, 3, 4, 5]` (with a tuple key, since lists are not hashable). Well, it will even work. Sometimes. But it will have the following problem: suppose the next input is `[1, 2, 7, 4]`. We will see that we do not have such a key in the cache, so we will have to calculate everything again. Though if we were a bit more clever, we could re-calculate only for 7, and reuse the previously calculated values for 1, 2 and 4.

We could instead put the caching logic into the main loop of each node, and implement correct caching with respect to the specifics of each node's algorithm. But that would be a lot of code in each node, which someone would have to 1) write and 2) maintain. I do not see us having that amount of resources currently.
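The "re-calculate only for 7" idea can be sketched as an item-wise cache around the per-item computation. This is a toy illustration of the principle, not Sverchok node code; `make_cached_add` is a hypothetical helper.

```python
# Item-wise caching sketch for an Add-style node: only values not seen
# before are recomputed; repeated items are served from the cache.

def make_cached_add(y):
    cache = {}
    stats = {"computed": 0}

    def add_list(xs):
        out = []
        for x in xs:
            if x not in cache:
                cache[x] = x + y          # the "expensive" per-item computation
                stats["computed"] += 1
            out.append(cache[x])
        return out

    return add_list, stats

add_list, stats = make_cached_add(1)
print(add_list([1, 2, 3, 4]))  # [2, 3, 4, 5]  (computes 4 items)
print(add_list([1, 2, 7, 4]))  # [2, 3, 8, 5]  (computes only the new item, 7)
print(stats["computed"])       # 5
```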
In our Holy Grail, SverchokRedux, all processing loops are factored out of the nodes into the general update mechanism, so each node processes only one item at a time. There we could implement a "generic caching mechanism" without each node being aware of caching.
@portnov Actually I was not thinking of having the data itself used as the dictionary keys :)
Let me clarify a little bit. A node has a set of input sockets, properties and output sockets. The output data is generally computed from the input socket data plus the property values. A node could cache its output data once computed, and invalidate this cache any time the input changes from one update call to another. The cache could be stored as `inputCache["socketA"] = socketAData`, and on every update call the node itself (or perhaps functionality implemented in a base class) would check whether the current socketA data is the same as the cached value for that socket. If all input socket data is the same as the previously cached values (and the cached node properties), then the node simply returns the cached output values to the output sockets via `outputCache["socketX"]`; otherwise, if any input socket or property changed, the output cache is invalidated and the node has to recompute the output data and store it in the output cache for the next update cycle.
note: The keys for the cached values can be the socket names (assumed unique) or property names.
One key aspect to consider is a quick way to determine whether an input has changed (perhaps via some quick sorting or hashing method). Alternatively, a mechanism that avoids having to compare the input data itself is to assign a unique ID to the cached output data within a node. The receiving node can then simply ask the connected transmitting node for the ID of the data it is sending over the link, and cache that ID. If the ID changes, the receiving node knows it needs to invalidate its output cache, since the input data changed.
Example:
Constant IO nodes:
The Random Vector node stores a cache of its properties, computes its output (10 random vectors), stores it in `outputCache["Random"] = listOfVectors`, and generates a unique ID for that data: `outputID["Random"] = someUniqueID`.
The UV Connection node, during its update cycle, asks the connected input node (Random Vector) for the ID of the connected socket data; if it is different from the previous ID (cached on the UV node), then the UV node knows that the output of the Random Vector changed. (This way the UV node doesn't have to explicitly store the input data in an inputCache and check if the new input data differs from the data received during the last update cycle; the ID change itself can be used as the criterion for the input data being different.)
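The ID-based invalidation described above can be sketched like this. `Producer` and `Consumer` are hypothetical stand-ins for the transmitting and receiving nodes, not Sverchok classes:

```python
# Hypothetical sketch of unique-ID cache invalidation: the producer tags its
# output with an ID that changes whenever the data is recomputed; the consumer
# recomputes only when it sees a new ID, never comparing the data itself.

import itertools

_next_id = itertools.count()

class Producer:
    def __init__(self, data):
        self.data = data
        self.output_id = next(_next_id)

    def set_data(self, data):
        self.data = data
        self.output_id = next(_next_id)  # new data -> new ID

class Consumer:
    def __init__(self, producer):
        self.producer = producer
        self._seen_id = None
        self._cached = None
        self.recomputes = 0

    def update(self):
        if self.producer.output_id != self._seen_id:
            # input changed: recompute and remember the new ID
            self._cached = [v * 2 for v in self.producer.data]
            self._seen_id = self.producer.output_id
            self.recomputes += 1
        return self._cached  # otherwise: cache hit, no work done

p = Producer([1, 2, 3])
c = Consumer(p)
c.update()
c.update()                  # second call is a cache hit
print(c.recomputes)         # 1
p.set_data([4, 5, 6])
print(c.update())           # [8, 10, 12]
print(c.recomputes)         # 2
```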
Variable IO nodes:
Other notes:
Well, this approach can work in general. My only concern is the estimated amount of labor that we need to invest in this versus the estimated speedup. I'm not ready to say which is bigger right now; need to think. @zeffii any thoughts about the approach suggested by @DolphinDream?
@portnov I agree. Most of the nodes would not get a significant boost from this kind of optimization. Some nodes that have heavier computation may do, depending on the input size etc.
rather than making caching the focus, I'd prefer to just eliminate any redundant tree updates. The end result is practically the same, with much less memory consumption. Some nodes could cache partially: for example, if verts+edges are unchanged and get converted into a bm where some operation is applied to each face, the bm can be cached while another socket still passes changed data (like extruding each face with a unique value).
but it's really down to individual nodes.
@zeffii smth like the `need_process()` method (https://github.com/nortikin/sverchok/issues/1990#issuecomment-354586202, p.2)?
Btw, i know that Python creates compiled versions of the py scripts as pyc files. Are these as fast as Cython? I know that AN has created a Cython version of their addon to speed it up. Is that possible to do with SV?
yeah, @portnov exactly that.
@DolphinDream 1) no, pyc files are not actually optimized; they exist only because parsing the bytecode is faster than parsing the Python source itself. 2) It is possible, of course; the question is only the required amount of effort.
Hmm. It looks like we have something like @DolphinDream suggested, not sure if it works. @zeffii , @DolphinDream please look at core/socket_data.py.
The term "cache" there is perhaps a misnomer, depending on the perspective. We also have a `.has_changed` which already suppresses many recalcs, but it applies only to trees, not nodes. A `.has_changed` for individual nodes, and especially for "terminal" nodes, could still be interesting to explore.
the `socket.sv_get(...., deepcopy=False)` switch for sockets deserves a few paragraphs / code examples to show how it can speed up the node's processing duties, but also to explain the implications of letting the data into the node with the explicit agreement that the node won't modify that data, and will instead (lazily) read from it / iterate over it to generate new data.
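The trade-off can be illustrated in generic Python (this is not actual Sverchok socket code; the `get` helper just mimics the copy-vs-reference choice). Skipping the deep copy is much cheaper, but the node must then treat the incoming data as read-only:

```python
# deepcopy=True: safe but slow, every consumer pays for a full copy.
# deepcopy=False: fast read-only reference; the node must build NEW data
# instead of mutating what it received.

import copy

upstream = [[1.0, 2.0, 3.0] for _ in range(1000)]  # data held by the upstream node

def get(data, deepcopy=True):
    # stand-in for a socket getter; not the real sv_get implementation
    return copy.deepcopy(data) if deepcopy else data

# fast path: reference only; generate new lists rather than mutating
verts = get(upstream, deepcopy=False)
moved = [[x + 1.0 for x in v] for v in verts]

# what you must NOT do with deepcopy=False: in-place mutation
# verts[0][0] = 99.0   # would silently corrupt the upstream node's data

print(moved[0])     # [2.0, 3.0, 4.0]
print(upstream[0])  # [1.0, 2.0, 3.0] -- untouched
```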
There is also some fancy "partial update" thing, and it even seems to work somehow. If I change a property in the middle of the tree, only this node and the subsequent ones are called for `.process()`.
yes. I think the double update you experienced before digging deeper into this was caused by the update that an object-viewer node triggers after updating the Scene; scene updates are also going to trigger a global update in the node tree.
Regarding https://github.com/nortikin/sverchok/issues/1990#issuecomment-355419718: I had a quick look at that node, and at first glance its algorithms and maths do not look very good. On my examples I can reduce execution time from 12 seconds to 3 seconds with a relatively small amount of code changes. Will do a PR shortly.
@portnov cool, as long as the Intersect Edges algorithm stays fully 3d: it cannot assume it may ignore the 3rd component (z).
About the topo_sort node and its algorithm... https://habr.com/ru/post/451208/ (in Russian)
@portnov from your link: https://github.com/slonopotamus/stable-topo-sort/blob/master/test/StableTopoSort.java#L16 https://e-maxx.ru/algo/strong_connected_components Asakhabiev did a good job
While trying to improve the performance of the "bend along surface" node, I did some research about whether it is worth it to move some parts of the code to C++ (you can see the code at https://github.com/portnov/python_cpp_excersises). Here are the results:
So my conclusions for now are:
Will write some other thoughts later...