nortikin / sverchok

Sverchok
http://nortikin.github.io/sverchok/
GNU General Public License v3.0

Performance considerations (discussion) #1990

Closed portnov closed 5 years ago

portnov commented 6 years ago

While trying to improve the performance of the "bend along surface" node, I did some research on whether it is worth moving some parts of the code to C++ (you can see the code at https://github.com/portnov/python_cpp_excersises). Here are the results:

So my conclusions for now are:

Will write some other thoughts later...

zeffii commented 6 years ago

The downside to moving the essential nodes to a compiled language is that it reduces the number of people who can easily modify them (for their own use, or common use).

zeffii commented 6 years ago

we have a few script nodes that do cool stuff but are not fast, like 'random points on polygons' / 'best_candidate_rect_random_points', 'topo sort for profile', '3d voronoi' (some SNmk1, some SNLite)

zeffii commented 6 years ago

i think topo sort would be an interesting example to write / call in C++.

portnov commented 6 years ago

Some ideas:

  1. Have a "Draft" mode of the node tree:
    • Enabled/disabled by a toggle button near "Update tree".
    • Input nodes, such as "A Number", have (and store) a separate value for Draft mode (which is usually smaller, but it depends). It is displayed and used instead of the main value when the tree is in Draft mode.
    • Some nodes have (and store) separate values of their settings for Draft mode.
    • All nodes have programmatic access to this mode (something like if self.tree.draft:). In Draft mode, complex nodes execute a simplified version of their algorithm.
  2. Do fewer tree updates:
    • Have a setting in the addon preferences for how often to recalculate the tree: On any update / On update of node settings / Only manually.
    • Have a setting: when recalculating on a scene change event, use Draft mode.
    • Nodes should detect more precisely whether they need processing. For that:
    • Introduce a new method in the base node class, need_process(self). The default implementation returns any(socket.is_required for socket in self.outputs).
    • Introduce a new property in the base Socket class, is_required, which returns self.is_linked and self.other.need_process().
    • Viewer nodes return the state of their UPD button from the need_process() method.
    • So, if one disables the viewer at the end of a pipeline, the whole pipeline will see that it is disabled and not execute process().
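
The need_process()/is_required idea from point 2 could be sketched roughly like this (Node, Socket and ViewerNode here are illustrative toy classes, not Sverchok's real base classes):

```python
class Socket:
    """Toy stand-in for a node socket, to illustrate the is_required idea."""
    def __init__(self):
        self.is_linked = False
        self.other = None  # the node on the other end of the link

    @property
    def is_required(self):
        # a socket's data is required only if it is linked AND the
        # node downstream of it actually needs to process
        return self.is_linked and self.other.need_process()


class Node:
    def __init__(self, outputs=()):
        self.outputs = list(outputs)

    def need_process(self):
        # default: process only if at least one output is required downstream
        return any(socket.is_required for socket in self.outputs)


class ViewerNode(Node):
    def __init__(self, enabled=True):
        super().__init__()
        self.enabled = enabled

    def need_process(self):
        # a viewer decides by the state of its UPD toggle
        return self.enabled
```

With this wiring, disabling a viewer at the end of a chain makes every upstream node's need_process() report False, so the whole pipeline can be skipped.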
portnov commented 6 years ago

@zeffii I did some investigations about topo_sort:

I implemented Tarjan's algorithm for topological sort (see https://github.com/portnov/python_cpp_excersises/blob/master/benchmark.py#L31) for comparison. This implementation is limited by Python's maximum recursion depth, but it is O(N).

There is the boost::graph C++ library, and it has topological_sort (https://github.com/portnov/python_cpp_excersises/blob/master/greet_binding.cpp#L251). This method sorts 100000 vertices in about 0.15 seconds on my machine; I did not wait long enough for our old algorithm to finish on that amount of data. On 1000 vertices, Tarjan's algorithm in Python is about 3 times slower than the C++/boost version.
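
For reference, the recursion-depth limit can also be avoided in pure Python by using an iterative algorithm. A sketch using Kahn's algorithm (not the linked Tarjan implementation), which is O(V + E) and recursion-free:

```python
from collections import deque

def topo_sort(n_verts, edges):
    """Kahn's algorithm: iterative topological sort, O(V + E).

    `edges` is a list of (u, v) pairs meaning u must come before v.
    Being loop-based rather than recursive, it has no recursion-depth limit.
    """
    adj = [[] for _ in range(n_verts)]
    in_deg = [0] * n_verts
    for u, v in edges:
        adj[u].append(v)
        in_deg[v] += 1

    # start from all vertices that have no prerequisites
    queue = deque(i for i in range(n_verts) if in_deg[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            in_deg[v] -= 1
            if in_deg[v] == 0:
                queue.append(v)

    if len(order) != n_verts:
        raise ValueError("graph has a cycle")
    return order
```

This stays in Python (so anyone can modify it) while scaling linearly; the C++/boost version would still win on constant factors.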

portnov commented 6 years ago

"mesh separate loose" node appears to be written relatively good. On the same graph of 20000 vertices it's python code takes 1.8 seconds, while C++ code from boost::graph takes 1.1 seconds.

DolphinDream commented 6 years ago

Btw, I don't think nodes cache their calculations, do they? I.e. if the inputs and the parameters of a node do not change, should they recompute the outputs constantly during animation? Maybe there is a tradeoff between figuring out whether the inputs are the same vs recomputing the outputs, but for some nodes caching may have a performance impact.

zeffii commented 6 years ago

oh. portnov. i would love a C++ (or much faster python) implementation of Intersect Edges :) k thx. haha

portnov commented 6 years ago

@DolphinDream in most cases no, they don't. The main problem with adding "universal caching" now is that most nodes work on the whole set of data, though internally many of them process data item by item. For example, consider the Math node with the Add operation, where some number source is connected to the X input and Y is set to just 1. Imagine we have this on the X input:

[1, 2, 3, 4]

Then the node calculates result = [2, 3, 4, 5]. At first glance it looks like we could have a cache = dict() and do cache[(1, 2, 3, 4)] = [2, 3, 4, 5] (using a tuple key, since lists are not hashable). That would even work. Sometimes. But it would have the following problems:

We could instead put caching logic into the main loop of each node and implement correct caching with respect to the specifics of that node's algorithm. But that would be a lot of code in each node, which someone has to 1) write and 2) maintain. I do not see us having that amount of resources currently.

In our Holy Grail, SverchokRedux, all processing loops are factored out of the nodes into the general update mechanism, so each node processes only one item at a time. There we could implement a generic caching mechanism without each node being aware of caching.
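
If nodes really processed one item at a time, the generic mechanism could be as simple as memoizing the per-item function. A sketch of the idea (add_one and process are hypothetical stand-ins, not SverchokRedux code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def add_one(x):
    # imagine this is the per-item body of a Math node with Y = 1;
    # the update mechanism caches results without the node knowing
    return x + 1

def process(xs):
    # the generic update loop maps the per-item function over the data;
    # repeated items hit the cache instead of being recomputed
    return [add_one(x) for x in xs]
```

The node author writes only the scalar function; caching, vectorization over lists, and invalidation all live in the update machinery.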

DolphinDream commented 6 years ago

@portnov Actually I was not thinking of having the data itself used as the dictionary keys :)

Let me clarify a little bit. A node has a set of input sockets, properties, and output sockets. The output data is generally computed from the input sockets plus the property values. A node could cache its output data once computed and invalidate this cache any time an input changes from one update call to the next.

The cache could be stored as inputCache["socketA"] = socketAData, and with every update call the node itself (or perhaps functionality implemented in a base class) would check whether the current socketA data is the same as the cached value for that socket. If all input socket data matches the previously cached values (and the cached node properties), then the node simply returns the cached output values to the output sockets, outputCache["socketX"]; otherwise, if any input socket or property changed, the output cache is invalidated and the node recomputes the output data and stores it in the output cache for the next update cycle.

note: The keys for the cached values can be the socket names (assumed unique) or property names.

One key aspect to consider is a quick way to determine whether an input changed (perhaps via hashing, etc.). Alternatively, a mechanism that avoids comparing the input data itself is to assign a unique ID to the cached output data within a node. The receiving node simply asks the connected transmitting node for the ID of the data it is sending via the link and caches that ID; if the ID changes, the receiving node knows it needs to invalidate its output cache, since the input data changed.
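
The ID-based variant could be sketched like this (OutputSocket and CachingNode are hypothetical illustrations of the scheme, not Sverchok classes):

```python
import itertools

_ids = itertools.count()  # global generation counter for data IDs

class OutputSocket:
    """Transmitting side: stores data together with a unique ID
    that changes whenever new data is written."""
    def __init__(self):
        self.data = None
        self.data_id = None

    def set_data(self, data):
        self.data = data
        self.data_id = next(_ids)  # new ID => downstream caches invalidate


class CachingNode:
    """Receiving side: recomputes only when the upstream data ID changed."""
    def __init__(self, source, func):
        self.source = source          # upstream OutputSocket
        self.func = func              # the node's actual computation
        self._seen_id = None
        self._cached_output = None
        self.recompute_count = 0      # exposed just for demonstration

    def output(self):
        if self.source.data_id != self._seen_id:
            self._cached_output = self.func(self.source.data)
            self._seen_id = self.source.data_id
            self.recompute_count += 1
        return self._cached_output
```

Comparing two integers per update is far cheaper than deep-comparing or hashing large mesh data; the cost is that rewriting identical data still gets a fresh ID and forces a recompute.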

Example: sv-cachingmockup-demo1

Constant IO nodes:

Variable IO nodes:

Other notes:

portnov commented 6 years ago

Well, this approach can work in general. My only concern is the estimated amount of labor we would need to invest versus the estimated speedup. I'm not ready to say which is bigger right now; I need to think. @zeffii any thoughts about the approach suggested by @DolphinDream?

DolphinDream commented 6 years ago

@portnov I agree. Most nodes would not get a significant boost from this kind of optimization. Some nodes with heavier computation may, depending on the input size etc.

zeffii commented 6 years ago

rather than making caching the focus, I'd prefer to just eliminate any redundant tree updates. The end result is practically the same, with much less memory consumption. Some nodes could cache partially: for example, if verts+edges are unchanged and get converted into a bm, where some operation is applied to each face, the bm can be cached while another socket still passes changed data (like extruding each face with a unique value).

but it's really down to individual nodes.
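
That partial-cache idea amounts to keying the expensive intermediate (the bm) on a fingerprint of the topology inputs, while letting per-face data flow through freely. A Blender-free sketch (FaceOpNode and the dict standing in for a bmesh are illustrative only):

```python
def make_key(verts, edges):
    # cheap fingerprint of the topology inputs; for a draft cache the
    # collision risk is negligible (a full comparison could back it up)
    return hash((tuple(map(tuple, verts)), tuple(map(tuple, edges))))


class FaceOpNode:
    """Rebuilds its expensive intermediate only when verts+edges change."""
    def __init__(self):
        self._key = None
        self._bm = None       # stand-in for the cached bmesh
        self.rebuilds = 0     # exposed just for demonstration

    def _build_bm(self, verts, edges):
        self.rebuilds += 1
        # in Blender this would be the costly bmesh construction
        return {"verts": verts, "edges": edges}

    def process(self, verts, edges, offsets):
        key = make_key(verts, edges)
        if key != self._key:
            self._bm = self._build_bm(verts, edges)
            self._key = key
        # per-face data (offsets) may change freely without a rebuild,
        # e.g. extruding each face by a unique value
        return [(i, off) for i, off in enumerate(offsets)]
```

Animating only the offsets socket then skips the rebuild entirely, which is exactly the "partial" part of the cache.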

portnov commented 6 years ago

@zeffii smth like the need_process() method (https://github.com/nortikin/sverchok/issues/1990#issuecomment-354586202, p. 2)?

DolphinDream commented 6 years ago

Btw, I know that Python creates compiled versions of the py scripts as pyc files. Are these as fast as Cython? I know AN (Animation Nodes) has created a Cython version of their addon to speed it up. Is that possible to do with SV?

zeffii commented 6 years ago

yeah, @portnov exactly that.

portnov commented 6 years ago

@DolphinDream 1) No, pyc files are not actually optimized; they exist only because loading the compiled bytecode is faster than parsing the Python source itself. 2) It is possible, of course; the question is only the required amount of effort.

portnov commented 6 years ago

Hmm. It looks like we already have something like what @DolphinDream suggested; not sure if it works. @zeffii, @DolphinDream please look at core/socket_data.py.

zeffii commented 6 years ago

The term "cache" there is perhaps a misnomer, depending on the perspective. We also have a .has_changed which already suppresses many recalculations, but it applies only to trees, not nodes. A .has_changed for individual nodes, and especially for "terminal" nodes, could still be interesting to explore.

zeffii commented 6 years ago

the socket.sv_get(..., deepcopy=False) switch for sockets deserves a few paragraphs / code examples to show how it can speed up a node's processing duties, but also to explain the implications of letting the data into the node under the explicit agreement that the node won't modify that data; instead it should (lazily) read from it / iterate over it to generate new data.
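
That contract could be illustrated like this (FakeSocket is a toy stand-in just to demonstrate the deepcopy semantics; sv_get's deepcopy parameter itself is the real Sverchok socket API mentioned above):

```python
import copy

class FakeSocket:
    """Toy stand-in for a Sverchok socket, to show the deepcopy contract."""
    def __init__(self, data):
        self._data = data

    def sv_get(self, deepcopy=True):
        # deepcopy=True (the safe default): the node gets its own copy.
        # deepcopy=False: the node gets a shared reference, in exchange
        # for the promise that it will never mutate the data.
        return copy.deepcopy(self._data) if deepcopy else self._data


# good citizen: read-only access, builds NEW data from the input
def scale_verts(socket, factor):
    verts = socket.sv_get(deepcopy=False)  # fast: no copy of the input
    return [[x * factor, y * factor, z * factor] for x, y, z in verts]


# bad citizen: mutates the shared data in place, corrupting it for
# every other node reading from the same socket
def scale_verts_in_place(socket, factor):
    verts = socket.sv_get(deepcopy=False)
    for v in verts:
        v[0] *= factor; v[1] *= factor; v[2] *= factor
    return verts
```

Skipping the deep copy is where the speedup comes from on large vertex lists; the in-place variant shows exactly the failure mode the agreement exists to prevent.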

portnov commented 6 years ago

There is also a fancy "partial update" thing, and it even seems to work somehow. If I change a property in the middle of the tree, only that node and the subsequent ones get .process() called.
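
That behavior amounts to re-processing only the changed node's downstream set. A sketch of computing that set with a breadth-first walk (illustrative, not the actual update code):

```python
from collections import deque

def downstream(changed, links):
    """Return the changed node plus every node reachable from it.

    `links` is a list of (src, dst) pairs describing the node-tree links;
    only the returned nodes would need .process() after the change.
    """
    adj = {}
    for src, dst in links:
        adj.setdefault(src, []).append(dst)

    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Everything upstream of (or parallel to) the changed node is left alone, which is why touching a property in the middle of the tree skips the first half of the pipeline.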

zeffii commented 6 years ago

yes. I think the double update you experienced before digging deeper into this was caused by the update that an object-viewer node triggers after updating the Scene; scene updates also trigger a global update of the node tree.

portnov commented 6 years ago

Regarding https://github.com/nortikin/sverchok/issues/1990#issuecomment-355419718. I had a quick look at that node, and at first glance it appears to use not-very-good algorithms and math. On my examples I can reduce execution time from 12 seconds to 3 seconds with a relatively small amount of code changes. Will do a PR shortly.

zeffii commented 6 years ago

@portnov cool. As long as the Intersect Edges algorithm stays fully 3D, it cannot assume it may ignore the third component (z).

portnov commented 5 years ago

About the topo_sort node and its algorithm... https://habr.com/ru/post/451208/ (in Russian)

nortikin commented 5 years ago

@portnov from your link: https://github.com/slonopotamus/stable-topo-sort/blob/master/test/StableTopoSort.java#L16 https://e-maxx.ru/algo/strong_connected_components Well done, Asakhabiev