nortikin / sverchok

Sverchok
http://nortikin.github.io/sverchok/
GNU General Public License v3.0

Multithreading #3646

Open Durman opened 4 years ago

Durman commented 4 years ago

Probably multithreading is possible with the timers module.

timer

import bpy
from mathutils import Vector
from math import cos, sin

def move_node():
    # Move the first node of the first tree along a circle, one step per call
    iteration = move_node.iteration
    node = bpy.data.node_groups[0].nodes[0]
    node.location = Vector((sin(iteration/10), cos(iteration/10))) * 50
    if iteration >= 200:
        return None  # returning None unregisters the timer
    else:
        move_node.iteration += 1
        return 0.01  # call this function again in 0.01 s

move_node.iteration = 0

bpy.app.timers.register(move_node)

https://docs.blender.org/api/current/bpy.app.timers.html

Durman commented 4 years ago

loading nodes

import bpy

def loading_nodes():
    iteration = loading_nodes.iteration
    text = f'Loading nodes {iteration * (100 / 200)}%'
    if iteration >= 200:
        bpy.context.screen.areas[3].header_text_set(None)  # restore the header
        return None  # returning None unregisters the timer
    else:
        bpy.context.screen.areas[3].header_text_set(text)
        loading_nodes.iteration += 1
        print(iteration)
        return 0.01  # call this function again in 0.01 s

loading_nodes.iteration = 0

bpy.app.timers.register(loading_nodes)
Durman commented 4 years ago

Using modal operators is even better. It is possible to cancel tree recalculation by pressing the Escape key. It is even possible not to lock the UI, so the user can modify it at any time.

In the example, the progress resets and starts again when the node changes its location. Then I finish it by pressing the Escape key.

loading nodes2

import bpy

class ModalOperator(bpy.types.Operator):
    """Move an object with the mouse, example"""
    bl_idname = "object.modal_operator"
    bl_label = "Simple Modal Operator"

    _timer = None

    progress: bpy.props.FloatProperty()
    x: bpy.props.FloatProperty()

    def modal(self, context, event):

        if event.type == 'ESC' or self.progress > 100:
            # Finish on Escape or once the progress is complete
            bpy.context.screen.areas[4].header_text_set(None)
            self.cancel(context)
            return {'FINISHED'}

        elif event.type == 'TIMER':
            node_location = bpy.data.node_groups[0].nodes[0].location[0]
            if self.x != node_location:
                # The node was moved - restart the progress
                self.x = node_location
                self.progress = 0

            text = f'Loading nodes {int(self.progress)}%'
            bpy.context.screen.areas[4].header_text_set(text)
            self.progress += 0.1
        return {'PASS_THROUGH'}  # let other events reach the UI

    def execute(self, context):
        wm = context.window_manager
        self._timer = wm.event_timer_add(0.01, window=context.window)
        wm.modal_handler_add(self)
        return {'RUNNING_MODAL'}

    def cancel(self, context):
        wm = context.window_manager
        wm.event_timer_remove(self._timer)

def register():
    bpy.utils.register_class(ModalOperator)

register()
bpy.ops.object.modal_operator('INVOKE_DEFAULT')
Durman commented 4 years ago

https://developer.blender.org/rB04c5471ceefb41c9e49bf7c86f07e9e7b8426bb3

sys.executable now points to the Python interpreter (instead of the Blender executable) (04c5471cee). This resolves multiprocessing, which failed to spawn new processes on WIN32.

zeffii commented 4 years ago

that's exciting!

Durman commented 3 years ago

I was wondering how many operations per second Python can perform.

from itertools import count
from time import time

def f(duration=1):
    c = count()
    start_time = time()
    while (time() - start_time) < duration:
        next(c)
    return c

f(1)
# -> count(7067172), i.e. about 7 million iterations per second

I'm not disappointed.

Durman commented 3 years ago

Hello world addon using multiprocessing for 2.91

bl_info = {
    "name": "My Test Add-on",
    "blender": (2, 80, 0),
    "category": "Object",
}

import multiprocessing, sys, os
from multiprocessing import Pool

def f(x):
    return x*x

def register():
    import bpy
    sys.executable = bpy.app.binary_path
    python_executable = os.path.join(sys.exec_prefix, 'bin', 'python.exe')
    # Spawn worker processes with Blender's bundled Python interpreter
    multiprocessing.set_executable(python_executable)

    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

def unregister():
    print("Goodbye World")
ArpegorPSGH commented 2 years ago

How is multithreading coming along? Are there concrete plans to allow multithreading within for-each loops, and when applying a node to multiple elements of a list? Maybe even within some heavy operation nodes, like Booleans?

Durman commented 2 years ago

Multithreading is impossible without refactoring the current architecture. All nodes currently grab data from common storage, and I guess that prevents multithreading from being possible. I would say that development of Sverchok is rather slow now.

Durman commented 2 years ago

It seems I have confirmed my suspicions. I tried to implement a simple example of how Sverchok nodes work with data. Currently nodes use a global dictionary to read and write data. The example works until concurrency is switched on. When nodes are executed concurrently, the global dictionary remains empty. I suspect that the new processes import the whole module and edit their own local copy of the global dictionary, so the root module does not see any changes.

It is possible to pass the dictionary into the concurrent processes, but only via parameters, which means changing the process methods of all Sverchok nodes.

code

```py
from multiprocessing import Pool
from time import sleep
from random import random
from itertools import count

cache = dict()


class Node:
    _id = count()

    def __init__(self):
        self.id = next(self._id)

    def set_data(self, data):
        cache[self.id] = data

    def process(self):
        sleep(random())
        self.set_data('DONE')


def execute_tree():
    for _ in range(5):
        Node().process()
    print(cache)
    cache.clear()


def process_node(node):
    node.process()


def execute_tree_async():
    with Pool() as p:
        p.map(process_node, [Node() for n in range(5)])
    print(cache)
    cache.clear()


if __name__ == '__main__':
    execute_tree()
    execute_tree_async()
```

It prints:

{0: 'DONE', 1: 'DONE', 2: 'DONE', 3: 'DONE', 4: 'DONE'}
{}
portnov commented 2 years ago

The main question is, what exactly do we want to execute in parallel?

A) Different branches of the node tree: each branch of the tree executes in its own thread/process, while the code of each node still executes in one thread. This would require infrastructure changes, as @Durman mentioned. But a lot of trees have only one branch, or, for example, two branches with 10 nodes in one and 2 in the other. In most cases such trees will not get any speedup from parallel processing.

A.1) Sub-option 1: Sverchok would have to look at the whole tree as a graph and figure out which parts could be executed in parallel, at which points execution should be split into threads, and where results from different threads should be combined.

A.2) Sub-option 2: when a node has finished its execution, it "pings" the nodes following it, saying "data is ready". Each node starts its execution when it sees that the data on all of its inputs is ready (see the sketch below this comment).

B) Code inside some/all of the nodes. For example, if a node does mesh subdivision, we could theoretically split the mesh into 4 parts, subdivide each of them in a separate thread/process, and then combine the results. This will obviously require changes in the code of the nodes themselves. Some algorithms cannot be parallelized due to their nature; some will require significant effort to parallelize. Nodes with simple algorithms will also gain nothing: if an algorithm is simple enough, its parallel version can only be slower. And we should not forget that in many (if not most) nodes we just call APIs from Blender or from other libraries, so we cannot control whether they execute in parallel or not. For example, AFAIK, all bmesh procedures are single-threaded. OTOH, many procedures from the FreeCAD library always work in many threads.

C) We could theoretically do both A and B.

D) In an ideal world, we could integrate A and B by implementing "fully dataflow-driven" parallelism. Imagine that there is not a single atomic set of data traveling through the node tree, but instead a set of small portions of data, each portion in its own thread. Each node could process one portion of data in one thread and pass the result to the next node, so different portions of data could theoretically travel through the tree at different speeds. In the final nodes, all these portions are then gathered and combined into the final result. This approach, again, stumbles on the problem of parallelizing node algorithms. Also, in many cases it is not easy at all to combine the results of partial processes. For example, if you divided one mesh into two parts and subdivided each of them, then when you try to combine them back you will have to do something like "merge by distance", in one thread again... Honestly, I do not see how this idealistic approach could be implemented in Sverchok :) As an option, instead of vaguely defined "portions of data" we could simply say "each object (each mesh, for example) is processed in its own thread". But there are many cases when we are processing only one object...
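
A minimal sketch of sub-option A.2, assuming a toy DummyNode class and a hard-coded dependency graph (all names here are invented for illustration): each node is submitted to a pool as soon as all of its inputs are ready.

```py
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

class DummyNode:
    def __init__(self, name):
        self.name = name

    def process(self):
        print(f'{self.name} done')

# Tree as an adjacency map: node -> nodes that depend on its output.
a, b, c, d = (DummyNode(n) for n in 'abcd')
edges = {a: [b, c], b: [d], c: [d], d: []}

# Count unresolved inputs per node.
pending_inputs = {n: 0 for n in edges}
for followers in edges.values():
    for n in followers:
        pending_inputs[n] += 1

with ThreadPoolExecutor() as pool:
    # Start with the nodes that have no inputs.
    running = {pool.submit(n.process): n for n, k in pending_inputs.items() if k == 0}
    while running:
        done, _ = wait(running, return_when=FIRST_COMPLETED)
        for fut in done:
            node = running.pop(fut)
            # "Ping" the followers; start any whose inputs are now all ready.
            for nxt in edges[node]:
                pending_inputs[nxt] -= 1
                if pending_inputs[nxt] == 0:
                    running[pool.submit(nxt.process)] = nxt
```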

Durman commented 2 years ago

I think we should choose the path of minimal effort and maximum gain. The simplest thing, I think, is to execute nodes in parallel, because it requires the fewest changes. All other approaches require changes inside the nodes and are therefore expensive.

P.S. Not only nodes but whole trees can also be executed in parallel (if a file has multiple trees).

Durman commented 2 years ago

With a trick it seems possible to process nodes concurrently without changing the current approach of the sv_get and sv_set methods.

code

```py
from multiprocessing import Pool, Manager
from time import sleep
from random import random
from itertools import count

cache = dict()


class Node:
    _id = count()

    def __init__(self):
        self.id = next(self._id)

    def set_data(self, data):
        cache[self.id] = data

    def process(self):
        sleep(random())
        self.set_data('DONE')


def execute_tree():
    for _ in range(5):
        Node().process()
    print(cache)
    cache.clear()


def process_node(cache_, node):
    node.process()
    cache_.update(cache)


def execute_tree_async():
    with Manager() as m, Pool() as p:
        d = m.dict(cache)
        p.starmap(process_node, [(d, Node()) for _ in range(5)])
        print(d)


if __name__ == '__main__':
    execute_tree()
    execute_tree_async()
```

It prints:
{0: 'DONE', 1: 'DONE', 2: 'DONE', 3: 'DONE', 4: 'DONE'}
{7: 'DONE', 6: 'DONE', 8: 'DONE', 5: 'DONE', 9: 'DONE'}
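
(The ids in the concurrent run are 5-9 rather than 0-4 because the Node instances are created in the parent process, where the shared itertools.count had already been advanced by the sequential run.)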
ArpegorPSGH commented 2 years ago

Durman is right; I think the focus should be on optimizations which benefit all the nodes at once, so that the speed-up is sizeable without investing huge amounts of time (if a specific node is found to be too slow, then someone can try to optimize it specifically, but that shouldn't be done in a systematic way unless it can be automated). For me, the most obvious parallelizations are the for-each loops, both as nodes and within nodes (the first level of the input list). I think going any further will get more complicated and will only lead to small gains, or gains only for specific tree architectures.

From what Durman said on a thread somewhere, the second best way to improve performance is to replace the classic for loops in nodes with another mechanism. The potential gains are supposed to be even higher than those of multithreading (speed-ups in the range of tens to hundreds depending on the node), but the amount of work to invest is far greater (maybe there would be a way to automate the modifications needed, or to write a procedure simple enough that anyone with decent Python skills could update a node's code and submit it?).

Durman commented 2 years ago

I have made a first attempt to implement something basic and ran into TypeError: cannot pickle 'SvScalarMathNodeMK4' object when I tried to pass a node to a concurrent process for execution. It seems the node can be turned into a picklable object by adding the __getstate__ and __setstate__ magic methods (I'm not sure why it's unpicklable by default).
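
A minimal sketch of that idea, with a hypothetical stand-in class (the attribute names are invented; real Sverchok nodes hold bpy references that would have to be re-resolved after unpickling):

```py
import pickle

class PicklableNode:
    """Hypothetical stand-in for a Sverchok node class."""

    def __init__(self, name):
        self.name = name
        self.id_data = object()  # stands in for an unpicklable bpy reference

    def __getstate__(self):
        # Copy the instance dict and drop attributes that cannot be pickled.
        state = self.__dict__.copy()
        state.pop('id_data', None)
        return state

    def __setstate__(self, state):
        # Restore the picklable part; the bpy reference must be re-resolved later.
        self.__dict__.update(state)
        self.id_data = None

node = pickle.loads(pickle.dumps(PicklableNode('Scalar Math')))
print(node.name)  # -> Scalar Math
```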

The next, more serious problem looks like this:

Traceback (most recent call last):
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\queues.py", line 368, in get
    return _ForkingPickler.loads(res)
  File "...\sverchok\__init__.py", line 69, in <module>
    from sverchok.core import sv_registration_utils, init_architecture, make_node_list
  File "...\sverchok\core\__init__.py", line 3, in <module>
    from sverchok.core.socket_data import clear_all_socket_cache
  File "...\sverchok\core\socket_data.py", line 25, in <module>
    from bpy.types import NodeSocket
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\scripts\modules\bpy\__init__.py", line 22, in <module>
    from _bpy import (
ModuleNotFoundError: No module named '_bpy'

I assume that when a new process is spawned (forked), it imports (executes) the application from the parent process. But it seems that Python in the new process does not know anything about Blender. Probably in the child process there is only the Python interpreter and the add-on code. If it were possible to make Sverchok importable outside Blender, that would probably solve the problem.

Durman commented 2 years ago

Here is a simple add-on. It creates a node tree with a node which executes the first example from the multiprocessing module documentation.

simple add-on

```py
bl_info = {
    "name": "Concurrent test",
    "author": "Soluyanov Sergey",
    "version": (0, 0, 1),
    "blender": (3, 0, 0),
    "location": "New node tree editor",
    "description": "Using concurrency test",
    "warning": "",
    "doc_url": "https://github.com/nortikin/sverchok/issues/3646",
    "tracker_url": "https://github.com/nortikin/sverchok/issues/3646",
    "category": "Object",
}

from multiprocessing import Pool
import sys
from time import perf_counter

is_inside_blender = sys.modules.get('bpy')

if is_inside_blender:
    from bpy.types import NodeTree, Node
    from bpy.utils import register_class, unregister_class
    from bpy.props import BoolProperty
    import nodeitems_utils
    from nodeitems_utils import NodeCategory, NodeItem
else:
    NodeTree = object
    Node = object
    register_class = None
    unregister_class = None
    nodeitems_utils = None

    class NodeCategory:
        def __init__(self, *args, **kwargs):
            pass

    def empty_func(*args, **kwargs):
        return

    BoolProperty = empty_func
    NodeItem = empty_func


class MyTree(NodeTree):
    bl_idname = "MyTree"
    bl_label = 'Concurrent tree test'
    bl_icon = "ERROR"


class MyNode(Node):
    bl_idname = "MyNode"
    bl_label = 'Concurrent node'

    def exec_update(self, context):
        self.process()

    exec: BoolProperty(name="Execute ?", update=exec_update)

    def draw_buttons(self, context, layout):
        layout.prop(self, 'exec')

    def process(self):
        t = perf_counter()
        with Pool() as p:
            print(p.map(f, [1, 2, 3]))
        print(f"Executed {(perf_counter()-t)*1000}ms")


def f(x):
    return x*x


classes = [MyTree, MyNode]


class MyNodeCategory(NodeCategory):
    @classmethod
    def poll(cls, context):
        return context.space_data.tree_type == 'MyTree'


node_categories = [
    MyNodeCategory('SOMENODES', "Concurrent nodes", items=[
        NodeItem("MyNode"),
    ]),
]


def register():
    for cls in classes:
        register_class(cls)
    nodeitems_utils.register_node_categories('CUSTOM_NODES', node_categories)


def unregister():
    nodeitems_utils.unregister_node_categories('CUSTOM_NODES')
    for cls in classes:
        unregister_class(cls)
```

concurrent process

Some tricks have to be done with imports, because when new processes are instantiated they import the sverchok.__init__ file, but the bpy module is not available there. In the example I replaced all Blender objects with dummy ones when the add-on is imported outside of Blender. Though with some effort the same replacement of imports could be done inside Sverchok, it also means that modules such as mathutils and bmesh won't be available, which probably knocks out half of the Sverchok nodes.

The only solution I can see now is using Blender as a standalone Python module inside the subprocesses. It will require an extra dependency, but utilizing all CPU cores is probably worth it.

Also there is an easy installation package: https://github.com/TylerGubala/blenderpy. There are instructions on how to use it with the multiprocessing module: https://github.com/TylerGubala/blenderpy/wiki/Caveat---Usage-with-multiprocessing. And here is an extra hint on how to distinguish the main process from subprocesses: https://devtalk.blender.org/t/no-module-named-bpy-when-using-python-multiprocessing/18259/4?u=random
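
One way to make that distinction (not necessarily the exact approach from the linked post) is multiprocessing.parent_process(), available since Python 3.8:

```py
import multiprocessing

def where_am_i():
    # parent_process() returns None only in the main process
    if multiprocessing.parent_process() is None:
        print('main process: safe to use bpy')
    else:
        print('subprocess: avoid Blender-only modules')

if __name__ == '__main__':
    where_am_i()  # -> main process
    p = multiprocessing.Process(target=where_am_i)  # -> subprocess
    p.start()
    p.join()
```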

ArpegorPSGH commented 2 years ago

So, have you been working on multi-threading as of late? Do you think parallelization of node execution is possible when nodes are in separate branches (iterations within a For Each loop count as separate branches), and on the first level of inputs (maybe by manually and explicitly stating which loops within a node are parallelizable)?

Durman commented 2 years ago

Parallelization probably is possible, but at the level of the update system it is quite complicated. I guess it would be simpler to use parallelization inside nodes.

I don't think it has priority now, because we can gain a performance boost by solving the problem of the faces data structure. It is pure Python now, and it's clear that this is not the way to go. Also, we might be able to improve vectorization performance.

ArpegorPSGH commented 2 years ago

What faces data structure problem? Does it have to do with the nodes' overhead of converting data before and after an operation? Would it improve node performance greatly? I believe you wrote that solving this problem would be very time consuming, as it has to be done manually for each node. Would it really be worth the work invested, rather than investing it in multi-threading at the level of the update system? Well, I agree that in fact both are needed, as overhead on small operations is far too important, but on large ones (like boolean operations) the only way to really gain something is via multithreading. If solving both problems only requires a few lines that are very easy to add to each node, that would be best, because refactoring all nodes at this point seems like a daunting task. However, if that is not the case, doing all the work in the update system is the only alternative, even if it is less efficient, I think.

Durman commented 2 years ago

Even if it increases the speed of every node, it could take too much time to implement, and investing time into other areas of Sverchok could be more beneficial. Also, it is not yet proven that it is really possible to make it work, or that it would improve performance after all. My concern is that data has to go through the pickling process to be passed between processes.
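
That concern is easy to gauge with a minimal timing sketch (the data shape and size below are invented; real socket data may differ):

```py
import pickle
from time import perf_counter

# A rough stand-in for socket data: one million vertices as nested lists.
verts = [[i * 0.1, i * 0.2, i * 0.3] for i in range(1_000_000)]

t = perf_counter()
payload = pickle.dumps(verts, protocol=pickle.HIGHEST_PROTOCOL)
print(f'dumps: {(perf_counter() - t) * 1000:.0f} ms, {len(payload) / 2**20:.1f} MiB')

t = perf_counter()
pickle.loads(payload)
print(f'loads: {(perf_counter() - t) * 1000:.0f} ms')
```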

ArpegorPSGH commented 2 years ago

Well, you should indeed start with whichever solution offers the best speedup/work ratio, although it seems doubtful to me that refactoring over 600 nodes, even slightly, would need less effort and time than refactoring the update system (if that's the case, just add multi-threading while you're at it!). I do not know how difficult multi-threading would be, but what is certain is that it can only improve performance notably, at least in the vast majority of cases (for very long single-branch trees with very basic operations working on a single object it would indeed be slower, but no one does that).

Nowadays the way to improve CPU performance is mainly to add cores, so there is no way it cannot be faster to use 16 cores instead of 1. There may be other areas of improvement with higher priority now, but Sverchok will have to go in that direction eventually. In the worst case, the refactoring can be done progressively with the help of the community using Sverchok, once the update system is ready for multi-threading and if the changes to perform on the nodes are clearly explained.

I am not familiar with pickling; would that pose a problem? With reading/writing speed or something?

Durman commented 2 years ago

Pickling is the process of converting Python objects into a stream of bytes, which can potentially be time-consuming. First of all, a prototype should be made to prove that multiprocessing is possible with our architecture and that it can improve performance. The prototype should include the following:

ArpegorPSGH commented 2 years ago

I think the base unit for a subprocess should indeed be one node, with the update system collecting the results of the nodes and spawning a new subprocess for each node which now has all its data ready. Hopefully you can find a framework for managing subprocesses efficiently without having to delve into scheduling them yourself. Additionally, I think a mechanism to automatically multi-thread processing of the first level of input lists should be doable, even if a few systematic lines of code need to be added so that the update system can look into the node code. In the worst case, a simple @something just before a parallelizable for loop could indicate to the update system that multi-threading can be performed there. This way, the community could add them if they feel a node's execution is too slow.
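
The "@something" idea could look roughly like this: a marker decorator that the update system checks before deciding to split work across processes (the names parallelizable and sv_parallelizable are invented for illustration):

```py
from multiprocessing import Pool

def parallelizable(func):
    # Marker attribute, invented for illustration; returns the function unchanged
    func.sv_parallelizable = True
    return func

@parallelizable
def square(obj):
    return [v * v for v in obj]

def run(func, objects):
    if getattr(func, 'sv_parallelizable', False):
        with Pool() as p:
            return p.map(func, objects)  # one object per worker
    return [func(obj) for obj in objects]

if __name__ == '__main__':
    print(run(square, [[1, 2], [3, 4]]))  # -> [[1, 4], [9, 16]]
```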

Durman commented 1 year ago

There is some news on the topic. An official bpy Python module has been introduced. With its help I was able to use multiprocessing inside Blender.

The first step is to install the Python bpy module:

blender_python_executable -m pip install bpy

There won't be conflicts with the internal bpy module, because the internal one is first in the search list.

However, that is not enough. If you try to spawn a subprocess, it will complain that the _bpy module is not found. This is because the subprocess finds Blender's internal bpy module first. The following hack in the sverchok/__init__.py file helps to spawn subprocesses without errors.

try:
    import _bpy  # only importable inside a real Blender process
    is_subprocess = False
except ImportError:
    is_subprocess = True
else:
    del _bpy

if is_subprocess:
    # Remove Blender's internal bpy from the search path
    # so that the pip-installed bpy module is found instead
    import sys
    blender_bpy_index = sys.path.index('C:\\Program Files\\Blender Foundation\\Blender 3.4\\3.4\\scripts\\modules')
    sys.path.pop(blender_bpy_index)

The problem here is that if Blender has other custom add-ons enabled, they most likely won't let the spawned subprocess work, and will hang Blender.

Durman commented 1 year ago

Running Python without the GIL (in the future): https://peps.python.org/pep-0703/
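
For context, a minimal demonstration of why the GIL matters here: two CPU-bound calls take about as long in two threads as they do serially on a current build (a free-threaded PEP 703 build should shrink the threaded time):

```py
from threading import Thread
from time import perf_counter

def burn(n=5_000_000):
    # Pure-Python CPU-bound work; the GIL serializes it across threads.
    s = 0
    for i in range(n):
        s += i
    return s

t = perf_counter()
burn(); burn()
print(f'serial:   {perf_counter() - t:.2f}s')

t = perf_counter()
threads = [Thread(target=burn) for _ in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(f'threaded: {perf_counter() - t:.2f}s  (about the same under the GIL)')
```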
