Durman opened this issue 4 years ago
```python
import bpy

def loading_nodes():
    iteration = loading_nodes.iteration
    text = f'Loading nodes {iteration * (100 / 200)}%'
    if iteration >= 200:
        # Done: clear the header text; returning None unregisters the timer
        bpy.context.screen.areas[3].header_text_set(None)
        return
    else:
        bpy.context.screen.areas[3].header_text_set(text)
        loading_nodes.iteration += 1
        print(iteration)
        return 0.01  # call this function again in 0.01 seconds

loading_nodes.iteration = 0
bpy.app.timers.register(loading_nodes)
```
Using modal operators is even better. It is possible to cancel tree recalculation by pressing the Escape key, and it is even possible not to lock the UI, so the user can modify it at any time.
In the example below, the progress restarts whenever the node changes its location; then I finish it by pressing the Escape key.
```python
import bpy

class ModalOperator(bpy.types.Operator):
    """Show node loading progress in the area header (example)"""
    bl_idname = "object.modal_operator"
    bl_label = "Simple Modal Operator"

    _timer = None
    progress: bpy.props.FloatProperty()
    x: bpy.props.FloatProperty()

    def modal(self, context, event):
        if event.type == 'ESC' or self.progress > 100:
            # Finish on Escape or when the progress reaches 100%
            bpy.context.screen.areas[4].header_text_set(None)
            self.cancel(context)
            return {'FINISHED'}
        elif event.type == 'TIMER':
            node_location = bpy.data.node_groups[0].nodes[0].location[0]
            if self.x != node_location:
                # The node moved: restart the progress from zero
                self.x = node_location
                self.progress = 0
            text = f'Loading nodes {int(self.progress)}%'
            bpy.context.screen.areas[4].header_text_set(text)
            self.progress += 0.1
        return {'PASS_THROUGH'}

    def execute(self, context):
        wm = context.window_manager
        self._timer = wm.event_timer_add(0.01, window=context.window)
        wm.modal_handler_add(self)
        return {'RUNNING_MODAL'}

    def cancel(self, context):
        wm = context.window_manager
        wm.event_timer_remove(self._timer)

def register():
    bpy.utils.register_class(ModalOperator)

register()
bpy.ops.object.modal_operator('INVOKE_DEFAULT')
```
https://developer.blender.org/rB04c5471ceefb41c9e49bf7c86f07e9e7b8426bb3
sys.executable now points to the Python interpreter (instead of the Blender executable) (04c5471cee). This resolves multiprocessing, which previously failed to spawn new processes on WIN32.
that's exciting!
I was wondering how many operations per second Python can perform.
```python
from itertools import count
from time import time

def f(duration=1):
    # Count how many iterations fit into the given number of seconds
    c = count()
    start_time = time()
    while (time() - start_time) < duration:
        next(c)
    return c

f(1)
# count(7067172)
```
I'm not disappointed.
Hello-world add-on using multiprocessing for Blender 2.91:
```python
bl_info = {
    "name": "My Test Add-on",
    "blender": (2, 80, 0),
    "category": "Object",
}

import multiprocessing
import os
import sys
from multiprocessing import Pool

def f(x):
    return x * x

def register():
    import bpy
    sys.executable = bpy.app.binary_path
    # Tell multiprocessing to spawn workers with Blender's bundled Python
    python_executable = os.path.join(sys.exec_prefix, 'bin', 'python.exe')
    multiprocessing.set_executable(python_executable)
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

def unregister():
    print("Goodbye World")
```
How is multithreading coming along? Are there concrete plans to allow multithreading within For Each loops and when applying a node to multiple elements of a list? Maybe even within some heavy operation nodes, like Booleans?
Multithreading is impossible without refactoring the current architecture. All nodes grab data from common storage now, and I guess that prevents multithreading from being possible. I would say that development of Sverchok is rather slow now.
It seems I confirmed my suspicions. I tried to implement a simple example of how Sverchok nodes work with data. Currently nodes use a global dictionary to read and write data. The example works until concurrency is switched on: when nodes are executed concurrently, the global dictionary remains empty. I suspect that new processes import the whole module and edit their own local copy of the global dictionary, so the root module does not see any changes.
It is possible to pass the dictionary into concurrent processes, but only via parameters, which means changing the process methods of all Sverchok nodes.
It prints:
```
{0: 'DONE', 1: 'DONE', 2: 'DONE', 3: 'DONE', 4: 'DONE'}
{}
```
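The experiment itself is not included above, but a minimal sketch of the setup described could look like this (all names here are illustrative, not Sverchok's actual code): workers write into a module-level dictionary, which works fine sequentially but leaves the parent's dictionary untouched once the work moves into spawned processes.

```python
from multiprocessing import Pool

# Stand-in for Sverchok's common data storage (hypothetical name)
socket_data = {}

def process_node(node_id):
    # In a spawned worker this module is re-imported, so the worker
    # writes into its OWN copy of socket_data, not the parent's
    socket_data[node_id] = 'DONE'

if __name__ == '__main__':
    # Sequential run: writes land in this process's dictionary
    for i in range(5):
        process_node(i)
    print(socket_data)  # {0: 'DONE', 1: 'DONE', 2: 'DONE', 3: 'DONE', 4: 'DONE'}

    socket_data.clear()

    # Concurrent run: the parent's dictionary stays empty
    with Pool(5) as pool:
        pool.map(process_node, range(5))
    print(socket_data)  # {}
```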
The main question is: what exactly do we want to execute in parallel?
A) Different branches of the node tree: each branch executes in its own thread/process, while the code of each node still executes in one thread. This would require infrastructure changes, as @Durman mentioned. But a lot of trees have only one branch, or, for example, two branches with 10 nodes in one and 2 in the other. In most cases, such trees will not get any speedup from parallel processing.
A.1) Sub-option 1: Sverchok would have to look at the whole tree as a graph and figure out which parts could be executed in parallel, at which points execution should be split into threads, and where results from different threads should be combined.
A.2) Sub-option 2: when a node has finished its execution, it "pings" the nodes following it, saying "data is ready". Each node starts its execution when it sees that the data on all of its inputs is ready (see the sketch after this list).
B) Code inside some/all nodes. For example, if a node does mesh subdivision, we could theoretically split the mesh into 4 parts, subdivide each of them in a separate thread/process, and then combine the results. This will obviously require changes in the code of the nodes themselves. Some algorithms cannot be parallelized due to their nature; some will require significant effort to parallelize. Nodes with simple algorithms will also gain nothing: if an algorithm is simple enough, its parallel version can only be slower. And we should not forget that in many (if not most) nodes we just call an API from Blender or from other libraries, so we cannot control whether they execute in parallel or not. For example, AFAIK, all bmesh procedures are single-threaded. OTOH, many procedures from the FreeCAD library always work in many threads.
C) We could theoretically do both A and B.
D) In an ideal world, we could integrate A and B by implementing "fully dataflow-driven" parallelism. Imagine that there is not a single atomic set of data traveling through the node tree, but instead a set of small portions of data, each portion in its own thread. Each node could process one portion of data in one thread and pass the result to the next node, so different portions of data could theoretically travel through the tree at different speeds. In the final nodes, all these portions are gathered and combined into the final result. This approach, again, stumbles on the problems of parallelizing node algorithms. Also, in many cases it is not at all easy to combine the results of partial processes. For example, if you divided one mesh into two parts and subdivided each of them, then when you try to combine them back you will have to do something like "merge by distance", in one thread again... Honestly, I do not see how this idealistic approach could be implemented in Sverchok :) As an option, instead of vaguely defined "portions of data" we could simply say "each object (each mesh, for example) is processed in its own thread". But there are many cases when we are processing only one object...
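For illustration, here is a minimal sketch of sub-option A.2 under simplifying assumptions: a toy dependency graph, a thread pool, and level-synchronous scheduling instead of true per-node "pings". None of this is Sverchok's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy tree: node -> nodes it depends on (C and D can run in parallel)
tree = {'A': [], 'B': [], 'C': ['A'], 'D': ['A', 'B'], 'E': ['C', 'D']}

def run_node(name, input_results):
    # Placeholder for a node's real computation
    return f"{name}({','.join(input_results)})"

def execute(tree):
    done = {}
    pending = dict(tree)
    with ThreadPoolExecutor() as pool:
        while pending:
            # All nodes whose inputs are ready may run concurrently
            ready = [n for n, deps in pending.items()
                     if all(d in done for d in deps)]
            futures = {n: pool.submit(run_node, n, [done[d] for d in pending[n]])
                       for n in ready}
            for n, fut in futures.items():
                done[n] = fut.result()
                pending.pop(n)
    return done

print(execute(tree))  # 'E' ends up combining the results of both branches
```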
I think we should choose the way of minimal effort and maximum gain. The simplest thing, I think, is to execute nodes in parallel, because it requires the fewest changes. All other approaches require changes inside nodes and are thus expensive.
P.S. Not only nodes but whole trees can also be executed in parallel (if a file has multiple trees).
With a trick it seems possible to process nodes concurrently without changing the current approach of the sv_get and sv_set methods.
```
{0: 'DONE', 1: 'DONE', 2: 'DONE', 3: 'DONE', 4: 'DONE'}
{7: 'DONE', 6: 'DONE', 8: 'DONE', 5: 'DONE', 9: 'DONE'}
```
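The code for the trick is not shown in the thread, but one way to get shared writes like the second line of output is a multiprocessing.Manager dictionary, whose proxy forwards writes back to the parent process. A sketch under that assumption (not necessarily the trick actually used):

```python
from multiprocessing import Manager, Pool

def process_node(args):
    shared, node_id = args
    # Writes go through the manager's proxy, so the parent process sees them
    shared[node_id] = 'DONE'

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()
        with Pool(5) as pool:
            pool.map(process_node, [(shared, i) for i in range(5, 10)])
        # Completion order is nondeterministic, e.g.
        # {7: 'DONE', 6: 'DONE', 8: 'DONE', 5: 'DONE', 9: 'DONE'}
        print(dict(shared))
```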
Durman is right, I think the focus should be on optimizations which affect all the nodes at once, so that the speedup is sizeable without investing huge amounts of time (if a specific node is found too slow, then someone can try to optimize it specifically, but that shouldn't be done in a systematic way unless it can be automated). For me, the most obvious parallelizations are the For Each loops, both as nodes and within nodes (first level of the input list). I think going any further will get more complicated and will only lead to small gains, or gains only for specific tree architectures.

From what Durman said on a thread somewhere, the second-best way to improve performance is to replace classic for loops in nodes with another mechanism. The potential gains are supposed to be even higher than those of multithreading (speedups in the range of tens to hundreds depending on the node), but the amount of work to invest is far greater (maybe there would be a way to automate the modifications needed, or to write a procedure simple enough that anyone with decent Python skills could update a node's code and submit it?).
I have made a first attempt to implement something basic and ran into TypeError: cannot pickle 'SvScalarMathNodeMK4' object when I tried to pass the node to a concurrent process for execution. It seems the node can be turned into a picklable object by adding the __getstate__ and __setstate__ magic methods (I'm not sure why it's unpicklable by default).
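For illustration, a minimal sketch of how __getstate__/__setstate__ can make an otherwise unpicklable object picklable (the class and attribute names are made up, not the real node):

```python
import pickle

class FakeNode:
    """Hypothetical stand-in for a node holding an unpicklable handle."""

    def __init__(self, name):
        self.name = name
        self.bpy_handle = lambda: None  # lambdas cannot be pickled

    def __getstate__(self):
        # Export only plain-Python state; drop the unpicklable handle
        return {'name': self.name}

    def __setstate__(self, state):
        self.name = state['name']
        self.bpy_handle = None  # must be re-acquired in the new process

node = pickle.loads(pickle.dumps(FakeNode('Scalar Math')))
print(node.name)  # Scalar Math
```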
The next, more serious problem looks like this:
```
Traceback (most recent call last):
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\python\lib\multiprocessing\queues.py", line 368, in get
    return _ForkingPickler.loads(res)
  File "...\sverchok\__init__.py", line 69, in <module>
    from sverchok.core import sv_registration_utils, init_architecture, make_node_list
  File "...\sverchok\core\__init__.py", line 3, in <module>
    from sverchok.core.socket_data import clear_all_socket_cache
  File "...\sverchok\core\socket_data.py", line 25, in <module>
    from bpy.types import NodeSocket
  File "C:\Program Files\Blender Foundation\Blender 3.2\3.2\scripts\modules\bpy\__init__.py", line 22, in <module>
    from _bpy import (
ModuleNotFoundError: No module named '_bpy'
```
I assume that when a new process is spawned (forked), it imports (executes) the application from the parent process. But it seems that Python in the new process does not know anything about Blender; probably the child process contains only the Python interpreter and the add-on code. If it were possible to make Sverchok importable outside Blender, that would probably solve the problem.
Here is a simple add-on. It creates a node tree with a node which executes the first example from the multiprocessing module documentation. Some tricks have to be done with imports, because when new processes are instantiated they import the sverchok.__init__ file, but the bpy module is not available there. In the example I replaced all Blender objects with dummy ones when the add-on is imported outside of Blender. Though with some effort the same replacement of imports could be done inside Sverchok, it also means that modules such as mathutils and bmesh won't be available, which probably knocks out half of Sverchok's nodes.
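A sketch of the import guard described above, with dummy stand-ins (the names and the exact dummies are illustrative):

```python
try:
    import bpy  # succeeds only inside Blender itself
    IN_BLENDER = True
except ModuleNotFoundError:
    IN_BLENDER = False

if IN_BLENDER:
    from bpy.types import Node, NodeTree
else:
    # Dummy replacements so the module can still be imported by a
    # spawned subprocess where Blender's modules are unavailable
    class Node:
        pass

    class NodeTree:
        pass
```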
The only solution I can see now is using Blender as a standalone Python module inside the subprocesses. It will require an extra dependency, but utilizing all CPU cores is probably worth it.
Also there is an easy installation package: https://github.com/TylerGubala/blenderpy. Instructions on how to use it with the multiprocessing module: https://github.com/TylerGubala/blenderpy/wiki/Caveat---Usage-with-multiprocessing. And here is an extra hint on how to distinguish the main process from subprocesses: https://devtalk.blender.org/t/no-module-named-bpy-when-using-python-multiprocessing/18259/4?u=random
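On that last hint: one way to tell the main process from a spawned child, assuming Python 3.8+ (a sketch of the general idea; see the devtalk link for details):

```python
import multiprocessing

# parent_process() returns None only in the main process (Python 3.8+)
if multiprocessing.parent_process() is None:
    print('running inside the main Blender process')
else:
    print('running inside a spawned subprocess; bpy is not available here')
```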
So, have you been working on multithreading as of late? Do you think parallelization of node execution is possible when nodes are in separate branches (nodes within a For Each loop count as separate branches) and on the first level of inputs (maybe by manually and explicitly stating which loops within a node are parallelizable)?
Parallelization probably is possible, but on the level of the update system it is quite complicated. I guess it would be simpler to use parallelization inside nodes.
I don't think it has priority now, because we can gain a performance boost by solving the problem of the faces data structure. It is pure Python now, and it's clear that this is not the way to go. Also, we might be able to improve vectorization performance.
What faces data structure problem? Does it have to do with the nodes' overhead of converting data before and after an operation? Would it improve node performance greatly? I believe you wrote that solving this problem would be very time-consuming, as it has to be done manually for each node. Would it really be worth the work invested, rather than investing that work in multithreading at the level of the update system? Well, I agree that in fact both are needed, as the overhead on small operations is far too important, but on large ones (like Boolean operations) the only way to really gain something is via multithreading. If solving both problems only requires a few very easy lines to add to each node, that would be best, because refactoring all nodes at this point seems like a daunting task. However, if that is not the case, doing all the work in the update system is the only alternative, even if it is less efficient, I think.
Even if it would increase the speed of every node, it could take too much time to implement, and investing time into other areas of Sverchok could be more beneficial. Also, it's not yet proven that it is really possible to make it work and that it will improve performance after all. I'm concerned that data has to go through the pickling process to be passed between processes.
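To give a feel for that concern, here is a rough back-of-envelope sketch of pickling overhead for mesh-like data (the size and layout are arbitrary, not Sverchok's actual data structures):

```python
import pickle
import time

# A million vertices as plain tuples, as a crude stand-in for mesh data
verts = [(float(i), float(i) * 2.0, 0.0) for i in range(1_000_000)]

start = time.perf_counter()
payload = pickle.dumps(verts, protocol=pickle.HIGHEST_PROTOCOL)
elapsed = time.perf_counter() - start
print(f'pickled {len(payload) / 1e6:.1f} MB in {elapsed:.3f} s')
# This cost is paid on every hop between processes, in both directions
```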
Well, you indeed should start with whatever solution offers the best speedup/work ratio, although it seems doubtful to me that refactoring, if only slightly, over 600 nodes would need less effort and time than refactoring the update system (if that's the case, just add multithreading while you're at it!). I do not know how difficult multithreading would be, but what is certain is that it can only notably improve performance, at least in the vast majority of cases (for very long single-branch trees with very basic operations working on a single object, it would indeed slow things down, but no one does that).

Nowadays, the way to go about improving CPU performance is mainly adding cores, so there is no way it cannot be faster if you use 16 cores vs 1 core. There may be other areas of improvement with higher priority now, but Sverchok will have to go in that direction eventually. In the worst case, the refactoring can be done progressively with the help of the community using Sverchok, if the changes to perform on the nodes are clearly explained, and once the update system is ready for multithreading.

I am not familiar with pickling; would that pose a problem? With reading/writing speed or something?
Pickling is the process of converting Python objects into a stream of bytes, which can potentially be time-consuming. First of all, a prototype should be done to prove that, with our architecture, multiprocessing is possible and that it can improve performance. The prototype should include the following:
I think the base unit for a subprocess should indeed be one node, with the update system collecting the results of the nodes and spawning a new subprocess for each node which now has all its data ready. Hopefully you can find a framework for managing subprocesses efficiently without having to delve into scheduling them yourself. Additionally, I think a mechanism to automatically multithread processing of the first level of input lists should be doable, even if a few systematic lines of code need to be added so that the update system can look into the node code. In the worst case, a simple @something just before a parallelizable for loop could indicate to the update system that multithreading can be performed there. This way, the community could add them when they feel a node's execution is too slow.
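Since Python decorators attach to functions rather than loops, the "@something" idea would in practice probably mean marking a node's per-object processing function. A hypothetical sketch (none of these names exist in Sverchok):

```python
def parallelizable(func):
    """Hypothetical marker: flags a per-object function as safe to run
    across workers. An update system could check this attribute."""
    func.sv_parallelizable = True
    return func

class ExampleNode:
    @parallelizable
    def process_object(self, obj):
        # Per-object work with no shared mutable state
        return obj

# The update system could then dispatch accordingly:
node = ExampleNode()
if getattr(node.process_object, 'sv_parallelizable', False):
    print('safe to fan this loop out over workers')
```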
There is some news on the topic. The official bpy Python module was introduced. With its help I was able to use multiprocessing inside Blender.
The first step is to install the bpy package:
```
blender_python_executable -m pip install bpy
```
There won't be conflicts with the internal bpy module, because the internal one comes first in the search list.
However, that is not enough. If you try to spawn a subprocess, it will complain that the _bpy module is not found. This is because it searches for it via Blender's internal bpy module. The next hack in the sverchok/__init__.py file helps to spawn subprocesses without errors.
```python
try:
    import _bpy
    is_subprocess = False
except ModuleNotFoundError:
    is_subprocess = True
else:
    del _bpy

if is_subprocess:
    # Remove internal bpy from the module search path so the
    # pip-installed bpy package is found instead
    import sys
    blender_bpy_index = sys.path.index(
        'C:\\Program Files\\Blender Foundation\\Blender 3.4\\3.4\\scripts\\modules')
    sys.path.pop(blender_bpy_index)
```
The problem here is that if Blender has other custom add-ons enabled, they will most likely prevent the spawned subprocess from working and will hang Blender.
Running Python without the GIL (in the future): https://peps.python.org/pep-0703/
Multithreading is probably possible with the timers module.
https://docs.blender.org/api/current/bpy.app.timers.html
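The timers documentation describes a pattern along these lines: run heavy work in a background thread and let a timer poll for the result on the main thread. A minimal sketch (the workload here is a placeholder):

```python
import queue
import threading

import bpy

results = queue.Queue()

def heavy_work():
    # Placeholder workload running off the main thread
    results.put(sum(i * i for i in range(10_000_000)))

def poll_results():
    try:
        value = results.get_nowait()
    except queue.Empty:
        return 0.1  # nothing yet; check again in 0.1 s
    print('done:', value)
    return None     # returning None unregisters the timer

threading.Thread(target=heavy_work, daemon=True).start()
bpy.app.timers.register(poll_results)
```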