nortikin / sverchok

Sverchok
http://nortikin.github.io/sverchok/
GNU General Public License v3.0

Mesh data structure #2767

Closed Durman closed 4 years ago

Durman commented 4 years ago

Problem statement

2019-12-25_13-53-30 As you can see in the picture, the process of plugging sockets into each other is becoming too tedious. And the node does not yet have vertex data and edge data sockets.

So the goal is to make something like an alias for a group of sockets.

Solution

Similar issues have come up in the past (#184). My proposal is much simpler; it is actually a development of the idea from #2766.

I think for nodes such as bevel, inset faces and similar ones it would be nice to have a mesh input socket instead of (verts, edges, faces, verts data, edges data, faces data). For creating data for such sockets Sverchok should have Mesh in and Mesh out nodes (or SvMesh in / out?).

+------------+
|  Mesh in   |
+------------+
|       mesh |
| verts      |
| edges      |
| faces      |
| vert data  |
| edge data  |
| face data  |
+------------+

The mesh output socket will have a dictionary type, with certain keys according to the input data. The algorithm for working with an input mesh in this format would be the following: a node searches for certain keys such as verts or face data; if keys which are mandatory for the algorithm (like verts and faces for the inset faces algorithm) are not found, NoDataError is raised; if other keys are not found, the node continues working without that data.
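The key-handling rule described above could be sketched like this (NoDataError, the read_mesh helper and the exact key strings are assumptions for illustration, not Sverchok's actual API):

```python
# Sketch of the mandatory/optional key lookup described above.
# NoDataError, read_mesh and the key names are illustrative assumptions.

class NoDataError(Exception):
    """Raised when a key mandatory for the node's algorithm is missing."""

def read_mesh(mesh, mandatory=('verts', 'faces'), optional=('vert data', 'face data')):
    data = {}
    for key in mandatory:
        if key not in mesh:
            raise NoDataError(f"mandatory key {key!r} not found in mesh socket")
        data[key] = mesh[key]
    for key in optional:
        if key in mesh:   # missing optional data: the node just works without it
            data[key] = mesh[key]
    return data

mesh = {'verts': [(0, 0, 0), (1, 0, 0), (1, 1, 0)], 'faces': [[0, 1, 2]]}
data = read_mesh(mesh)  # optional keys are simply absent from the result
```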

Nestedness: this actually lies outside the boundary of the current problem, but with this solution it looks like it would be quite easy to create an arbitrary number of nested meshes. More precisely, meshes can be joined into categories with an arbitrary level of nesting. Some examples:

Category A:
    mesh 1
    mesh 2

Here is a dictionary with two keys: mesh 1 and mesh 2. Mesh 1 and mesh 2 are themselves dictionaries, each of which has at least the key verts. A category dictionary can't include any values except dictionaries with mesh data and other dictionaries which in turn include dictionaries with mesh data. So the algorithm for finding mesh dictionaries would be:

result = []
stack_dicts = [input_dict]
while stack_dicts:
    current_dict = stack_dicts.pop()
    if 'verts' in current_dict:
        # this is a mesh dict
        result.append(main_function(**{key: current_dict[key] for key in required_keys if key in current_dict}))
    else:
        # most likely this is a category dict
        stack_dicts.extend(val for val in current_dict.values() if isinstance(val, dict))
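Here is a self-contained, runnable version of that search over a small two-level category (main_function and the key names are illustrative assumptions; here main_function just reports which required keys each mesh dict carried):

```python
# Depth-first search for mesh dicts inside arbitrarily nested category dicts.

def collect_meshes(input_dict, main_function, required_keys=('verts', 'faces')):
    result = []
    stack_dicts = [input_dict]
    while stack_dicts:
        current_dict = stack_dicts.pop()
        if 'verts' in current_dict:
            # this is a mesh dict
            result.append(main_function(
                **{key: current_dict[key] for key in required_keys if key in current_dict}))
        else:
            # most likely this is a category dict
            stack_dicts.extend(val for val in current_dict.values() if isinstance(val, dict))
    return result

category = {
    'mesh 1': {'verts': [(0, 0, 0)]},
    'category B': {'mesh 2': {'verts': [(1, 0, 0)], 'faces': [[0]]}},
}
found = collect_meshes(category, lambda **kwargs: sorted(kwargs))
```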
Durman commented 4 years ago

Comparison of using the mesh socket with setting the elements of the mesh separately: 2020-01-30_20-49-54

Durman commented 4 years ago

2020-01-31_08-26-18

portnov commented 4 years ago

Well, it's all nice, but how to maintain compatibility?...

Durman commented 4 years ago

What do you mean?

portnov commented 4 years ago

There are tons of existing nodes, which use 3-4 sockets, not the new one. Are you planning to rewrite them?...

Durman commented 4 years ago

No, both approaches can exist in parallel. The old version is low-level; the new one is higher-level.

Durman commented 4 years ago

I'll leave this here, on the specifics of importing: https://stackoverflow.com/questions/1744258/is-import-module-better-coding-style-than-from-module-import-function

Durman commented 4 years ago

The process of working with data from the mesh socket, in the case where a node neither consumes nor produces any data other than the mesh:

def process(self):
    meshes = self.inputs['Mesh'].sv_get(deepcopy=True)
    for mesh in meshes:
        node_algorithm(mesh, self.mode)
    self.outputs['Mesh'].sv_set(meshes)

The node_algorithm function writes new data right inside the mesh container. Mesh will actually be just a dictionary, but with access to data via attributes, so that mesh.verts is the same as mesh['Verts'].
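A minimal sketch of such an attribute-access dictionary (an illustration only, not Sverchok's real Mesh class, and exact key capitalization aside):

```python
# A dict subclass where mesh.verts is the same object as mesh['verts'].
# Illustrative sketch only, not Sverchok's actual Mesh implementation.

class Mesh(dict):
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

mesh = Mesh(verts=[(0, 0, 0), (1, 0, 0)], faces=[[0, 1]])
same = mesh.verts is mesh['verts']   # attribute reads go through the dict
mesh.materials = []                  # attribute writes land in the dict too
```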

Durman commented 4 years ago

2020-02-03_09-28-43 Actually I am not excited about the middle block of nodes in the layout; it is too verbose. I will look for other approaches later.

portnov commented 4 years ago

@zeffii your thoughts on this matter?...

Durman commented 4 years ago

Managing loop data 2020-02-03_11-32-05

Durman commented 4 years ago

Actually I don't see an alternative to this proposal. I would like some nodes which change topology to have the following list of sockets:

And the same list should be on the output side; probably later it will be interesting to add other mesh data parameters.

portnov commented 4 years ago

The question is just how we are going to deal with existing nodes. We may end up with 150+ nodes having the old 3 sockets, plus 150+ nodes having the new sockets; how are we going to explain all that to users? :)

zeffii commented 4 years ago

my requirements for Sverchok are not as feature rich as that list. I'm not opposed to a Mesh socket that carries such data. Having a whole bunch of duplicated nodes would be unfortunate.

I would prefer to spend time writing a sverchok like node system in rust :)

portnov commented 4 years ago

huh. And then writing a manual on how to compile it, for users? :)

zeffii commented 4 years ago

not only that, I don't know rust at all :)

Durman commented 4 years ago

After all these years of development, Sverchok has finally become capable of creating an outdoor toilet. :smile: 2020-02-03_22-07-19

toilet

Durman commented 4 years ago

It looks like there is a problem with performance. output

Durman commented 4 years ago

https://wiki.blender.org/wiki/Source/Nodes/MeshTypeRequirements

portnov commented 4 years ago

o_O what's that? :)

Durman commented 4 years ago

This is probably what the mesh data structure for procedural modeling will look like in Blender after the Everything Nodes project is accomplished.

Durman commented 4 years ago

PyMesh data structure:

https://pymesh.readthedocs.io/en/latest/basic.html#mesh-data-structure https://github.com/PyMesh/PyMesh/blob/master/python/pymesh/Mesh.py

Durman commented 4 years ago

A nice thing about having a mesh data structure is that it can already keep everything needed for displaying the mesh at each step of the mesh's life. mesh_structure_materials 2020-02-15_22-38-08

Durman commented 4 years ago

I have investigated creating the mesh directly in the bpy.types.Mesh of an object. According to the documentation it should be the most efficient way. But I have found that this way is very fragile and can crash Blender easily; it looks like it does not fit Sverchok at all. And the efficiency gain is not that great, at least in my implementation. So I just leave this code here in case.

code
```Python
import bpy
from typing import List, Tuple
from sverchok.core.mesh_structure import Mesh
from itertools import count


def generate_mesh(objects: bpy.types.bpy_prop_collection, meshes: List[Mesh]) -> None:
    # faster (some tests show 3 times faster, some equal speed) but can crash Blender easily
    for bl_me, me in zip((prop.obj.data for prop in objects), meshes):
        if is_topology_changed(bl_me, me):
            me_edges = generate_unique_edges_from_faces(me.faces)
            me_faces = correct_faces(len(me.verts), me.faces)
            ensure_number_of_elements(bl_me, me, me_edges, me_faces)
            update_edges(bl_me, me_edges)
            update_loops(bl_me, me_faces, me_edges)
            update_faces(bl_me, me_faces)
        update_vertices(bl_me, me)
        bl_me.update()


def is_topology_changed(bl_me: bpy.types.Mesh, me: Mesh) -> bool:
    return len(bl_me.vertices) != len(me.verts) or len(bl_me.polygons) != len(me.faces)


def ensure_number_of_elements(bl_me: bpy.types.Mesh, me: Mesh, me_edges: dict, me_faces: List[List[int]]) -> None:
    bl_me.clear_geometry()
    bl_me.vertices.add(len(me.verts))
    if me.verts and me_faces:
        bl_me.edges.add(len(me_edges))
        bl_me.loops.add(sum([len(f) for f in me_faces]))
        bl_me.polygons.add(len(me_faces))


def generate_unique_edges_from_faces(faces: List[List[int]]) -> dict:
    edge_index = dict()
    counter = count()
    for f in faces:
        for e in edge_list(f):
            if e not in edge_index:
                edge_index[e] = next(counter)
    return edge_index


def update_vertices(bl_me: bpy.types.Mesh, me: Mesh) -> None:
    for bl_v, v in zip(bl_me.vertices, me.verts):
        bl_v.co = v


def update_edges(bl_me: bpy.types.Mesh, me_edges: dict) -> None:
    for bl_e, e in zip(bl_me.edges, me_edges):
        bl_e.vertices = e


def update_loops(bl_me: bpy.types.Mesh, me_faces: List[List[int]], me_edge_indexes: dict) -> None:
    if me_faces:
        vert_edge_indexes = []
        for f in me_faces:
            for vi, e in zip(f, edge_list(f)):
                vert_edge_indexes.append((vi, me_edge_indexes[e]))
        for l, (vi, ei) in zip(bl_me.loops, vert_edge_indexes):
            l.edge_index = ei
            l.vertex_index = vi


def update_faces(bl_me: bpy.types.Mesh, me_faces: List[List[int]]) -> None:
    if me_faces:
        next_loop_index = 0
        for bl_f, f in zip(bl_me.polygons, me_faces):
            bl_f.loop_start = next_loop_index
            bl_f.loop_total = len(f)
            next_loop_index += len(f)


def edge_list(face: List[int]) -> Tuple[int]:
    for i1, i2 in zip(face, face[1:] + face[:1]):
        yield tuple(sorted([i1, i2]))


def correct_faces(len_verts: int, faces: List[List[int]]) -> List[List[int]]:
    out_faces = []
    for f in faces:
        if any([i > len_verts for i in f]):
            continue
        else:
            out_faces.append(f)
    return out_faces
```
zeffii commented 4 years ago

passing around blender structures, and references to them, inside sverchok is going to break eventually. (this is also stated in the documentation) so we avoid that most of the time.

the only fast operation in Mesh land is passing a flat list of verts to a .mesh

mesh.vertices.foreach_set("co", <flat list of coordinates>)
Durman commented 4 years ago

Yes, foreach_set is quite fast. The interesting thing is that with numpy arrays it works a little bit slower.

Test 1 1.0141143000000739
Test 2 0.21012190000010378
Test 3 0.26692159999993237  <- with numpy

Test 1 1.0243915000000925
Test 2 0.21129569999993691
Test 3 0.22625440000001618  <- with numpy
tests
```Python
import bpy
from timeit import timeit
import numpy as np

NUM = 1000000

me = bpy.context.active_object.data
me.clear_geometry()
me.vertices.add(NUM)

verts = [(1, 0, 0) for _ in range(NUM)]
np_verts = np.array(verts)


def test1(me, verts):
    for me_v, v in zip(me.vertices, verts):
        me_v.co = v


def test2(me, verts):
    me.vertices.foreach_set('co', [co for v in verts for co in v])


def test3(me, np_verts):
    me.vertices.foreach_set('co', np_verts.flatten())


print("Test 1", timeit("test1(me, verts)", "from __main__ import test1, verts, me", number=1))
print("Test 2", timeit("test2(me, verts)", "from __main__ import test2, verts, me", number=1))
print("Test 3", timeit("test3(me, np_verts)", "from __main__ import test3, np_verts, me", number=1))
print("Done")
```
zeffii commented 4 years ago

what about np_verts = np.array(verts, 'f')

that's about 7 times faster here.

Durman commented 4 years ago

Amazing. In my case it became 30 times faster. What does the 'f' letter mean?


Test 1 1.0403471999998146
Test 2 0.21267190000071423
Test 3 0.007748700001684483
zeffii commented 4 years ago

it's a type hint https://numpy.org/doc/1.18/reference/generated/numpy.array.html "dtype" https://numpy.org/doc/1.18/reference/generated/numpy.dtype.html#numpy.dtype

Durman commented 4 years ago

Well, I thought numpy automatically converts python lists with floats to an np.array of floats, but it looks like it does something different.

>>> verts = [(1, 0, 0) for _ in range(2)]
>>> np_verts = np.array(verts)
>>> np_verts
array([[1, 0, 0],
       [1, 0, 0]])

>>> np_verts = np.array(verts, 'f')
>>> np_verts
array([[1., 0., 0.],
       [1., 0., 0.]], dtype=float32)
Durman commented 4 years ago

With creating edges there is no such benefit from using numpy arrays, unless there is something else about numpy that I don't know.

Test 1 0.024411099999269936
Test 2 0.013266699999803677
Test 3 0.016417299997556256 <- numpy
generate edges test
```Python
import bpy
from timeit import timeit
import numpy as np

NUM = 100000


def get_mesh():
    me = bpy.context.active_object.data
    me.clear_geometry()
    me.vertices.add(NUM)
    me.edges.add(NUM - 1)
    return me


edges = [[i, i + 1] for i in range(NUM - 1)]
np_edges = np.array(edges, 'i')


def test1(me, edges):
    for me_e, e in zip(me.edges, edges):
        me_e.vertices = e


def test2(me, edges):
    me.edges.foreach_set('vertices', [i for e in edges for i in e])


def test3(me, np_edges):
    me.edges.foreach_set('vertices', np_edges.flatten())


print("Test 1", timeit("test1(me, edges)", "from __main__ import test1, edges, get_mesh; me = get_mesh()", number=1))
print("Test 2", timeit("test2(me, edges)", "from __main__ import test2, edges, get_mesh; me = get_mesh()", number=1))
print("Test 3", timeit("test3(me, np_edges)", "from __main__ import test3, np_edges, get_mesh; me = get_mesh()", number=1))
print("Done")
```
zeffii commented 4 years ago

I thought numpy automatically convert python lists with floats to np.array with float

it does, but anything automatic will always have overhead. When we provide the dtype the overhead is less brutal.

creating of edges there is no such benefit

the numpy array for edges can still offer speed ups i think, ...maybe even it looks tidier.

zeffii commented 4 years ago

because [i for e in edges for i in e] just looks insane.
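If the edges already live in (or are cheaply converted to) a numpy array, np.ravel produces the same flat sequence with arguably tidier code; a small sketch of the equivalence:

```python
import numpy as np

edges = [[0, 1], [1, 2], [2, 3]]

flat_list = [i for e in edges for i in e]   # the comprehension discussed above
flat_np = np.ravel(np.array(edges, 'i'))    # same values, in C (row-major) order
# for a C-contiguous array np.ravel returns a view, so no extra copy is made

same = flat_list == flat_np.tolist()
```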

Durman commented 4 years ago

Test with creating faces:

Test 1 0.019219599999814818
Test 2 0.0063215000000127475
Test 3 0.013162700000066252 <- Numpy
creating faces test
```Python
import bpy
from timeit import timeit
import numpy as np

NUM = 20000


def add_triang(me):
    me.vertices.add(3)
    i0, i1, i2 = len(me.vertices) - 3, len(me.vertices) - 2, len(me.vertices) - 1
    me.vertices[i0].co = (0, 0, 0)
    me.vertices[i1].co = (0, 1, 0)
    me.vertices[i2].co = (1, 0, 0)
    me.edges.add(3)
    me.edges[i0].vertices = (i0, i1)
    me.edges[i1].vertices = (i1, i2)
    me.edges[i2].vertices = (i2, i0)


def get_mesh():
    me = bpy.context.active_object.data
    me.clear_geometry()
    for i in range(NUM):
        add_triang(me)
    me.loops.add(NUM * 3)
    me.polygons.add(NUM)
    return me


loops = [[i, i] for i in range(NUM * 3)]
faces = [[i, 3] for i in range(0, NUM * 3, 3)]
np_loops = np.array(loops, 'i')
np_faces = np.array(faces, 'i')


def test1(me, loops, faces):
    for l, inds in zip(me.loops, loops):
        l.vertex_index, l.edge_index = inds
    for f, inds in zip(me.polygons, faces):
        f.loop_start, f.loop_total = inds


def test2(me, loops, faces):
    me.loops.foreach_set('vertex_index', [vi for vi, ei in loops])
    me.loops.foreach_set('edge_index', [ei for vi, ei in loops])
    me.polygons.foreach_set('loop_start', [si for si, ti in faces])
    me.polygons.foreach_set('loop_total', [ti for si, ti in faces])


def test3(me, np_loops, np_faces):
    me.loops.foreach_set('vertex_index', np.ascontiguousarray(np_loops[:, 0], 'i'))
    me.loops.foreach_set('edge_index', np.ascontiguousarray(np_loops[:, 1], 'i'))
    me.polygons.foreach_set('loop_start', np.ascontiguousarray(np_faces[:, 0], 'i'))
    me.polygons.foreach_set('loop_total', np.ascontiguousarray(np_faces[:, 1], 'i'))


print("Test 1", timeit("test1(me, loops, faces)", "from __main__ import test1, loops, faces, get_mesh; me = get_mesh()", number=1))
print("Test 2", timeit("test2(me, loops, faces)", "from __main__ import test2, loops, faces, get_mesh; me = get_mesh()", number=1))
print("Test 3", timeit("test3(me, np_loops, np_faces)", "from __main__ import test3, np_loops, np_faces, get_mesh; me = get_mesh()", number=1))
print("Done")
```
Durman commented 4 years ago

Looks like it only makes sense to use numpy with float values.

Test 1 0.06465090000028795
Test 2 0.015234599999985221
Test 3 0.00209439999980531 <- Numpy
vertex colors test
```Python
import bpy
import numpy as np
from timeit import timeit
from random import random

me = bpy.context.active_object.data
layer = me.vertex_colors.get('test')
if not layer:
    layer = me.vertex_colors.new(name='test')


def get_color():
    return [random(), random(), random(), random()]


colors = [get_color() for _ in me.loops]
np_colors = np.array(colors, 'f')


def test1(layer, colors):
    for l, col in zip(layer.data, colors):
        l.color = col


def test2(layer, colors):
    layer.data.foreach_set('color', [f for col in colors for f in col])


def test3(layer, np_colors):
    layer.data.foreach_set('color', np_colors.flatten())


print("Test 1", timeit("test1(layer, colors)", "from __main__ import test1, layer, colors", number=1))
print("Test 2", timeit("test2(layer, colors)", "from __main__ import test2, layer, colors", number=1))
print("Test 3", timeit("test3(layer, np_colors)", "from __main__ import test3, layer, np_colors", number=1))
print("Done")
```
Durman commented 4 years ago

Searching for attributes in the hierarchy of mesh elements: inherited attributes

Durman commented 4 years ago

Speed test: a cube with 100 subdivisions. Without topology changes:

with topology changes:

zeffii commented 4 years ago

remember BMesh viewer also has a mode that expects the vertex count to be static, with no topology change. In the N-panel, "fixed vertex count".

Durman commented 4 years ago

I have measured the speed taking this into account.

zeffii commented 4 years ago

ok cool

Durman commented 4 years ago

generate bmesh verts test:

Test1 0.12106750000020838 <- Python list
Test2 0.1735398000000714 <- numpy to python list
Test3 0.5359352000014042 <- numpy
test
```Python
import bmesh
import numpy as np
from timeit import timeit

NUM = 100000


def gen_quad():
    verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    faces = [[0, 1, 2, 3]]
    return verts, faces


def merge(va, fa, vb, fb):
    ind = len(va)
    va.extend(vb)
    fa.extend([[i + ind for i in f] for f in fb])


def gen_mesh():
    verts = []
    faces = []
    for i in range(NUM):
        merge(verts, faces, *gen_quad())
    return verts, faces


v, f = gen_mesh()
npv, npf = np.array(v, 'float32'), np.array(f, 'int32')


def test1(verts, faces):
    bm = bmesh.new()
    [bm.verts.new(v) for v in verts]
    bm.free()


def test2(verts, faces):
    bm = bmesh.new()
    [bm.verts.new(v) for v in verts.tolist()]
    bm.free()


def test3(verts, faces):
    bm = bmesh.new()
    [bm.verts.new(v) for v in verts]
    bm.free()


print("Test1", timeit("test1(v, f)", "from __main__ import test1, v, f", number=1))
print("Test2", timeit("test2(npv, npf)", "from __main__ import test2, npv, npf", number=1))
print("Test3", timeit("test3(npv, npf)", "from __main__ import test3, npv, npf", number=1))
print("Done")
```

With face generation:

Test1 0.21056620000126713 <- Python lists
Test2 0.2658947999989323 <- Numpy to Python lists
Test3 0.6875750999988668 <- Numpy
test
```Python
import bmesh
import numpy as np
from timeit import timeit

NUM = 100000


def gen_quad():
    verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    faces = [[0, 1, 2, 3]]
    return verts, faces


def merge(va, fa, vb, fb):
    ind = len(va)
    va.extend(vb)
    fa.extend([[i + ind for i in f] for f in fb])


def gen_mesh():
    verts = []
    faces = []
    for i in range(NUM):
        merge(verts, faces, *gen_quad())
    return verts, faces


v, f = gen_mesh()
npv, npf = np.array(v, 'float32'), np.array(f, 'int32')


def test1(verts, faces):
    bm = bmesh.new()
    v = [bm.verts.new(v) for v in verts]
    [bm.faces.new([v[i] for i in f]) for f in faces]
    bm.free()


# def test2(verts, faces):  # <- SLOWER
#     bm = bmesh.new()
#     [bm.verts.new(v) for v in verts]
#     bm.verts.ensure_lookup_table()
#     [bm.faces.new([bm.verts[i] for i in f]) for f in faces]
#     bm.free()


def test2(verts, faces):
    bm = bmesh.new()
    v = [bm.verts.new(v) for v in verts.tolist()]
    [bm.faces.new([v[i] for i in f]) for f in faces.tolist()]
    bm.free()


def test3(verts, faces):
    bm = bmesh.new()
    v = [bm.verts.new(v) for v in verts]
    [bm.faces.new([v[i] for i in f]) for f in faces]
    bm.free()


print("Test1", timeit("test1(v, f)", "from __main__ import test1, v, f", number=1))
print("Test2", timeit("test2(npv, npf)", "from __main__ import test2, npv, npf", number=1))
print("Test3", timeit("test3(npv, npf)", "from __main__ import test3, npv, npf", number=1))
print("Done")
```
vicdoval commented 4 years ago

Have you tested the performance difference between 'float32' and 'float64'? https://stackoverflow.com/questions/5956783/numpy-float-10x-slower-than-builtin-in-arithmetic-operations

vicdoval commented 4 years ago

Not sure if it can be related but it could: https://devtalk.blender.org/t/alternative-in-2-80-to-create-meshes-from-python-using-the-tessfaces-api/7445/3

And another idea that may give a little boost:

def test3(verts, faces):
    bm = bmesh.new()
    new_vert = bm.verts.new
    [new_vert(v) for v in verts]
    bm.free()
Durman commented 4 years ago

Have you tested the performance difference between 'float32' and 'float64'?

Yes, it looks like there is no difference.

And another idea that may give a little boost:

It makes it 10-20 percent faster, not bad. :)

For now, creating Blender objects via BMesh is the fastest in my implementation, but I have not finished my experiments yet.

Durman commented 4 years ago

Actually I have moved far beyond this proposal in my experiments. I have found that just creating an alias for a group of sockets is not satisfactory. I'm thinking of creating a real mesh data structure based on numpy arrays. I will open another issue for this.

zeffii commented 4 years ago

@Durman did you ever link to SvMesh node code?

Durman commented 4 years ago

I think you will be interested in the code of this node: https://github.com/nortikin/sverchok/blob/mesh_data_structure/nodes/mesh/viewer_mesh.py

What I have noticed is that this function is super efficient, but only with numpy arrays, and the type of the array should be exactly float32. https://github.com/nortikin/sverchok/blob/b0a6a6817dfa41902f7ed82092b9f8f5cc93022e/utils/mesh_structure/visualize.py#L96-L97

Durman commented 2 years ago

Example of generating a simple line with 1 million vertices. Only the drawing code is measured.

code
```py
import bpy
from timeit import timeit
import numpy as np

n = 1000000
obj = bpy.data.objects['Cube']
me = obj.data
me.clear_geometry()

t1 = timeit('me.vertices.add(n)', 'from __main__ import me, n', number=1)
print(f"Init vertices - {t1 * 1000:.2f}ms")

cos = np.ravel(np.stack((np.linspace(0, 1, n, dtype=np.float32),
                         np.zeros(n, dtype=np.float32),
                         np.zeros(n, dtype=np.float32)), axis=-1))
t2 = timeit('me.vertices.foreach_set("co", cos)', 'from __main__ import me, cos', number=1)
print(f"Set coordinates - {t2 * 1000:.2f}ms")

t3 = timeit('me.edges.add(n-1)', 'from __main__ import me, n', number=1)
print(f"Init edges - {t3 * 1000:.2f}ms")

edgs = np.ravel(np.concatenate((np.arange(n - 1).reshape(-1, 1),
                                np.arange(1, n).reshape(-1, 1)), axis=1)).astype(np.int32)
t4 = timeit("me.edges.foreach_set('vertices', edgs)", 'from __main__ import me, edgs', number=1)
print(f"set edges - {t4 * 1000:.2f}ms")

print(f"TOTAL - {(t1 + t2 + t3 + t4) * 1000:.2f}ms")
```
Init vertices - 1.87ms
Set coordinates - 2.73ms
Init edges - 2.30ms
set edges - 95.14ms
TOTAL - 102.03ms
Durman commented 2 years ago

The answer to my question of how to speed up the process of creating edges - https://blender.chat/channel/python?msg=7mxrnoK6XW3Lchh8o from Skarn:

It is probably because it expects pyobjects, so a conversion happens in the case of a numpy array. In the case of a Python list, the contained elements are already PyObjects.

So it makes sense to measure together with the list/array creation, because that's going to give you the final speed of the overall algorithm. Another option is to create a Cython extension that would replicate the behavior of foreach_set but expect a numpy array. For that, the Blender API offers a convenient .as_pointer() method on every DNA structure.

```c++
switch (prop_type) {
      case PROP_INT:
        array = PyMem_Malloc(sizeof(int) * size);
        if (do_set) {
          for (i = 0; i < size; i++) {
            item = PySequence_GetItem(seq, i);
            ((int *)array)[i] = (int)PyLong_AsLong(item);
            Py_DECREF(item);
          }

          RNA_property_int_set_array(&self->ptr, self->prop, array);
        }
        else {
          RNA_property_int_get_array(&self->ptr, self->prop, array);

          for (i = 0; i < size; i++) {
            item = PyLong_FromLong((long)((int *)array)[i]);
            PySequence_SetItem(seq, i, item);
            Py_DECREF(item);
          }
        }

        break;
      case PROP_FLOAT:
        array = PyMem_Malloc(sizeof(float) * size);
        if (do_set) {
          for (i = 0; i < size; i++) {
            item = PySequence_GetItem(seq, i);
            ((float *)array)[i] = (float)PyFloat_AsDouble(item);
            Py_DECREF(item);
          }

          RNA_property_float_set_array(&self->ptr, self->prop, array);
        }
        else {
          RNA_property_float_get_array(&self->ptr, self->prop, array);

          for (i = 0; i < size; i++) {
            item = PyFloat_FromDouble((double)((float *)array)[i]);
            PySequence_SetItem(seq, i, item);
            Py_DECREF(item);
          }
        }
        break;
```

Those are two cases of the switch responsible for the internals of foreach_set / foreach_get. As you can see, tons of PyObject-related things are happening there. Let me write a quick example of this:

```py
mesh.vertices.add(n_verts)  # we allocate n verts
mesh_ptr = mesh.as_pointer()
numpy_array_ptr = your_numpy_vert_array.__array_interface__['data'][0]
cython_wrapper_func(mesh_ptr, numpy_array_ptr, n_verts)
```

The rest can be done in pure Cython, or in C++. I will use C++ for convenience; the Cython binding is typical and can be done with a tutorial.

```c++
// define the Mesh struct by copying it from Blender code, or linking to its headers
// (I prefer linking to headers, it makes updating between Blender versions easy)
Mesh* mesh = reinterpret_cast<Mesh*>(mesh_ptr);  // mesh_ptr is the uintptr_t we got from Cython,
                                                 // which got it from Python, which got it from the
                                                 // Blender API using the as_pointer() method
for (int i = 0; i < mesh->totvert; ++i)
{
   MVert* vert = &mesh->mvert[i];
   vert->co = ....; // assign from numpy here
}
```

A Cython extension to work with numpy arrays without losing that bit of speed sounds really good to me. This will give you the maximum speed. Allocating verts is easier done with Python, by a single call to add(). I do all of my exporters/importers in this fashion now. The only downside is that you need to build your addon for each Blender release. And obviously this kind of approach is not crash-safe and can crash Blender if you code something incorrectly. But in the end, all of this is for the convenience of users not having to wait minutes before their scene exports/imports. Using this approach, a scene that used to take 40 minutes to export with the Python-only version is now exported in 12 seconds. The majority of smaller scenes are done instantly; you don't even have a chance to notice.

Here is an example of such a module: gitlab.com/skarnproject/blender-wow-studio/-/tree/master/io_scene_wmo/wbs_kernel/src
An example of the exporter: gitlab.com/skarnproject/blender-wow-studio/-/blob/master/io_scene_wmo/wbs_kernel/src/bl_utils/mesh/wmo/batch_geometry.cpp
And its corresponding Cython counterpart: gitlab.com/skarnproject/blender-wow-studio/-/blob/master/io_scene_wmo/wbs_kernel/src/wmo_utils.pyx
And what's even cooler, you can use either C++ or Cython parallelization functionality to compute multiple meshes on multiple CPU cores: gitlab.com/skarnproject/blender-wow-studio/-/blob/master/io_scene_wmo/wbs_kernel/src/wmo_utils.pyx#L167 - an example of parallel mesh export done in Cython.