mtex-toolbox / mtex

MTEX is a free Matlab toolbox for quantitative texture analysis. Homepage:
http://mtex-toolbox.github.io/
GNU General Public License v2.0

Faster voronoi decomposition #2046

Closed - vedadb closed this 5 months ago

vedadb commented 5 months ago

I have implemented the JCVoronoi (https://github.com/JCash/voronoi) code in spatialDecomposition, which improves the speed of the grain reconstruction by an order of magnitude. On my laptop, the time for an EBSD map of ca. 650k points has decreased from ca. 180 s to 30 s. The spatial decomposition itself now takes 3-4 s and is no longer the bottleneck.

The code can be further improved (right now memory is dynamically allocated, which could be changed), and the other matrices could be calculated in C++ as well, but given how much the speed has already improved I don't feel it's crucial to optimize this further.

Also, a progress/status bar is not really necessary at such a low calculation time (although it could be added, since I am instantiating a MATLAB engine within the C++ code); it would be more suitable in the grainCalc function itself, which now acts as the bottleneck.

Vedad

vedadb commented 5 months ago

Just want to add: the new Voronoi tessellation is called by passing the 'jcvoronoi' option to calcGrains, i.e. calcGrains(ebsd,'jcvoronoi').
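For example, a minimal usage sketch on one of the MTEX sample data sets (assuming the MEX file has been compiled):

mtexdata small
grains    = calcGrains(ebsd('indexed'));              % default spatial decomposition
grainsJCV = calcGrains(ebsd('indexed'),'jcvoronoi');  % JCVoronoi-based spatial decomposition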

kilir commented 5 months ago

Thanks a lot for that work! That's impressive. After some quick tests, I can confirm it works really nicely! I'll try some really large files tonight.

Cheers, Rüdiger

nyyssont commented 5 months ago

Also works for me. For a map of about 2.5 MPix and 111293 grains, the jcvoronoi MEX file took 3.7 seconds, while voronoin with the 'qhull' option took 17 seconds. The number of grains was equivalent. When the map was used to reconstruct parent grains, the result was equivalent with the 'jcvoronoi' and 'qhull' options.

I would imagine for larger maps the benefits become greater.

Great work, congratulations and thank you!

vedadb commented 5 months ago

I just want to add that I have only implemented jcvoronoi for the calcGrains function; I have not looked at the parent grain reconstruction at all. If a Voronoi tessellation is also performed there, it can be implemented in that part of the code as well.

nyyssont commented 5 months ago

I did some more testing this evening... for a map with ~4 MPix and ~560 000 grains, "uniquetol" in spatialDecomposition line 36 takes a long time. Jcvoronoi is very fast. Attaching a picture. Any idea how to speed uniquetol up?

[image]

vedadb commented 5 months ago

Yes, the reason for using uniquetol at all is to reduce the number of vertices (by merging those which are close to each other). I have changed it to floor + unique(...,'rows') instead (see the new commit); this substantially decreases the time for this part of the code, and the Voronoi tessellation (on my laptop) once again becomes the bottleneck.
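A minimal sketch of the idea (the exact scaling by the tolerance is an assumption; see the commit for the actual implementation):

% previous approach: merge vertices that lie within a tolerance of each other
V = uniquetol(V, tol, 'ByRows', true);

% new approach (sketch): snap vertices to a grid of spacing tol, then keep exact unique rows
V = unique(floor(V / tol), 'rows') * tol;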

Vedad

Edit:

[image]

Removing vertices this way keeps the grains and grain properties the same, but the order of the vertices (I guess) changes somewhat, which introduces artifacts in the plotting of the grains (see the white lines).

vedadb commented 5 months ago

g - calculated using unique + floor
g1 - calculated using uniquetol

The boundary segments stay the same. Although we use different methods to filter vertices, the triple points change somewhat (~1.5%), which I would say is acceptable.

[image]

vedadb commented 5 months ago

Changing floor to round keeps the same computational speed (I expected floor to be faster, since it just removes decimals while rounding is an arithmetic operation), but it also retains the correct ordering of vertices and edges, and the correct triple points.
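In terms of the sketch above, the change amounts to (still assuming the tolerance-scaling form):

V = unique(round(V / tol), 'rows') * tol;   % round instead of floor keeps the vertex/edge ordering intact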

nyyssont commented 5 months ago

I made a quick test with the new version and it appears to be significantly faster for >1MPix maps! I will make another test this evening with the 4 MPix dataset.

nyyssont commented 5 months ago

I tested the latest commit b17ead5.

On my home computer, with the 4 MPix dataset, calcGrains resulted in a grain calculation time of 232 s using: [grains,ebsd('indexed').grainId] = calcGrains(ebsd('indexed'),'angle',3*degree);. Of this time, calcDecomposition took 108 s. The function voronoin took 50 s.

Using [grains,ebsd('indexed').grainId] = calcGrains(ebsd('indexed'),'angle',3*degree,'jcvoronoi'); resulted in a grain calculation time of 175 s. calcDecomposition took 32 s. The function jcvoronoi_mex took 17 s.

As you say, with jcvoronoi the bottleneck becomes calcGrains itself.

I had time today to test the jcvoronoi grain maps for several parent phase reconstructions. The grain maps reconstruct properly and all operations on the grain shapes work as they should.

I can't see anything wrong with how the function now works. Perhaps others have some more things they might care to test. In any case, I think this function should be incorporated into MTEX and I will start using it for grain calculation. Great work!

ralfHielscher commented 5 months ago

Hi Vedad,

thank you very much for your contribution. I was not aware that such a nice Voronoi tessellation code as "jc_voronoi" is available. It's quite surprising that it is so much faster than the code shipped with Matlab.

Including this code seems to be a game changer for many people with big data. Since this huge bottleneck has been removed by your code, it seems tempting to speed up other parts of the grain reconstruction code as well. Possibilities are:

  1. It seems "jc_voronoi" also includes an option to work with bounding boxes. As EBSD data are rectangular in most cases, we could probably skip everything in calcBoundary, which includes in particular the convex hull computation.
  2. EulerCycles is a time-critical Matlab function that most likely can be sped up enormously using a language like C or C++.

Thank you once again for your valuable contribution. All the best, Ralf.


vedadb commented 5 months ago

Thank you.

Yes, I have also considered replacing calcBoundary with the bounding box and/or clipping functionality. For rectangular bounding boxes, i.e. the normal "mode", there is no additional computational cost to using the bounding box, since it is actually already implemented and the calculation is already performed; I currently just ignore it.

For "tight" setting, it is possible to make use of clipper functions: "The library also comes with a second header, that contains code for custom clipping of edges against a convex polygon."

So if we can define the clipper function for the "tight" setting in an efficient way, we can use that as well. An alternative solution: introduce a (temporary) dummy phase for those non-indexed points which are connected to the boundary and, after the normal tessellation, simply delete all edges which connect to a dummy site (see the sketch below).
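A rough sketch of the dummy-site idea (illustrative only - the jcvoronoi_mex output shown here is an assumption, not the actual interface):

xy = [xyIndexed; xyDummy];                % append dummy sites along the map boundary
[V,E,siteId] = jcvoronoi_mex(xy);         % assumed output: vertices, edges, and the generating site(s) per edge
isDummy = siteId > size(xyIndexed,1);     % flag edges generated by a dummy site
E(any(isDummy,2),:) = [];                 % delete all edges which connect to a dummy site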

nyyssont commented 5 months ago

One last thing to round off this discussion... for further optimization of calcGrains for large maps, calcPolygons is probably the best target. Within calcGrains, EulerCycles takes the most time. Attaching profiles of calcGrains for a ~4 MPix dataset with ~500k grains:

[image]

[image]

ralfHielscher commented 5 months ago

Tuomo, a student from Freiberg is currently working on exactly this, speeding up that computation. Another thing I noticed is that jcvoronoi crashed for larger maps with a segmentation violation. I wonder whether this is purely a Linux thing, as nobody else has reported any issues.

Ralf.

kilir commented 5 months ago

I can confirm the crashes under Linux, even at really small files of just 5e6 points.

Process 482108 (MATLAB) of user 1000 dumped core.

                                               Module libjawt.so without build-id.
                                               Module libsunec.so without build-id.
                                               Module libdcpr.so without build-id.
                                               Module libt2k.so without build-id.
                                               Module libfontmanager.so without build-id.
                                               Module libawt_xawt.so without build-id.
                                               Module libawt.so without build-id.
                                               Module libnio.so without build-id.
                                               Module libnet.so without build-id.
                                               Module libmanagement.so without build-id.
                                               Module libzip.so without build-id.
                                               Module libjava.so without build-id.
                                               Module libverify.so without build-id.
                                               Module libjvm.so without build-id.
                                               Stack trace of thread 482241:
                                               #0  0x00007f99b8ccfb4b n/a (/home/rk/mtex/mtex_dev/mex/jcvoronoi_mex.mexa64 + 0x34b4b)
                                               #1  0x00007f99b8cacb59 n/a (/home/rk/mtex/mtex_dev/mex/jcvoronoi_mex.mexa64 + 0x11b59)
                                               #2  0x00007f9bd0ba5d37 n/a (libmex.so + 0xead37)
                                               #3  0x00007f9bd0ba60ba n/a (libmex.so + 0xeb0ba)
                                               #4  0x00007f9bd0b91a70 n/a (libmex.so + 0xd6a70)
                                               #5  0x00007f9bd1181976 _ZN8Mfh_file20dispatch_file_commonEMS_FviPP11mxArray_tagiS2_EiS2_iS2_ (lib>
                                               #6  0x00007f9bd1182a6c n/a (libmwm_dispatcher.so + 0x182a6c)
                                               #7  0x00007f9bd1182dae _ZN8Mfh_file8dispatchEiPSt10unique_ptrI11mxArray_tagN6matrix6detail17mxDes>
                                               #8  0x00007f9bd066eb32 n/a (libmwlxemainservices.so + 0x26eb32)
                                               #9  0x00007f9bd066ee34 n/a (libmwlxemainservices.so + 0x26ee34)
                                               #10 0x00007f9bb8ac9d2f n/a (libmwm_lxe.so + 0xac9d2f)
                                               #11 0x00007f9bb8abb9d0 n/a (libmwm_lxe.so + 0xabb9d0)
                                               #12 0x00007f9bb8a47cb2 n/a (libmwm_lxe.so + 0xa47cb2)
                                               #13 0x00007f9bb8779700 n/a (libmwm_lxe.so + 0x779700)
                                               #14 0x00007f9bb877ba44 n/a (libmwm_lxe.so + 0x77ba44)
                                               #15 0x00007f9bb8778f69 n/a (libmwm_lxe.so + 0x778f69)
                                               #16 0x00007f9bb878a83f n/a (libmwm_lxe.so + 0x78a83f)
                                               #17 0x00007f9bb878b299 n/a (libmwm_lxe.so + 0x78b299)
                                               #18 0x00007f9bb8778d74 n/a (libmwm_lxe.so + 0x778d74)
                                               #19 0x00007f9bb8778e76 n/a (libmwm_lxe.so + 0x778e76)
                                               #20 0x00007f9bb88c181c n/a (libmwm_lxe.so + 0x8c181c)
                                               #21 0x00007f9bb88c5981 n/a (libmwm_lxe.so + 0x8c5981)
                                               #22 0x00007f9bd07f5118 n/a (libmwlxemainservices.so + 0x3f5118)
                                               #23 0x00007f9bd065ba4a n/a (libmwlxemainservices.so + 0x25ba4a)
                                               #24 0x00007f9bd065e374 n/a (libmwlxemainservices.so + 0x25e374)
                                               #25 0x00007f9bd1181976 _ZN8Mfh_file20dispatch_file_commonEMS_FviPP11mxArray_tagiS2_EiS2_iS2_ (lib>
                                               #26 0x00007f9bd1183279 _ZN8Mfh_file8dispatchEiPP11mxArray_tagiS2_ (libmwm_dispatcher.so + 0x18327>
                                               #27 0x00007f9b53362065 n/a (libmwmcos_impl.so + 0x562065)
                                               #28 0x00007f9b532ab768 n/a (libmwmcos_impl.so + 0x4ab768)
                                               #29 0x00007f9b532ab904 n/a (libmwmcos_impl.so + 0x4ab904)
                                               #30 0x00007f9b532b445e n/a (libmwmcos_impl.so + 0x4b445e)
                                               #31 0x00007f9b532a9765 n/a (libmwmcos_impl.so + 0x4a9765)
                                               #32 0x00007f9b532a97e4 n/a (libmwmcos_impl.so + 0x4a97e4)
                                               #33 0x00007f9b533724fd n/a (libmwmcos_impl.so + 0x5724fd)
                                               #34 0x00007f9bd87c16da _Z18omDirectCallMethodRKN4mcos15COSInternedBaseINS_8internal13IdEmptyPolic>
                                               #35 0x00007f9bc27a67ec n/a (libmwlxeindexing.so + 0x1a67ec)
                                               #36 0x00007f9bc27a6ad8 n/a (libmwlxeindexing.so + 0x1a6ad8)
                                               #37 0x00007f9bc27aced2 n/a (libmwlxeindexing.so + 0x1aced2)
                                               #38 0x00007f9bc28043c3 _ZN9MathWorks3lxe36at_rparen_nargout_mcos_method_rplaceEPvNS_2ts4TypeEPKvR>
                                               #39 0x00007f9bb8a53742 n/a (libmwm_lxe.so + 0xa53742)
                                               #40 0x00007f9bb8779793 n/a (libmwm_lxe.so + 0x779793)
                                               #41 0x00007f9bb877ba44 n/a (libmwm_lxe.so + 0x77ba44)
                                               #42 0x00007f9bb8778f69 n/a (libmwm_lxe.so + 0x778f69)
                                               #43 0x00007f9bb878a83f n/a (libmwm_lxe.so + 0x78a83f)
                                               #44 0x00007f9bb878b299 n/a (libmwm_lxe.so + 0x78b299)
                                               #45 0x00007f9bb8778d74 n/a (libmwm_lxe.so + 0x778d74)
                                               #46 0x00007f9bb8778e76 n/a (libmwm_lxe.so + 0x778e76)
                                               #47 0x00007f9bb88c181c n/a (libmwm_lxe.so + 0x8c181c)
                                               #48 0x00007f9bb88c5981 n/a (libmwm_lxe.so + 0x8c5981)
                                               #49 0x00007f9bd07f5118 n/a (libmwlxemainservices.so + 0x3f5118)
                                               #50 0x00007f9bd06d079f n/a (libmwlxemainservices.so + 0x2d079f)
                                               #51 0x00007f9bd06d7937 n/a (libmwlxemainservices.so + 0x2d7937)
                                               #52 0x00007f9bd06bfa63 n/a (libmwlxemainservices.so + 0x2bfa63)
                                               #53 0x00007f9bd06bde29 n/a (libmwlxemainservices.so + 0x2bde29)
                                               #54 0x00007f9bd10db204 n/a (libmwm_dispatcher.so + 0xdb204)
                                               #55 0x00007f9bd10ddb28 n/a (libmwm_dispatcher.so + 0xddb28)
                                               #56 0x00007f9bd10c09dd _ZN18Mfh_MATLAB_fn_impl8dispatchEiPP11mxArray_tagiS2_ (libmwm_dispatcher.s>
                                               #57 0x00007f9b5344099c n/a (libmwmcos_impl.so + 0x64099c)
                                               #58 0x00007f9bd117b20e _Z21mdBuiltinAsBuiltinFcniPP11mxArray_tagiS1_ (libmwm_dispatcher.so + 0x17>
                                               #59 0x00007f9bd10db204 n/a (libmwm_dispatcher.so + 0xdb204)
                                               #60 0x00007f9bd10ddb28 n/a (libmwm_dispatcher.so + 0xddb28)
                                               #61 0x00007f9bd10c1070 n/a (libmwm_dispatcher.so + 0xc1070)
                                               #62 0x00007f9bd10c13be _ZN18Mfh_MATLAB_fn_impl8dispatchEiPSt10unique_ptrI11mxArray_tagN6matrix6de>
                                               #63 0x00007f9bd066eb32 n/a (libmwlxemainservices.so + 0x26eb32)

Cheers,

Rüdiger

vedadb commented 5 months ago

Interesting - do you know if the error occurs during the diagram generation or when I fetch the vertices and edges? (You could test this by leaving only the jcv_diagram_generate call and running the function.) That function should not cause segmentation faults unless you run out of memory. I would suspect the subsequent C++ code is the cause, since the memory reservation is somewhat arbitrary on my side :) so it could be fixed with little effort.

Additionally: I noticed that, after deleting EBSD points and re-running calcGrains, out of ca. 6k grains one had a negative area, i.e. its edges were not ordered counter-clockwise.

Reason: I looped over the Delaunay edges, since that way each edge is visited only once, but then the order is not preserved. I am currently changing it to loop over the sites instead, thus preserving the CCW order (at the cost of duplicate edges), which should prevent this error.
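As a quick sanity check for a single grain polygon (illustrative variable names, not MTEX internals): the shoelace formula gives a signed area that is positive only for counter-clockwise ordered vertices.

x = Vg(:,1); y = Vg(:,2);                                            % Vg: ordered boundary vertices of one grain
signedArea = 0.5 * sum(x .* circshift(y,-1) - circshift(x,-1) .* y); % shoelace formula
isCCW = signedArea > 0;                                              % false means clockwise ordering, i.e. negative area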

If you can confirm that the error is not in jcv_diagram_generate, I can add an edge/vertex counter to the header, which would allow me to allocate precisely enough memory for the V and E vectors.

Vedad

kilir commented 5 months ago

I think I see what you mean. Building jcvoronoi with everything after https://github.com/mtex-toolbox/mtex/blob/b4b9f96ded60732529d87614480467d9156d34a9/extern/jcvoronoi/jcvoronoi_mex.cpp#L42 commented out also results in a crash. There are still ~45 GB of free memory.

Running the whole jcvoronoi seems to work fine up to ~5e6 points but segfaults above that. Running it with only 'jcv_diagram_generate' produces an out-of-memory error even earlier.

The crash also occurs for gcc 11.4 (instead of 13.2.1) and for clang 16.0.6.

kilir commented 5 months ago

I'm not entirely sure if it helps, but I used 'simple.c' from the original repository (https://github.com/JCash/voronoi/blob/d9bfd6ce3ddb75b947e1fea2f9b522addc9d49df/src/examples/simple.c) and found that it also segfaults if I increase NPOINT in L25 to a sufficiently large number (somewhere in the range of a realistically large EBSD dataset):

Program received signal SIGSEGV, Segmentation fault.
0x0000555555558fce in main (argc=<error reading variable: Cannot access memory at address 0x7ffffb3b248c>, 
    argv=<error reading variable: Cannot access memory at address 0x7ffffb3b2480>) at simple.c:27
27      int main(int argc, char** argv) {

Since I have no idea about C++, I declared everything in main as 'static' and, despite a quite impressive memory consumption and runtime, it finally finished.

Of course this doesn't work for jcvoronoi_mex.cpp, but maybe there's a smarter way to allocate memory such that it could handle somewhat larger datasets under Linux.

Cheers,

Rüdiger

ralfHielscher commented 5 months ago

Hi Rüdiger, I think we have been able to fix the issue. Could you check? If there are no further complaints, I would make jcvoronoi the default in MTEX. Ralf.

vedadb commented 5 months ago

I have found that the first grain reconstruction results in the same grain2d object as the old routine. But after deleting small grains, e.g. ebsd(g(g.grainSize<5)) = [];, I get slightly different maps, with some grains having negative area.

Also, on the same map, doing the grain reconstruction on only the indexed phase yields a similar but not identical object (!), although with no negative areas. Interestingly, the boundary segments have the same total length, but the number of segments differs, as does the number of triple points.

I'm not quite sure where it goes wrong. When doing the Voronoi tessellation only on the indexed phases, the numbers of vertices and edges differ a bit. Maybe this is due to using different algorithms? I tried to read up a bit on the algorithms, and it seems that Fortune's sweep algorithm, which is implemented in jcvoronoi, is the go-to algorithm; see e.g. Computational Geometry - Algorithms and Applications by de Berg, Cheong, van Kreveld, and Overmars.

ralfHielscher commented 5 months ago

Hi Vedad, is your issue reproducible with any of the MTEX sample data sets, or can you provide the data? For me everything seems to work perfectly. Ralf.

vedadb commented 5 months ago
  1. OK - I will check if I can release some data, if necessary.

  2. I noticed that you get different results when doing:

e = mtexdata small;
g1 = calcGrains(e('indexed'));
g2 = calcGrains(e('indexed').gridify);
e1 = e.gridify;
g3 = calcGrains(e1('indexed'));

g1 == g3 ~= g2

[image] [image]

whether with jcvoronoi or the standard tessellation.

  3. With a loaded data set:

ebsd = loadEBSD(...);
ebsd1 = ebsd.gridify;

g1 = calcGrains(ebsd,'jcvoronoi');
g2 = calcGrains(ebsd);
g3 = calcGrains(ebsd1,'jcvoronoi');
g4 = calcGrains(ebsd1);

g1 == g2 and g3 == g4 - all good (kind of).

ebsd = loadEBSD(...);
ebsd1 = ebsd.gridify;

[g1,ebsd.grainId]  = calcGrains(ebsd,'jcvoronoi');
[g2,ebsd1.grainId] = calcGrains(ebsd1(:),'jcvoronoi');

g1 == g2 (OK!)

ebsd(g1(g1.grainSize<5)) = [];
ebsd1(g2(g2.grainSize<5)) = [];

[g1,ebsd.grainId]  = calcGrains(ebsd,'jcvoronoi');
[g2,ebsd1.grainId] = calcGrains(ebsd1(:),'jcvoronoi');

g1 computes negative areas (NOT OK!), while g2 computes correct grains.

Something about the order in which the X,Y points are delivered to jcvoronoi results in a wrong tessellation (?), which produces the negative areas.

An easy fix, if the data is not gridified, is to do:

ebsd = ebsd.gridify;
ebsd = ebsd(:);

The same is true for mtexdata small, although there the areas are not negative.

ralfHielscher commented 5 months ago

Hi Vedad,

the difference between the reconstruction from gridded and non-gridded data is not surprising. For the gridded reconstruction we currently use a different (and better) algorithm.

I'm more curious about the negative areas.

Ralf.

kilir commented 5 months ago

While most compilers only issue warnings about these type conversions, on a standard Microsoft Windows Visual Studio compiler -Werror (or whatever the comparable Microsoft flag would be) seems to be set. This results in compilation failing for users who use Visual Studio instead of MinGW.

>> mex -R2018a jcvoronoi_mex.cpp
Building with 'Microsoft Visual C++ 2022'.
Error using mex
jcvoronoi_mex.cpp
D:\mtex\mtex_dev\extern\jcvoronoi\jc_voronoi.h(335): warning C4244: 'argument': conversion from 'jcv_real'
to 'float', possible loss of data
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(61): warning C4267: 'argument': conversion from 'size_t'
to '_Ty', possible loss of data
        with
        [
            _Ty=int
        ]
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(65): warning C4267: 'argument': conversion from 'size_t'
to '_Ty', possible loss of data
        with
        [
            _Ty=int
        ]
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(76): error C4576: a parenthesized type followed by an
initializer list is a non-standard explicit type conversion syntax
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(76): error C2398: Element '1': conversion from 'int' to
'_Ty' requires a narrowing conversion
        with
        [
            _Ty=size_t
        ]
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(77): error C4576: a parenthesized type followed by an
initializer list is a non-standard explicit type conversion syntax
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(77): error C2398: Element '1': conversion from 'int' to
'_Ty' requires a narrowing conversion
        with
        [
            _Ty=size_t
        ]
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(78): error C4576: a parenthesized type followed by an
initializer list is a non-standard explicit type conversion syntax
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(78): error C2398: Element '1': conversion from 'int' to
'_Ty' requires a narrowing conversion
        with
        [
            _Ty=size_t
        ]
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(79): error C4576: a parenthesized type followed by an
initializer list is a non-standard explicit type conversion syntax
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(79): error C2398: Element '1': conversion from 'int' to
'_Ty' requires a narrowing conversion
        with
        [
            _Ty=size_t
        ]
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(80): error C4576: a parenthesized type followed by an
initializer list is a non-standard explicit type conversion syntax
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(80): error C2398: Element '1': conversion from 'int' to
'_Ty' requires a narrowing conversion
        with
        [
            _Ty=size_t
        ]
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(81): error C4576: a parenthesized type followed by an
initializer list is a non-standard explicit type conversion syntax
D:\mtex\mtex_dev\extern\jcvoronoi\jcvoronoi_mex.cpp(81): error C2398: Element '1': conversion from 'int' to
'_Ty' requires a narrowing conversion
        with
        [
            _Ty=size_t
        ]

ralfHielscher commented 5 months ago

I still experience crashes under Linux. The simplest way to reproduce:

mtexdata olivine
grains = calcGrains(ebsd('indexed'));

Ralf.

kilir commented 5 months ago

I can confirm the crash (due to out of memory). Strangely, it does not crash on an almost completely indexed 2e7-point file, nor does

mtexdata olivine
grains = calcGrains(ebsd);

crash.