Closed intractabilis closed 1 year ago
It's due to several defects in this much-cited paper. For example, they propose to use the lowest point of polygon B as a reference point... But what if there are two or more equally low points? And that's just one example that breaks their algorithm... I think I know why they were convinced they had a good solution while it breaks on so many of the tests I created: they used randomly generated data for testing, and randomly generated data has an extremely low chance of forming parallels between polygon A and B. That is what it comes down to (the first example is also such a case): parallels. They just didn't consider them.
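To illustrate the tie described above (a hypothetical helper for illustration, not libnfporb's actual code): picking "the lowest point" of a polygon stops being well defined as soon as the bottom edge is horizontal, because two vertices then share the minimal y:

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Pick the paper's reference point: the lowest vertex of the polygon.
// Returns the first vertex with minimal y; `ties` reports how many
// vertices share that minimal y. ties > 1 means the reference point
// is ambiguous, which is exactly the case the paper doesn't consider.
std::size_t lowest_point(const std::vector<Point>& poly, std::size_t& ties) {
    std::size_t best = 0;
    ties = 1;
    for (std::size_t i = 1; i < poly.size(); ++i) {
        if (poly[i].y < poly[best].y) {
            best = i;
            ties = 1;
        } else if (poly[i].y == poly[best].y) {
            ++ties;
        }
    }
    return best;
}
```

For an axis-aligned rectangle both bottom corners qualify, so any tie-breaking rule (first found, leftmost, ...) is an arbitrary choice that the rest of the orbiting algorithm then silently depends on.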
Anyway, libnfporb is a collection of workarounds that make the orbiting approach work kind of as proposed.
The bad news: that makes libnfporb slow. The good news: it all comes down to one function (trimVector) that consumes virtually all the CPU time in my tests. But I haven't checked your results yet.
Out of curiosity, why are you looking for a NFP implementation? Can you talk about it?
The orbital approach is expected to be very efficient, at least this is the impression I got after reading papers. However, in my experiments it is at least 2 orders of magnitude slower than CGAL's reduced convolution method. It would be nice to investigate if the slowness is a defect of the algorithm, or we can improve the implementation.
I've sent the test data privately. It consists of 36 polygons numbered from 0 to 35. On the polygon pair (2, 18) libnfporb hangs indefinitely. The following pairs: (3, 14), (3, 16), (3, 17), (8, 17), (12, 14), (12, 16), (12, 17), (15, 20), (21, 24), (24, 25), and (28, 30) end up throwing an exception with a message "Unable to complete outer nfp loop: 1".
Here is the performance comparison on pairs where libnfporb is the slowest compared to CGAL:
Run on (16 X 3874.93 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 1280 KiB (x8)
L3 Unified 24576 KiB (x1)
Load Average: 0.93, 0.71, 0.68
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
benchmark_cgal/24/27 4523011 ns 4520737 ns 155
benchmark_cgal/24/31 2876381 ns 2874814 ns 244
benchmark_cgal/24/26 1987337 ns 1986189 ns 352
benchmark_cgal/24/33 2306633 ns 2305401 ns 297
benchmark_cgal/23/25 1619837 ns 1619050 ns 430
benchmark_orb/24/27 5694047455 ns 5692760057 ns 1
benchmark_orb/24/31 3049576536 ns 3048878609 ns 1
benchmark_orb/24/26 1877847975 ns 1877413520 ns 1
benchmark_orb/24/33 2259959241 ns 2259457033 ns 1
benchmark_orb/23/25 1518002236 ns 1517642500 ns 1
Excellent work. I get the same results and it makes sense, given that libnfporb is stable but as I said has many workarounds for shortcomings of the mentioned paper.
6 seconds still sounds a bit excessive. I ran a microarchitecture benchmark: your code issues 259,000,000 floating-point instructions for polygons 24 and 27. That amount of floating-point work is enough to calculate the FFT of a 512x512 image 16 times.
I don't know what to say, except that I am glad I could get it to work despite the defects in the paper. Anyway, I am very interested in making it faster, but I am not aware of low-hanging fruit. As simple as it seems to orbit one polygon around another (well, there are also holes, interlocks, etc.), it is really, really hard to get right. :) Btw, the majority of the CPU time is spent in boost::geometry functions (intersect, overlap, etc.).
Have you tried contacting the authors for comments? Emails are in the paper. Maybe they have some secret sauce.
Yep. No answer.
Well, I have to thank you for the great amount of work you've done to show this algorithm is impractical. The authors' silence probably means they know very well about the problems with their paper. If not your effort, I would most likely have wasted my time implementing it.
In my opinion there is no serious science without open source; especially in computer science, that would mean no more bullshitting. I know there are factors in play, like software patents, that make the matter more complicated than that. But what is more important: software patents or trustworthy science? Anyway, I'm glad that my effort is helpful. Also, I was rewarded with two citations, in a bachelor's and in a master's thesis that used libnfporb. And if that isn't ironic enough, I don't even have an A-level. Thanks for your understanding.
> I get the same results and it makes sense, given that libnfporb is stable but as I said has many workarounds for shortcomings of the mentioned paper.
Uhm. I just wanted to profile 24/27 using perf (using the "profile" build target of the libnfporb Makefile), but first I tested with the release build and realized that it only took 2 s for 1 iteration, which is remarkable since my machine is way less powerful than yours (as you'll see in the following report). Anyway, when I run your benchmark suite I get the following result:
Run on (8 X 4400 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 1280 KiB (x4)
L3 Unified 12288 KiB (x1)
Load Average: 0.43, 0.51, 0.36
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
benchmark_cgal/24/27 61367284 ns 61366863 ns 10
benchmark_cgal/24/31 35310071 ns 35307597 ns 20
benchmark_cgal/24/26 24239106 ns 24235630 ns 29
benchmark_cgal/24/33 30250984 ns 30250819 ns 23
benchmark_cgal/23/25 19487795 ns 19486422 ns 36
benchmark_orb/24/27 49712943159 ns 49711742870 ns 1
benchmark_orb/24/31 31513661714 ns 31510963354 ns 1
benchmark_orb/24/26 18847734019 ns 18847256122 ns 1
benchmark_orb/24/33 22661594322 ns 22660991906 ns 1
benchmark_orb/23/25 15026137744 ns 15025408670 ns 1
49 s for 1 iteration of 24/27. At the moment I am looking through the build system of the benchmark suite, and I found the following compile command:
c++ -Infp-orb-perf-check.p -I. -I.. -I../include -I/usr/local/include -I/usr/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -std=gnu++20 -O0 -g -DFMT_LOCALE -DFMT_SHARED -DCGAL_USE_GMPXX=1 -DBOOST_FILESYSTEM_DYN_LINK=1 -DBOOST_ALL_NO_LIB -DDEBUG -march=native -mtune=native -Wno-deprecated-enum-enum-conversion -ffunction-sections -fdata-sections -fPIC -MD -MQ nfp-orb-perf-check.p/nfp-orb-perf-check.cpp.o -MF nfp-orb-perf-check.p/nfp-orb-perf-check.cpp.o.d -o nfp-orb-perf-check.p/nfp-orb-perf-check.cpp.o -c ../nfp-orb-perf-check.cpp
Which made me realize that the default buildtype is debug. So I did a release build with the following result:
2022-06-05T10:49:20+02:00
Running build/nfp-orb-perf-check
Run on (8 X 4400 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 1280 KiB (x4)
L3 Unified 12288 KiB (x1)
Load Average: 0.42, 0.52, 0.47
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
benchmark_cgal/24/27 4891004 ns 4891003 ns 136
benchmark_cgal/24/31 3130488 ns 3128706 ns 225
benchmark_cgal/24/26 2172407 ns 2172363 ns 322
benchmark_cgal/24/33 2533278 ns 2533082 ns 276
benchmark_cgal/23/25 1806403 ns 1806375 ns 355
benchmark_orb/24/27 3823282100 ns 3823111325 ns 1
benchmark_orb/24/31 2050453801 ns 2050392766 ns 1
benchmark_orb/24/26 1274274826 ns 1274209477 ns 1
benchmark_orb/24/33 1539979505 ns 1539962289 ns 1
benchmark_orb/23/25 1037740713 ns 1037673723 ns 1
which is still slower than the 2 s I saw earlier.
So I built the benchmark program by hand:
g++ -s -pthread -fno-strict-aliasing -L/usr/local/lib -I/usr/local/include -lboost_filesystem -lboost_system -lbenchmark -lgmp -lmpfr -std=c++20 -march=native -I../../src -DNDEBUG -g0 -Ofast -flto -o nfp-orb-perf-check ../nfp-orb-perf-check.cpp
and got the following result:
2022-06-05T11:35:43+02:00
Running ./build/nfp-orb-perf-check
Run on (8 X 1601.28 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 1280 KiB (x4)
L3 Unified 12288 KiB (x1)
Load Average: 0.59, 0.59, 0.60
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
benchmark_cgal/24/27 4862135 ns 4862111 ns 145
benchmark_cgal/24/31 3097166 ns 3097099 ns 226
benchmark_cgal/24/26 2139914 ns 2139891 ns 327
benchmark_cgal/24/33 2501373 ns 2501171 ns 279
benchmark_cgal/23/25 1745026 ns 1745017 ns 400
benchmark_orb/24/27 2869975098 ns 2869954309 ns 1
benchmark_orb/24/31 1600250962 ns 1600240966 ns 1
benchmark_orb/24/26 1104291322 ns 1104281761 ns 1
benchmark_orb/24/33 1339996282 ns 1339987384 ns 1
benchmark_orb/23/25 895057826 ns 895049078 ns 1
Better, but still not as fast as using libnfporb/examples/nfp. Anyway, I can accept that difference for the moment because it isn't that much. Could you try doing a manual build of the latest libnfporb revision like I did and redo the benchmark?
In the meanwhile I'll go back to the code and see if I can squeeze out an order of magnitude or two :)
For the record, I am using "g++ (SUSE Linux) 12.1.0".
I found an unnecessary call and now look at that (current HEAD):
2022-06-05T12:37:51+02:00
Running ./build/nfp-orb-perf-check
Run on (8 X 1427.49 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 1280 KiB (x4)
L3 Unified 12288 KiB (x1)
Load Average: 0.52, 0.60, 0.58
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
benchmark_cgal/24/27 4783350 ns 4780879 ns 148
benchmark_cgal/24/31 3035116 ns 3035073 ns 231
benchmark_cgal/24/26 2101172 ns 2101139 ns 331
benchmark_cgal/24/33 2455099 ns 2454937 ns 286
benchmark_cgal/23/25 1717539 ns 1717502 ns 407
benchmark_orb/24/27 445997757 ns 445992804 ns 2
benchmark_orb/24/31 267225518 ns 267222110 ns 3
benchmark_orb/24/26 195266817 ns 195265388 ns 4
benchmark_orb/24/33 237397545 ns 237396646 ns 3
benchmark_orb/23/25 165946719 ns 165945385 ns 4
Also, I looked into what the minkowski-sum approach you are using is actually doing, and found out that the benchmark isn't fair by far, because it can't include perfect fits, antennas and interlocks, which libnfporb does.
https://stackoverflow.com/questions/52350747/no-fit-polygon-problem-with-cgal-minkowski-sum-2
libnfporb doesn't work on the example mentioned in the post:
auto Aorb = libnfporb::polygon_t{
{{0, 0}, {10, 0}, {10, 10}, {8, 10}, {8, 2}, {2, 2}, {2, 10}, {0, 10}}, {}
};
auto Borb = libnfporb::polygon_t{
{{0, 0}, {6, 0}, {6, 6}, {4, 6}, {4, 2}, {2, 2}, {2, 6}, {0, 6}}, {}
};
auto Corb = libnfporb::generate_nfp(Aorb, Borb, true);
It stops with
../../include/libnfporb.hpp:78: void libnfporb::remove_co_linear(boost::geometry::model::polygon<point_t, false, true>::ring_type&): Assertion `r.size() > 2' failed
Even if you fix this, it doesn't work on completely not exotic examples. It fails (probably intermittently) at least on 12 pairs of polygons I sent you. See my initial post.
> I found an unnecessary call and now look at that (current HEAD): […]
It's still 2 orders of magnitude difference. However, it's an impressive 10 times gain. Maybe 2 more unnecessary calls, and we will be on par with CGAL? :)
> So I built the benchmark program by hand:
I think the difference is because you are using `long double`, and my compilation asks the compiler to use SIMD. `long double` not only prevents using advanced CPU features, but also incurs extra work when the rest of the code uses SIMD.
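A minimal sketch of the kind of loop where that difference shows up (my own illustration, assuming x86-64: `double` arithmetic lives in SSE/AVX registers, while `long double` is the 80-bit x87 format and stays scalar; vectorizing the `double` reduction additionally needs -ffast-math, which the hand build above enables via -Ofast):

```cpp
#include <cstddef>

// With -Ofast/-ffast-math the compiler may vectorize this reduction
// using SIMD registers on x86-64.
double sum_double(const double* v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// The same loop over long double goes through the scalar x87 unit on
// x86-64; mixing it with double code adds conversions on top.
long double sum_long_double(const double* v, std::size_t n) {
    long double s = 0.0L;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}
```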
> Even if you fix this, it doesn't work on completely not exotic examples. It fails (probably intermittently) at least on 12 pairs of polygons I sent you. See my initial post.
Let's see what I can do.
> g++ -s -pthread -fno-strict-aliasing -L/usr/local/lib -I/usr/local/include -lboost_filesystem -lboost_system -lbenchmark -lgmp -lmpfr -std=c++20 -march=native -I../../src -DNDEBUG -g0 -Ofast -flto -o nfp-orb-perf-check ../nfp-orb-perf-check.cpp
You are right, I forgot to tune. Now I tried with mtune=native and mtune=intel; native made it slower, while intel made no difference.
> You are right, I forgot to tune. Now I tried with mtune=native and mtune=intel; native made it slower, while intel made no difference.
`mtune` should be irrelevant because it doesn't change the instruction set.
You are right! But that means I've been compiling with SIMD in each of my examples, because I used march=native.
> You are right! But that means I've been compiling with SIMD in each of my examples, because I used march=native.
Ok, then I am wrong about long double.
Have you tested with Boost NFP or Clipper? Clipper is faster than CGAL or Boost (currently the best one is 2-3x faster than CGAL).
> Have you tested with Boost NFP or Clipper? Clipper is faster than CGAL or Boost (currently the best one is 2-3x faster than CGAL).
As far as I know, Clipper doesn't work with non-convex shapes. Boost doesn't have NFP.
> Have you tested with Boost NFP or Clipper? Clipper is faster than CGAL or Boost (currently the best one is 2-3x faster than CGAL).

> As far as I know, Clipper doesn't work with non-convex shapes. Boost doesn't have NFP.
I'm sure you haven't used it. Clipper can work with non-convex shapes, because it does convex decomposition and finally merges, via boolean operations, all the decomposed convex pieces to get the non-convex NFP.
Just test it and you will know. Or you can read the source code, which clearly answers your question of why it can deal with non-convex shapes.
(I get the feeling you are a newcomer to NFP.)
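For context, the convex-convex building block such a decomposition approach relies on is cheap: the Minkowski sum of two CCW convex polygons can be computed in O(n + m) by merging their edge sequences in angular order (a sketch of my own, not Clipper's actual code; for the NFP one of the polygons is first reflected). The non-convex NFP would then be the boolean union of all pairwise sums of the convex pieces.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

static double cross(Pt a, Pt b) { return a.x * b.y - a.y * b.x; }

// Rotate a CCW convex polygon so it starts at its lowest-then-leftmost vertex.
static std::vector<Pt> normalize(std::vector<Pt> p) {
    std::size_t lo = 0;
    for (std::size_t i = 1; i < p.size(); ++i)
        if (p[i].y < p[lo].y || (p[i].y == p[lo].y && p[i].x < p[lo].x))
            lo = i;
    std::rotate(p.begin(), p.begin() + lo, p.end());
    return p;
}

// Minkowski sum of two CCW convex polygons: merge the two edge
// sequences by polar angle, O(n + m).
std::vector<Pt> minkowski(std::vector<Pt> P, std::vector<Pt> Q) {
    P = normalize(P);
    Q = normalize(Q);
    const std::size_t n = P.size(), m = Q.size();
    std::size_t i = 0, j = 0;
    std::vector<Pt> r;
    while (i < n || j < m) {
        r.push_back({P[i % n].x + Q[j % m].x, P[i % n].y + Q[j % m].y});
        if (i == n) { ++j; continue; }   // P exhausted, copy Q's edges
        if (j == m) { ++i; continue; }   // Q exhausted, copy P's edges
        Pt e1{P[(i + 1) % n].x - P[i].x, P[(i + 1) % n].y - P[i].y};
        Pt e2{Q[(j + 1) % m].x - Q[j].x, Q[(j + 1) % m].y - Q[j].y};
        double c = cross(e1, e2);
        if (c >= 0) ++i;   // P's edge turns first (both advance if parallel)
        if (c <= 0) ++j;
    }
    return r;
}
```

The expensive part of the decomposition route is not this merge but the decomposition itself and the final boolean union, which is where comparative benchmarks would have to settle the question.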
Decomposition cannot be faster than reduced convolution, unless it's something very trivial. Show comparative benchmarks on non-convex polygons with a hundred vertices.
> Decomposition cannot be faster than reduced convolution, unless it's something very trivial. Show comparative benchmarks on non-convex polygons with a hundred vertices.
But it really is faster than CGAL. Probably I don't understand its true theory, or it deals with non-convex polygons more efficiently than CGAL.
Friend, I am ready to believe you, but just post your benchmarks. In my understanding, decomposition cannot be faster than the reduced convolution, besides trivial cases. Present your benchmarks for non-convex polygons with a hundred or so vertices.
> Friend, I am ready to believe you, but just post your benchmarks. In my understanding, decomposition cannot be faster than the reduced convolution, besides trivial cases. Present your benchmarks for non-convex polygons with a hundred or so vertices.
Could you give me your test data? Uploading it to git would be better.
Absolutely, I can give you even a benchmark program. Unfortunately, I cannot publish the data because of legal issues. Any email?
btw. I will be back for this issue, just other things in the pipeline. :)
> Absolutely, I can give you even a benchmark program. Unfortunately, I cannot publish the data because of legal issues. Any email?
OK, could you give me the benchmark program?
I have an idea. I'm going to make some modifications to read polygons usually used in papers on NFP, and will publish everything here. Hang on.
thanks
I created my own implementation of reduced convolution; it is 6 times faster than CGAL's.
I managed to make my implementation 9 times faster than CGAL.
> I managed to make my implementation 9 times faster than CGAL.
Sure. I tested with bird.poly (the data you sent me, 275 points); it takes 2 ms to finish the calculation with reduced convolution. I think that's too slow; I plan to reduce it to 0.2 ms.
There are way better alternatives out by now, which is why I'm going to archive this repository.