Closed ramcdona closed 1 year ago
Hey, Yes, build options need to be set for full IEEE compilant floating point behavior: round to nearest, ties to even. I have never tried it on Mac, let me see if I can run into same problem.
Indeed, code compiled with clang on mac x64 hangs, but on linux, built with clang or gcc works fine. I'll look into it tonight. Thanks for the sample files!
Ok, I've managed to build and run it successfully, probably fused multiply add is not enabled by default on Mac.
bash-3.2$ clang++ -O3 -DNDEBUG -std=c++17 -ffp-model=strict -ffp-contract=off -mfma delabella.cpp dba_test.cpp -o dba_test
bash-3.2$ ./dba_test
...
Triangulate
[100] sorting vertices (0 ms)
[100] convex hull triangulation (0 ms)
ConstrainEdges
[100] constraining (0 ms)
Done
Great, I'll check that out and see if I can start improving the CMake files to have appropriate flags per platform.
Nice, I don't know what platforms you need to consider, but this you may find interesting: https://gitlab.cern.ch/gaudi/Gaudi/-/merge_requests/1022
Thanks for that link.
I use CMake to build on Windows Visual Studio, Linux gcc and clang, x86-64 only
and MacOS Apple's XCode version of clang on x86-64 and aarch64
I might occasionally have some users who build with gcc on Windows or icc on Linux, but I'll worry about that later.
Rob
The link you included suggests '-ffp-contract=fast -mfma', while your fix uses '-ffp-contract=off -mfma'. Which do you want used?
I believe setting contraction to fast or off in both cases is safe, it guarantees IEEE correct results. Setting it fast additionally should enable use of hardware FMA instruction (if available) but as link suggests, it was pretty unintuitive so clang guys switched to gcc way which doesn't need it. These are my speculations, I haven't tested fast setting yet. Maybe I'll do tomorrow .
OK, I've pushed up a first cut at some CMake improvements to here https://github.com/ramcdona/delabella/tree/CMake_Updates These are essentially untested -- certainly not tested across platforms. However, it should be a good starting point.
I'd be happy to merge your pull request when you think you're done with it. Thanks!
Do you have access to a M1 Mac? I'm seeing differences in behavior that I can't explain. I believe the M1 is set to strict IEEE 754 compliance, but I don't have separate flags for that -- I'm just trusting the normal Clang / XCode options.
How hard would it be to eliminate the use of FMA and other hardware specific optimizations? I'm thinking about using #defines throughout the code to separate hardware-specific optimized code paths from highly portable code paths that may be slightly slower.
No, unfortunately I have no M1 Mac. I can test it on jetson nano which is also arm64. What do you mean by different behavior? Are results from dba different for the same input on M1 Mac? Sharing the input file would help me a lot. The conditional build using Dekker's product in place of FMA is doable and not very hard, still it would require strict fp settings.
My program includes a CSG (Constructive Solid Geometry) engine. I.e. it can take Boolean operations of meshes constructed of triangles. This boils down to calculating the line segment where any two triangles intersect. That line segment is stored with each triangle. After intersection, we have each triangle with zero to many intersecting segments. This is fed into a CDT algorithm to generate triangles. The original triangle as well as all the edges are included as constraints. So, each call to CDT is relatively small, but it is called thousands of times. After triangulation, the new triangles are classified as inside/outside the various bodies and (depending on the Boolean operation - union, subtract, intersect, etc) discarded as needed.
When I run this code, I get artifacts in the result. Right now, I am seeing a bunch of (apparently) zero-area triangles that should not exist that survive the classification and deletion. These artifacts do not occur when CDT is run with Triangle, but they do occur with Delabella. I thought I was also seeing more x86 vs M1 differences, but those are harder to see right now.
I think my next step needs to be to set up my program to run both Triangle and Delabella side-by-side and dump out cases where the results differ.
Thanks for explanation of the context how it is used.
Comparing triangle with delabella outputs sounds like a good idea, but it's hard to compare CDT results when there are more than 3 concyclic vertices somewhere around.
Could it be that output triangle winding direction affects classifier part?
Here are the results of my first attempt comparing DBA and TRI side-by-side. Results are slightly different for x86 and M1.
These files should open and run in the test program I sent before, but the results I get from DBA and TRI are included at the bottom. I have processed the results to ensure consistent winding, and then sorted so the smallest index occurs first in each triangle, and then sorted the triangles by index. These manipulations are not in the test program of course.
I have not checked for more than 3 concyclic vertices. Let me know if that is a common occurrence in these files and I will try to filter that out.
Thanks! I'll check that tomorrow, I've looked into the files on ny phone now, so clearly dlb results are totally messed both m1 and x86.
Is the test program same as you posted previously with just added results dump?
Lets not worry about the M1 differences for now. I suspect we have multiple issues (including perhaps user error). Once we fix the most prominent issue(s), we'll have increasingly specific test cases. It may be that M1 and x86 are behaving the same, but other parts of my program are causing slightly different input to these routines (resulting in slightly different output).
Yes, the same test program should run (you'll have to change the input file name or make it a command line parameter). I just added more information to the bottom of the file that the test program should ignore.
Thanks very much for your help.
I just wrote a visualization program to look at the test files. It looks like I might be feeding in some constraints that cross. I suspect I might have an indexing problem where I feed constraints to DLB vs. TRI...
False alarm, the problem ws with my visualization program -- no crossing constraints and all information is being passed properly to DBA. Here are three images depicting what is going on in this case.
Here are the input points and constraint edges:
Here is the expected result from Triangle:
Here is the result from DBA:
I didn't check all your samples, but one common thing (regarding "zero" area triangles) is that intersection points you insert at the triangle edges can't be represented perfectly using double type, so they are a little bit inside the triangle or a bit outside forming unwanted 'ear' in the latter case. Probably you instruct 'triangle' to remove that ears automatically, for delablella you need to call FloodFill for that. So updated code should look like:
.....
printf( "ConstrainEdges\n" );
idb->ConstrainEdges( nedg, &bounds->a, &bounds->b, sizeof( dba_edge ) );
printf( "Done\n" ); // almost
if ( verts > 0 )
{
printf( "Removing constrained ears\n" );
int tris = idb->FloodFill(false, 0); // idb->GetNumPolygons();
.....
Also, I bet there is something wrong in your secret sorting / comparing code, I don't get in my output any repeated triangles which occur in files you provided.
Hope that help moving things forward. Marcin
It appears 'triangle' is wrong in dlbtest_1.txt
TRI
3
0 0 3 2 <- circle passing through its vertices consumes vertex 4 which breaks Delaunay property.
1 1 4 3
2 2 3 4
DLB (after flood fill)
3
0 0 3 4
1 0 4 2
2 1 4 3
For your convenience I've added ramcdona_tests branch to my repo, so we can x-ray our test programs. Please verify if sorting steps in dba_test.cpp are compatible.
Thanks for explaining the situation with FloodFill. I've made that change to my program and have re-run this initial test. It has increased the number of times that DBA and TRI match. Here is the updated output...
I will check out your new branch and the sorting you've done. I will also double check a few things on my side to make sure the TRI and DBA results really should match.
I'm seeing a bunch of cases where DBA does not triangulate the full triangle. This often results in holes in the final mesh. Here is case 5 for example:
Here's new case5 (upside down, because svg has 0,0 at the upper left corner) processed on my side.
For some reason, the cases that fail in my main program are not failing in the test program. I will work to get this sorted and hopefully either come back with repeatable tests, or a program that works substantially more frequently.
OK, my test code was messing up the output from DLB that was written to file. I at least now have the ability to reproduce my program's results with the test program.
I'm still getting a surprising number of differences from Triangle (though both look acceptable right now). I think the input data to each might be scaled differently, stretching the triangles one way or the other -- resulting in a difference in Delaunay criteria (though no important difference for my purposes).
I'll let you know.
That's great. I've been thinking about flood fill in case where tip of a solid passes through the subject triangle forming a closed contour of constraints not touching any of triangle edges. In this case flood fill will remove triangles in such hole. I'll add a flag parameter for you to avoid it.
Oh, thanks to the very first test case (inf loop), i found a bug in the FloodFill. 2 triangles at the very top (see svg below) would be incorrectly erased. Now it's fixed, along with the added depth parameter I mentioned before, in your case it should be set to 1. Could you use the ramcdona_tests branch (delabella.cpp and delabella.h files) till I merge it to the main branch?
Thanks for this fix, it seems to fix the next issue I was seeing (even with depth=0). I'll keep poking.
FYI, I've updated master branch with this fix, depth parameter and additional optimizations related to it.
Test case for depth checking: dba_test.txt | depth = 0 (unlimited) | depth = 1 |
---|---|---|
Thanks for all of your work on this. It looks like I am able to successfully replace the straightforward calls to Triangle in my application. In fact, this testing actually revealed an issue in a well regarded bit of code https://github.com/erich666/jgt-code/issues/10 that will hopefully get resolved.
To fully replace Triangle, I also need to perform mesh generation to fill in the polygon. I am debating on whether to attempt to add Ruppert's algorithm to Delabella, or to simply add a bunch of uniformly distributed points inside the polygon.
In this area (not the CGT user-case discussed earlier), my program only uses the Delauny solution as a starting point, it then works to improve the quality of the mesh with our own algorithms. Theoretically, we could start with an un-refined Delaunay triangulation, but in practice having the closer-to-final mesh helps.
If I wanted to try adding Ruppert's algorithm to Delabella, do you have any suggestions? Do you think this is a reasonable thing, or are there fundamental reasons due to Delabella's design that implementing something like this will be challenging?
You're welcome, I'm also glad I found a bug during 'support' of your transition. Actually since your initial question about mesh refining possibility in DLB, I continously read articles around Chew's, Rupert's and Triangle's improved 'terminator'algorithms. Hopefully I'll add something sooner than later. What's your time frame?
Sooner is always better than later. If it happens soon, then I would ship it soon. If it needs to wait a while, I would probably ship both Triangle and Delabella for now and complete the transition later. That would still subject the remaining use case to the crashes we see, but it would still help out a bunch.
Looking back to my original message -- one of the (likely lower quality) CDT's I found included a Ruppert's algorithm. I mention this because it was quite small and simple over all... https://github.com/cp3-ws1920/triangulator/blob/master/triangulator.cpp#L302
Obviously the devil is in the details, but if sufficient supporting routines exist, it looks like implementing Ruppert's can be quite compact.
First step is to add container for Steiner points with point insertion/removal while maintaining CDT props on the fly. Then actual refinement algorithm could be added straightforward. Sounds like a matter of 2-3 weeks for me. Regarding details, would it be fine for you if max number of Steiner points has to be provided to refining func? That would make things way easier and run faster.
Interesting, I didn't realize points couldn't be added dynamically after the constraints were added. I assumed that could be done because of your animated logo. How did you create that?
I think a pre-determined number of points will be fine. In particular, if the refinement algorithm returns a flag whether it converged to the angle tolerance or hit the maximum number of points.
Yes, adding points is implemented but only for unconstrained DT.
First part of logo animation, as far as I remember, is unconstrained DT with hack in the code to limit internal iterations (after sorting) to animation fame index. Then constraints are added all at once then flood fill for removing external and hole faces.
BTW, I shipped my program this morning with DBA replacing Triangle in the use cases that don't require the addition of Stenier points. I'll let you know how it goes!
Nice, thank you.
Hey, I'm going to close this issue now, Feel free to open a new issue / discussion post whenever you need a support.
I'm working on converting my program from Triangle to Delabella. I've run into a few issues -- some are proving tough to isolate. As I am able to capture the cases, I will post them here as issue reports.
First, do you expect any differences on an Apple M1 or M2 processor vs. an Intel/AMD x86-64 processor? I am seeing some different behavior when I switch from my new Mac to my old one.
Also, are there any particular compiler options that are required to get correct results from Delabella?
Here is a simple test program and an input file that will into an infinite loop on my x86-64 machine (MacOS, XCode).
dba_test.txt dba_test.cpp.txt