mapbox / mapbox-gl-js

Interactive, thoroughly customizable maps in the browser, powered by vector tiles and WebGL
https://docs.mapbox.com/mapbox-gl-js/

tesselate polygons instead of rendering with stencil buffer #682

Closed — ansis closed this issue 9 years ago

ansis commented 10 years ago

We should draw polygons by tessellating them instead of using the stencil buffer trick. Native already does this. https://github.com/mapbox/mapbox-gl-js/commit/7b353122b46503b57e6d5073184883c1c2e21404 is a first attempt that uses libtess.js. Rendering is faster, but the tessellation is slow. The buildings layer can take up to 500ms to tessellate.
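
For context, the libtess.js attempt drives a port of the GLU tessellator. A minimal sketch of that kind of usage (the `ring` input format and callback wiring here are assumptions, not the exact code from the commit above):

```js
// Hedged sketch: triangulating one ring with libtess.js (a GLU tessellator port);
// `ring` as [[x, y], ...] is an assumed input format.
var tessy = new libtess.GluTesselator();
var triangles = []; // flat [x0, y0, x1, y1, ...], three vertices per triangle

// an edge-flag callback forces plain triangles (no fans/strips),
// which is what we want to upload to a WebGL buffer
tessy.gluTessCallback(libtess.gluEnum.GLU_TESS_EDGE_FLAG, function() {});
tessy.gluTessCallback(libtess.gluEnum.GLU_TESS_VERTEX_DATA,
    function(data, out) { out.push(data[0], data[1]); });
// COMBINE runs when the tessellator must create a vertex at an intersection
tessy.gluTessCallback(libtess.gluEnum.GLU_TESS_COMBINE,
    function(coords) { return [coords[0], coords[1], coords[2]]; });
tessy.gluTessNormal(0, 0, 1); // tessellate in the XY plane

tessy.gluTessBeginPolygon(triangles);
tessy.gluTessBeginContour();
for (var i = 0; i < ring.length; i++) {
  var coords = [ring[i][0], ring[i][1], 0];
  tessy.gluTessVertex(coords, coords);
}
tessy.gluTessEndContour();
tessy.gluTessEndPolygon(); // `triangles` is now the fill mesh
```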

mourner commented 10 years ago

We should draw polygons by tessellating them instead of using the stencil buffer trick.

Can you list the reasons why here too? Is it trading some of the worker's time for improved rendering time?

ansis commented 10 years ago

Can you list the reasons why here too?

  • rendering is faster (fewer fragments, fewer draw calls)
  • native already does it, and we shouldn't diverge
  • tessellating will open up the possibility of having different colours per feature
  • this could be used to batch multiple layers into one draw call
  • tessellating would be necessary for drawing 3d buildings at different heights

But if we can't make tessellation fast enough it might not be worth it.

Did I miss anything @kkaefer?

I also want to see if a simpler algorithm that produces less optimal output (more triangles) could be much faster.

kkaefer commented 10 years ago

No, those are the reasons.

mourner commented 10 years ago

@ansis this looks interesting: https://github.com/jahting/pnltri.js https://github.com/mrdoob/three.js/issues/4920

mourner commented 10 years ago

The PNLTRI.js code doesn't look good, though...

I'm looking into porting https://code.google.com/p/poly2tri/source/browse/python/seidel.py?repo=archive&r=96caf237ea676601374ae9d492fe4feb8514b695 into some nice clean JS — it's very compact compared to all the other algorithms, and it should be fast too. It looks like it can support holes with some minor edits as well.

Here's a thread with the author of the original poly2tri reporting the results of switching to Seidel's algorithm: http://www.box2d.org/forum/viewtopic.php?f=3&t=3430

Here's another thread where he explains why he switched to Constrained Delaunay Triangulation: http://www.box2d.org/forum/viewtopic.php?f=3&t=3576 — basically because of better quality triangulation and better handling of weird polygons. But the algorithm is also much more complex and bulky.

kkaefer commented 10 years ago

What about the port at https://github.com/r3mi/poly2tri.js?

mourner commented 10 years ago

@kkaefer looks well implemented and we should definitely try it, but it's pretty huge for my tastes. I have a feeling that the world needs a much simpler JS triangulation lib. :)

Sumbera commented 10 years ago

Hi guys, the fastest per my tests is libtess.js (https://github.com/brendankenny/libtess.js); however, for large polygons the bottleneck is still there. More info here:

http://blog.sumbera.com/2014/07/28/webgl-polygons-fill-with-libtess-js/

mourner commented 10 years ago

@Sumbera interesting, thanks! I still wonder how it compares to poly2tri.js.

Sumbera commented 10 years ago

N/A so far — I got an error at line 822 of poly2tri.js: ('poly2tri EdgeEvent: Collinear not supported!', [eq, p1, ep]); 0x800a139e - JavaScript runtime error: PointError: poly2tri EdgeEvent: Collinear not supported! (16.682605181590467;48.727819905027005) (16.68257538313253;48.72781480964899) (16.682595203716882;48.727818203377225)

mourner commented 10 years ago

@Sumbera what if you run the data through http://sourceforge.net/p/jsclipper/wiki/documentation/#clipperlibclippersimplifypolygon before feeding it to poly2tri?
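
A hedged sketch of that pre-pass, assuming the jsclipper (ClipperLib) API and an integer scaling factor:

```js
// SimplifyPolygon removes self-intersections, returning strictly simple
// rings; the 1e6 scale factor is an assumption (Clipper wants integers).
var path = ring.map(function(p) {
  return {X: Math.round(p[0] * 1e6), Y: Math.round(p[1] * 1e6)};
});
var simpleRings = ClipperLib.Clipper.SimplifyPolygon(
    path, ClipperLib.PolyFillType.pftNonZero);
// each ring in `simpleRings` can then be scaled back down and fed to poly2tri
```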

Sumbera commented 10 years ago

@mourner — I didn't want to modify the original (official CZ border) data, but I was able to make a dirty fix in poly2tri so it passes. Here's a relative comparison for 120,186 polygon points in Chrome: libtess: 30,584 ms, poly2tri: 5,297 ms.

mourner commented 10 years ago

@Sumbera wow, looks very promising — thanks for doing this! I wonder if we'll have the same kind of performance gain on lots of small polygons (like the building layer).

Sumbera commented 10 years ago

@mourner I ran 2,256 polygons (municipality borders, more than 3M vertices altogether): poly2tri took 16,701 ms vs 127,834 ms for libtess. However, I had to dirty-fix various other errors from poly2tri (null triangles, "FLIP failed due to missing triangle..."), so some polygons came out wrong, while libtess was fine on the same data. If there is some 2D building data with more polygons but fewer vertices, I can try that too.

mourner commented 10 years ago

@Sumbera awesome! Actually, you could try it directly in this project, in the branch mentioned above: https://github.com/mapbox/mapbox-gl-js/compare/triangulate

mourner commented 10 years ago

Meanwhile I'm halfway through porting the poly2tri Seidel algorithm implementation to JS — could be interesting too.

mourner commented 10 years ago

OK, finished porting the Seidel algorithm, and it surprisingly seems to work! There's still no hole handling and there may be some rough edges and bugs, but here's the result of running it on the poly2tri basic dude shape:

[image: triangulation result on the dude shape]

Triangulating this dude 1000 times takes ~850ms in Chrome. Now going to benchmark it against poly2tri and then try it with the triangulate branch to get an idea of performance.

mourner commented 10 years ago

My Seidel port was running about 3–4x slower than poly2tri.js on most test shapes. After lots of fixes and optimizations it's now only ~30% slower (sometimes even faster on very simple shapes). It still doesn't handle holes or vertices touching edges, though.

While it was an extremely valuable experiment (and I love delving into isolated algorithmic challenges like this), I think we should go with poly2tri.js for now. The only issue with it is that you need to make sure there are no duplicate points before running it, but it's pretty well written and fast. I may revisit Seidel's algorithm later.
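
A minimal sketch of that usage, including the duplicate-point guard the comment mentions (the `ring` input format is an assumption):

```js
// poly2tri.js throws a PointError on repeated points, so drop consecutive
// duplicates (including a closing point equal to the first) beforehand
function dedupe(points) {
  return points.filter(function(p, i) {
    var prev = points[(i + points.length - 1) % points.length];
    return p.x !== prev.x || p.y !== prev.y;
  });
}

var contour = dedupe(ring.map(function(p) {
  return new poly2tri.Point(p[0], p[1]);
}));
var swctx = new poly2tri.SweepContext(contour);
swctx.triangulate();
var triangles = swctx.getTriangles(); // each triangle exposes getPoints()
```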

Sumbera commented 10 years ago

@mourner — thanks for doing this experiment. I can confirm that for a single polygon of 19,416 vertices I got similar results (750ms seidel vs 560ms poly2tri). Seidel was failing on larger polygons with stack overflows, and the overall triangulation was of worse quality — overlapping triangles, gaps.

mourner commented 10 years ago

Oh boy, will it ever stop — this Seidel algorithm is haunting me and giving me nightmares, but it looks like I won't settle down until it's awesome. Did a lot of optimizations and refactoring (along with some fixes, so there should be no more overlapping triangles or stack overflows):

[image: benchmark results]

So now it's 50-90% faster than poly2tri.

There's still room for improvement — it's capped at O(n log n), although on paper the full algorithm is O(n log* n) — I just can't get right some little details that aren't covered by the papers and books. It also can't handle edges touching each other or other segments (not sure if that can be solved without clipper), and there's no hole support (although it is possible — I made it successfully support one hole at some point, but multiple holes need more thought). Winding order doesn't matter, though.

mourner commented 10 years ago

Update: just implemented hole support without losing any performance (actually, it got a bit faster):

[image: triangulated shape with holes]

[image: triangulated shape with holes]

mourner commented 10 years ago

Update: perf improvement; also added a test with a big shape (1,200 points):

[image: benchmark results]

mourner commented 10 years ago

Update: tried seidel in the triangulate branch. Performance turned out to be not as great — only about 70–80% faster than libtess in a dense DC area — mainly because libtess isn't actually much slower than poly2tri; it's only a bit slower in my tests. Also, some shapes — in particular big ones like water and landcover, but also some of the buildings — are failing. Once I fix the failing cases in seidel, performance should improve.

Current isolated results:

typical OSM building (15 vertices)
libtess x 21,478 ops/sec ±0.54% (98 runs sampled)
poly2tri x 27,603 ops/sec ±1.06% (93 runs sampled)
seidel x 55,798 ops/sec ±0.36% (100 runs sampled)
seidel is 102% faster than poly2tri (second fastest)

dude shape (94 vertices)
libtess x 4,681 ops/sec ±0.65% (101 runs sampled)
poly2tri x 3,971 ops/sec ±0.82% (101 runs sampled)
seidel x 7,852 ops/sec ±0.64% (100 runs sampled)
seidel is 68% faster than libtess (second fastest)

monkey (1204 vertices)
libtess x 354 ops/sec ±1.91% (93 runs sampled)
poly2tri x 278 ops/sec ±0.73% (91 runs sampled)
seidel x 484 ops/sec ±1.07% (88 runs sampled)
seidel is 37% faster than libtess (second fastest)
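
(The "ops/sec" lines in these results are Benchmark.js output; a minimal sketch of such a harness — the triangulateWith* wrappers and the `building` fixture here are assumptions:)

```js
// Minimal benchmark harness sketch using Benchmark.js
var Benchmark = require('benchmark');

new Benchmark.Suite()
  .add('libtess', function() { triangulateWithLibtess(building); })
  .add('poly2tri', function() { triangulateWithPoly2tri(building); })
  .add('seidel', function() { triangulateWithSeidel(building); })
  // prints e.g. "seidel x 55,798 ops/sec ±0.36% (100 runs sampled)"
  .on('cycle', function(event) { console.log(String(event.target)); })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run();
```
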
mourner commented 10 years ago

Reported two VT issues that currently cause problems when tessellating polygons (regardless of the library used):

https://github.com/mapbox/mapnik-vector-tile/issues/52
https://github.com/mapbox/mapnik-vector-tile/issues/53

mourner commented 10 years ago

Update regarding degenerate cases — we need to convert polygons to strictly simple polygons (no intersecting edges, no touching edges or vertices) to render features like https://github.com/mapbox/mapnik-vector-tile/issues/53 properly. This is what's causing the water/landcover problems in the current triangulate branch, and it will cause issues with any other triangulation library as well.

We could do this using JS Clipper. Performance shouldn't take such a big hit, because we could first run triangulation on the non-clipped geometry and only try JS Clipper if the first pass fails (and it will fail only for a few features per tile). The library is pretty huge though (25kb minified and gzipped). It also doesn't guarantee proper conversion in all cases, as stated in the docs.

mourner commented 10 years ago

I think for fixing degenerate cases, what we should try instead of using clipper is:

  1. Develop a specific lightweight algorithm for removing degenerate edges along clip boundaries, which shouldn't be too hard and would fix most of the triangulation failures (one possible shape of this pass is sketched after this list).
  2. Just ignore shapes with intersecting lines — in the case of oversimplification, it's fixed when a new tile loads, and other cases where the data itself is like that should be rare.
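
A simplified, single-pass sketch of the cleanup from point 1 — a real implementation would iterate to a fixed point and restrict itself to clip boundaries; this version is purely hypothetical:

```js
// Hypothetical cleanup pass: removes consecutive duplicate points and
// vertices collinear with their neighbours (flat vertices and spikes).
function cleanRing(ring) { // ring: [[x, y], ...], implicitly closed
  var out = [];
  for (var i = 0; i < ring.length; i++) {
    var a = out.length ? out[out.length - 1] : ring[ring.length - 1];
    var b = ring[i];
    var c = ring[(i + 1) % ring.length];
    if (a[0] === b[0] && a[1] === b[1]) continue; // zero-length edge
    var cross = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]);
    if (cross === 0) continue; // a, b, c are collinear — drop b
    out.push(b);
  }
  return out;
}
```
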
mourner commented 10 years ago

OK, I'm mostly done with the seidel optimizations — it's now the fastest JS triangulation library by far.

typical OSM building (15 vertices):
seidel x 79,031 ops/sec ±0.55% (99 runs sampled)
libtess x 21,810 ops/sec ±0.56% (98 runs sampled)
poly2tri x 28,848 ops/sec ±1.43% (94 runs sampled)
seidel is 174% faster than poly2tri (second fastest)

dude shape (94 vertices):
seidel x 10,201 ops/sec ±0.54% (100 runs sampled)
libtess x 4,566 ops/sec ±1.03% (100 runs sampled)
poly2tri x 3,692 ops/sec ±1.05% (98 runs sampled)
seidel is 123% faster than libtess (second fastest)

monkey (1204 vertices):
seidel x 611 ops/sec ±0.87% (97 runs sampled)
libtess x 337 ops/sec ±2.32% (93 runs sampled)
poly2tri x 254 ops/sec ±0.83% (89 runs sampled)
seidel is 81% faster than libtess (second fastest)

springmeyer commented 10 years ago

Amazing!!!!

mourner commented 10 years ago

Turns out libtess does handle degenerate cases (I thought it didn't because there was a bug in the original triangulate branch). So if we want to avoid JS Clipper, we can hack around this by doing triangulation in two passes — first seidel, then libtess if seidel didn't return any triangles (I made sure it returns null if anything goes wrong). Hacky, but it should make triangulation 2x faster compared to libtess alone.
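
A sketch of that two-pass fallback (the wrapper names are placeholders; the null-on-failure contract is the one described above):

```js
// try the fast path first, fall back to the robust one only on failure
function triangulate(rings) {
  var triangles = seidel(rings); // fast, returns null on degenerate input
  if (triangles === null) {
    triangles = triangulateWithLibtess(rings); // slower but degenerate-safe
  }
  return triangles;
}
```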

mikemorris commented 10 years ago

Wow, awesome work @mourner!

mourner commented 10 years ago

Putting this on hold for now, as degeneracies really should be fixed on the VT side. Hacking around this in JS has too many issues. @springmeyer:

Definitely seems ideal to fix on the generation side. That said, ideally I'd take a closer look at this not till end of Sept/early Oct — given other things in the pipeline.

jahting commented 10 years ago

Unfortunately, handling degenerate cases outside of tessellation is slow (n^2). If you want to be faster, you need to use roughly the same mechanisms that make tessellation fast. So handling e.g. collinear cases inside tessellation improves overall performance, but makes the tessellation code ugly because of several non-reducible cases.

On the other hand, if intersections are an "error" in the input data, the tessellation will have a hard time triangulating them "correctly".

PnlTri.js 2.0 now handles all cases except intersections and non-adjacent duplicate points, which I hope to integrate soon.

mourner commented 10 years ago

@jahting thanks a lot for joining the discussion — your work is remarkable! Degenerate case handling in particular — I found it really difficult to deal with. Did you come up with all the approaches to this yourself? Academic papers are really lacking on this topic — I couldn't find anything applicable to Seidel's algorithm. I also wonder how degenerate handling makes tessellation faster.

I added pnltri to my benchmarks in mapbox/seidel and have the following results:

typical OSM building (15 vertices):
pnltri x 227,444 ops/sec ±0.56% (100 runs sampled)
seidel x 78,917 ops/sec ±0.68% (101 runs sampled)
libtess x 21,875 ops/sec ±1.13% (95 runs sampled)
poly2tri x 29,354 ops/sec ±2.44% (94 runs sampled)
pnltri is 188% faster than seidel (second fastest)

dude shape (94 vertices):
pnltri x 13,456 ops/sec ±0.71% (97 runs sampled)
seidel x 9,664 ops/sec ±2.98% (94 runs sampled)
libtess x 4,362 ops/sec ±0.98% (97 runs sampled)
poly2tri x 3,773 ops/sec ±0.88% (98 runs sampled)
pnltri is 39% faster than seidel (second fastest)

dude shape with holes (104 vertices):
pnltri x 2,355 ops/sec ±1.01% (98 runs sampled)
seidel x 8,676 ops/sec ±0.91% (99 runs sampled)
libtess x 3,773 ops/sec ±1.04% (97 runs sampled)
poly2tri x 3,384 ops/sec ±1.14% (98 runs sampled)
seidel is 130% faster than libtess (second fastest)

monkey (1204 vertices):
pnltri x 113 ops/sec ±0.49% (85 runs sampled)
seidel x 623 ops/sec ±0.64% (96 runs sampled)
libtess x 342 ops/sec ±1.60% (94 runs sampled)
poly2tri x 239 ops/sec ±3.25% (83 runs sampled)
seidel is 82% faster than libtess (second fastest)

At first my jaw dropped when I saw the performance on simple low-vertex polygons like the first case, but then I realized that your code applies simple n^2 ear clipping to all polygons without holes. This starts to get noticeable on shapes with more vertices, but switching to simple ear clipping for low-vertex polys is such a great idea! This way, triangulation performance for the Mapbox GL JS case (where most shapes are low-vertex ones) should get significantly faster. I'll definitely look into this now.

Your Seidel implementation, although it handles degenerate cases, performs noticeably worse than mine (a 3.7x difference, as you can see in one particular holed case I have) — need to test more.

jahting commented 10 years ago

@mourner thanks a lot for including pnltri in the benchmarks — maybe I can still improve performance. Your implementation has great performance and I'll definitely look at it :-).

You are right, for polygons without holes I use simple ear clipping. A more specific selection criterion wasn't necessary yet, since it's really fast for most practical sizes (on "monkey" it seems equal) and is quite robust against collinear cases, duplicate points and intersections — it just cannot handle holes.

Generally, handling (and unfortunately even checking for) degenerate cases makes the code slower (it happened to mine). But I can imagine that some cases, instead of throwing an error, could run through lots of comparisons and be handled by some default at the end. Those should get faster if caught early.

Especially the handling of collinear edges and of vertices lying on other edges I had to come up with myself — it's not that difficult, just many cases. I use TDD: creating test cases first, then the working code, refactoring later. I hope my tests for these cases are complete. Tests are about 75% of the PnlTri code — and there is still a lot of scope for refactoring.

My main sources are listed in the PnlTri readme. I started from [NaM95], which includes copy&paste errors, but I have refactored it beyond recognition and e.g. recently completely replaced the part that constructs the monotone polygons from trapezoids. The best paper on degenerate cases I found is [Vik01].

For benchmarking:

mourner commented 10 years ago

@jahting thanks for the notes! I feel I'm not nearly as experienced as you with algorithmic challenges, though I'm pretty good at writing very efficient JavaScript implementations. So combining efforts could lead to wonderful results :)

I found a paper that describes converting a polygon with holes into a simple polygon for use with ear clipping. The only assumptions it makes are that the winding orders of the outer ring and the holes are opposite, that holes are strictly contained by the outer polygon, and that they don't intersect: http://www.geometrictools.com/Documentation/TriangulationByEarClipping.pdf

If it is efficient enough (unfortunately there's no complexity analysis), maybe we could stick to ear clipping for all cases. Also, your ear clipping implementation produces bad quality triangulation with lots of "thin" triangles — methods for improving this are described in this paper, and it looks like the overhead shouldn't be too big (the complexity growth is the same): http://arxiv.org/pdf/1212.6038.pdf

I think I might take a shot at a new ear clipping implementation.

Also, I'm still really puzzled by Seidel's algorithm — my implementation doesn't precompute parent nodes in the query structure, because profiling shows that even without that, queries are nowhere near a bottleneck — they take about 12%, while threading/splitting/merging trapezoids takes about 50%. I wonder how you made this part efficient. I noticed in your notes that you managed to avoid splitting in-between trapezoids when threading, but I don't yet understand how. :)

jahting commented 10 years ago

Yep, joining forces looks promising :-).

My opinion: ear clipping isn't worth too much effort, but improving the resulting triangles no matter what algorithm was used — e.g. with a restricted flip algorithm — looks worthwhile, if the shape of the triangles matters.

On Seidel's algorithm:

jahting commented 10 years ago

@mourner followup:

On ear clipping:

On Seidel's:

mourner commented 10 years ago

Implemented a simple ear slicing algorithm that's basically the same as in pnltri: https://github.com/mapbox/seidel/blob/master/src/earcut.js. The results are pretty awesome:

typical OSM building (15 vertices):
earcut x 707,439 ops/sec ±0.41% (100 runs sampled)
pnltri x 227,890 ops/sec ±0.57% (100 runs sampled)
seidel x 77,236 ops/sec ±0.57% (98 runs sampled)
earcut is 210% faster than pnltri (second fastest)

dude shape (94 vertices):
earcut x 33,620 ops/sec ±0.40% (101 runs sampled)
pnltri x 13,626 ops/sec ±0.35% (98 runs sampled)
seidel x 10,037 ops/sec ±0.41% (101 runs sampled)
earcut is 147% faster than pnltri (second fastest)

monkey (1204 vertices):
earcut x 219 ops/sec ±0.40% (88 runs sampled)
pnltri x 112 ops/sec ±0.43% (84 runs sampled)
seidel x 608 ops/sec ±0.79% (97 runs sampled)
seidel is 178% faster than earcut (second fastest)

Going to incorporate it into my seidel implementation, switching to ear clipping when a polygon has no holes and the number of vertices is below a certain threshold (maybe about 300) — see the sketch below. Given that we eventually fix degenerates on the server side, tessellation performance on the client will be pretty good.
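
A sketch of that dispatch (the threshold value and the function signatures here are assumptions):

```js
var EARCUT_THRESHOLD = 300; // assumed cutoff, per the comment above

function triangulateFeature(outerRing, holes) {
  if (holes.length === 0 && outerRing.length < EARCUT_THRESHOLD) {
    return earcut(outerRing);      // O(n^2) ear slicing wins on small shapes
  }
  return seidel(outerRing, holes); // near O(n log n), wins on big/holed ones
}
```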

mourner commented 10 years ago

The mechanism which avoids splitting and merging is described in Vik01.

Read through it a bit — I'm doing this in a similar way, although I do only one query per edge (for one of its points) and then traverse along the edge to the other point, creating new traps, extending old ones and updating the query structure as necessary. This is still the most expensive part of the algorithm. With big shapes (e.g. monkey), query time rises to about 30%, but still takes less than updating the trapezoidation.

jahting commented 10 years ago

@mourner the key to extending existing trapezoids is to think not of "splitting trapezoids" but of "blocking beams". Trapezoids are bordered by 2 segments and 2 beams emitted from vertices; each vertex emits two beams. Whenever a new segment is inserted, going from one trapezoid to the next, it intersects — i.e. blocks — a beam. On one side of the segment the beam survives, and there you keep using the already existing trapezoid. On the other side of the segment the beam ceases to exist, and there the trapezoid from above is extended to the next beam. Thus new trapezoids are only created when inserting vertices, and they are never deleted again.

mourner commented 9 years ago

Revisited tessellation today after a long break; here's an update.

We want to get this solved quickly and easily and get practical results, without spending too much time trying to make a perfect solution.

My previous experiments implementing a modern algorithm (Hain trapezoidation) are promising (preliminary results show that it will indeed be a blazingly fast, near-linear triangulation implementation when finished), but not for today — getting it to a practical state will take a lot of effort. The challenge is converting a pretty complex and confusing trapezoidation algorithm implementation for simple polygons (which I've already ported) into a triangulation implementation and adding support for holes — I got stuck on this part. And while it will probably be fastest, it won't be significantly faster than a much simpler approach (see below) for our particular use case. So I'm going to put this on a long hold until better times.

For a typical dense map view, using approaches already tried out, here are the fill bucket processing times (parsing protobuf + triangulation + filling buffers):

The last approach (120ms) seems pretty good! The only minor drawback is the 7kb of gzipped weight it adds to the library. We can go with this approach straight away. More good news: @brendankenny is now actively working on the library again after a long hiatus. Earcut can in theory be buggy on some degenerate polygons, but I haven't noticed any artifacts so far (edit: later I did).

One way we could make this theoretically even faster (while removing the libtess dependency): a) normalize winding order on the VT side (https://github.com/mapbox/mapnik-vector-tile/issues/59) to be able to tell holes apart from outer rings; b) apply a special algorithm to holed polygons to convert them to simple ones (connecting the rings); c) earcut all polygons. This will work for our use case because the relative number of holed polygons is low, and we generally don't need to handle shapes with a lot of vertices due to VT simplification and the nature of the data (so algorithmic complexity isn't that important).
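
For step (a), telling holes apart from outer rings comes down to the sign of each ring's area (the shoelace formula) — a minimal sketch, with the sign convention assumed:

```js
// shoelace formula: positive for counter-clockwise rings in y-up coordinates
function signedArea(ring) { // ring: [[x, y], ...]
  var sum = 0;
  for (var i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    sum += ring[j][0] * ring[i][1] - ring[i][0] * ring[j][1];
  }
  return sum / 2;
}

// with normalized winding, one sign marks outer rings, the other marks holes
var isOuterRing = signedArea(ring) > 0;
```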

I suggest testing out the earcut+libtess approach and going with it if no rendering problems are encountered (I'll need some help with this), saving the last described approach for later, when we feel like optimizing some more.

@jfirebaugh @tmcw @ansis @kkaefer thoughts?

mourner commented 9 years ago

OK, I found some earcut artifacts while randomly browsing the map — while I may attempt to fix this on the earcut side, probably the only reliable way to ensure no rendering artifacts when using earcut will be implementing VT polygon cleanup: https://github.com/mapbox/mapnik-vector-tile/issues/53.

kkaefer commented 9 years ago

Very interesting research, Vlad!

While it's tempting, I'd like to refrain from adding more pre-processing and input cleaning on the VT side (we should do that anyway, but not for this reason), because we should be able to support user data that isn't 100% clean, e.g. from GeoJSON files.

mourner commented 9 years ago

@kkaefer I'm afraid we'll have to. At least for now, because supporting bad data in tessellation always comes at a huge performance cost. We can still benefit highly from VT processing by branching depending on whether it's VT or user data — the first will take the fast tessellation route, while the second will be processed more slowly with degeneracy handling, but it comes in significantly lower volume, so it won't matter as much.

mourner commented 9 years ago

I'm thinking about this again — even if the original GeoJSON data passed to Mapbox GL JS is good, self-intersections and other artifacts can easily appear after the simplification step in geojson-vt.

The FIST triangulation paper, which is based on ear slicing, also describes making it more fool-proof. It involves a "desperate mode" — applying different strategies when the straightforward algorithm fails, like splitting the ring into two rings etc. These may be expensive, but they only kick in for bad polygons. At the time it sounded like too much work and my brain was already too tired of triangulation problems, but now that I'm thinking fresh, it seems that with some tinkering we could come up with our own modified, simplified algorithm that would work on 99.99% of practical vector data and fall back more gracefully on the rest (failures).
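
One way such a fallback could be wired up — sanity-check the fast result and escalate only on failure. The area-deviation test and both triangulator names below are hypothetical:

```js
function ringArea(ring) { // shoelace formula, absolute value
  var sum = 0;
  for (var i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    sum += ring[j][0] * ring[i][1] - ring[i][0] * ring[j][1];
  }
  return Math.abs(sum / 2);
}

function meshArea(triangles) { // triangles: [[[x,y],[x,y],[x,y]], ...]
  var sum = 0;
  for (var i = 0; i < triangles.length; i++) {
    var t = triangles[i];
    sum += Math.abs((t[1][0] - t[0][0]) * (t[2][1] - t[0][1]) -
                    (t[2][0] - t[0][0]) * (t[1][1] - t[0][1])) / 2;
  }
  return sum;
}

function robustTriangulate(ring) { // holes omitted for brevity
  var triangles = fastEarSlicing(ring); // hypothetical fast path
  var expected = ringArea(ring);
  // a noticeable area mismatch means the input was degenerate —
  // only then pay for the slow "desperate mode" strategies
  if (Math.abs(meshArea(triangles) - expected) > expected * 1e-6) {
    triangles = desperateModeTriangulate(ring); // hypothetical robust path
  }
  return triangles;
}
```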

ljbade commented 9 years ago

@mourner sounds like an awesome algo — will this be ported to C++ afterwards?

Also, does your algo create lots of little thin triangles like some algos do, or does it generate nice large triangles?

mourner commented 9 years ago

@ljbade I tried tons of algorithms — which algorithm do you refer to specifically?

ljbade commented 9 years ago

@mourner I can't remember exactly, but one of them produces lots of little slices — I think it fans out from a single vertex a lot of the time.

mourner commented 9 years ago

@ljbade in this case we have to focus on triangulation speed, which leads to algorithms that produce lots of thin triangles. There are some ways to improve ear slicing to produce better triangulation, for example, but they come with a CPU cost. First we'll get to an acceptable level of triangulation performance and robustness, and only then look at whether reducing thin triangles is worth it performance-wise.

ljbade commented 9 years ago

Oh OK, I wonder what algo this uses -> https://code.google.com/p/poly2tri/

I remember it being rather fast when I looked at it years ago, but I suppose your algo is even faster.

I just remembered reading that degenerate triangles are bad for GPUs, since the GPU has to reprocess the same pixels multiple times.