memononen / nanovg

Antialiased 2D vector drawing library on top of OpenGL for UI and visualizations.
zlib License

tessellation caching #280

Open wtholliday opened 8 years ago

wtholliday commented 8 years ago

Today, I worked a bit on caching the results of bezier tessellation.

In the tiger example, nvg__flattenPaths was consuming 75% of CPU time in my test. After adding a cursory implementation of a tessellation cache, that is down to 57%. (All caching is done within nvg__flattenPaths, so this accounts for all overhead.)

This seems like a modest gain that could be improved upon with a better hashing scheme. Also, my implementation is very wasteful in terms of memory, just so I could get this done quickly.

Here are the relevant changes:

https://github.com/wtholliday/nanovg/commit/94abcf5d63b250be22f5f3f4bdbed19a98154097

https://github.com/wtholliday/nanovg/commit/e36e00f36f46a4365169e9ccf983b775045175a6

Of course, this particular approach has the downside of cache misses whenever a path is scaled or translated. If tessellation occurred in local coordinates (with the tolerance transformed accordingly), that overhead would be eliminated, at least for rotations and translations.

Feedback appreciated!

wtholliday commented 7 years ago

I did a little more work on this:

https://github.com/wtholliday/nanovg/commit/d71b5a72e730da7e91ca45c2152e20ae460b49a2

(I switched to C++ so I could work a bit faster.) Using a simple scheme of hashing the first path point using a 100x100 grid seemed to work quite well. I'm also clearing out unused paths at the end of the frame.
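As a rough illustration of that grid scheme (the actual commit may differ), a path's first point can be snapped to a coarse 100x100 grid over the canvas and used as a bucket index, so tiny numeric jitter still lands in the same bucket. The function name and clamping are illustrative assumptions.

```c
#include <stdint.h>

#define GRID_DIM 100

/* Bucket a path by its first point, snapped to a GRID_DIM x GRID_DIM
 * grid over the canvas. Paths whose first points fall in the same
 * cell share a bucket. */
static uint32_t path_bucket(float x, float y, float canvasW, float canvasH)
{
    int gx = (int)(x / canvasW * GRID_DIM);
    int gy = (int)(y / canvasH * GRID_DIM);
    /* Clamp so points on or past the canvas edge stay in range. */
    if (gx < 0) gx = 0;
    if (gx >= GRID_DIM) gx = GRID_DIM - 1;
    if (gy < 0) gy = 0;
    if (gy >= GRID_DIM) gy = GRID_DIM - 1;
    return (uint32_t)(gy * GRID_DIM + gx);
}
```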

It would be better to do the caching at a coarser granularity than individual bezier segments. There seem to be two options:

  1. Cache the results of nvg__flattenPaths. A cache hit would result in a memcpy from the cache to ctx->cache->points.
  2. More aggressively: cache the results of both nvgStroke and nvgFill. A cache hit would result in a single call to the rendering backend (ctx->params.renderStroke or ctx->params.renderFill).

Does anyone have a sense of which route is better?

I should say that, in general, caching tessellations could be a big win: even if your hash function isn't very good and there are a few paths per bucket, it's very quick to reject non-matching paths, so the overhead for a cache miss is quite low.
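The cheap-rejection point can be sketched like this (the struct layout and function name are hypothetical, not nanovg's actual cache): compare the command count and the first value before paying for a full `memcmp`, so most misses cost only an integer compare.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative cache entry: a flat array of path command floats. */
typedef struct {
    const float* commands;
    int ncommands;
} CachedPath;

/* Return 1 if the cached entry matches the incoming command stream.
 * Two cheap early-out checks reject most non-matches before the
 * full memcmp. */
static int path_matches(const CachedPath* e, const float* cmds, int ncmds)
{
    if (e->ncommands != ncmds) return 0;                   /* cheap reject */
    if (ncmds > 0 && e->commands[0] != cmds[0]) return 0;  /* cheap reject */
    return memcmp(e->commands, cmds, (size_t)ncmds * sizeof(float)) == 0;
}
```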

robclouth commented 7 years ago

@wtholliday Hey, I'm super interested in this. How's it looking? Do you think your latest implementation is ready to use? What kind of performance improvements are you finding?

wtholliday commented 7 years ago

@robclouth Well, I haven't really tested it adequately (your help is welcome!). After learning more about the nanovg code, I think I would go with option 2 above: cache the results of nvgStroke and nvgFill that are sent to the back end, unless someone indicates otherwise.

It's also worth looking at https://github.com/memononen/nanovg/issues/328 in which I've sped up bezier tessellation considerably (also requires more testing). The caching and faster tessellation approaches could be combined, of course.

robclouth commented 7 years ago

Great! Totally up for trying these out. I'll report back.