wtholliday opened 8 years ago
I did a little more work on this:
https://github.com/wtholliday/nanovg/commit/d71b5a72e730da7e91ca45c2152e20ae460b49a2
(I switched to C++ so I could work a bit faster.) Using a simple scheme of hashing the first path point using a 100x100 grid seemed to work quite well. I'm also clearing out unused paths at the end of the frame.
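The grid-hashing scheme described above could look something like this. This is only an illustrative sketch, not the code from the linked commit; the function name, grid constant, and parameters are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch of the scheme described above: bucket a path by its
   first point, snapped to a 100x100 grid over the canvas. Names and the
   canvas-relative normalization are assumptions, not the commit's actual code. */
#define GRID_DIM 100

static uint32_t hashFirstPoint(float x, float y, float width, float height)
{
    /* Snap the first path point to a GRID_DIM x GRID_DIM grid. */
    int gx = (int)(x / width * GRID_DIM);
    int gy = (int)(y / height * GRID_DIM);
    if (gx < 0) gx = 0;
    if (gx >= GRID_DIM) gx = GRID_DIM - 1;
    if (gy < 0) gy = 0;
    if (gy >= GRID_DIM) gy = GRID_DIM - 1;
    return (uint32_t)(gy * GRID_DIM + gx); /* bucket index in [0, 10000) */
}
```

Nearby paths land in the same bucket, so a hash table of 10,000 buckets keeps most bucket chains short.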
It seems like it would be better to do the caching at a coarser granularity than individual bezier segments. There appear to be two options:

1. `nvg__flattenPaths`. A cache hit would result in a memcpy from the cache to `ctx->cache->points`.
2. `nvgStroke` and `nvgFill`. A cache hit would result in a single call to the rendering backend (`ctx->params.renderStroke` or `ctx->params.renderFill`).

Does anyone have an inclination as to the better route?
I should say that, in general, caching tessellations could be a big win: even if your hash function isn't very good and a few paths land in each bucket, it's very quick to reject non-matching paths, so the overhead of a cache miss is quite low.
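The cheap-rejection claim above can be sketched as follows. The struct and field names are hypothetical; the point is just that a non-matching bucket entry is rejected with an O(1) length check before any coordinate comparison happens.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch of why a cache miss is cheap: compare point counts
   first (constant time), then memcmp the raw coordinates. Names are
   assumptions, not nanovg's or the fork's actual types. */
typedef struct CachedPath {
    int npoints;             /* number of x,y pairs in the source path */
    const float* pts;        /* source path coordinates */
    struct CachedPath* next; /* next entry in the same hash bucket */
} CachedPath;

static int pathMatches(const CachedPath* c, const float* pts, int npoints)
{
    if (c->npoints != npoints) return 0; /* O(1) rejection */
    return memcmp(c->pts, pts, sizeof(float) * 2 * (size_t)npoints) == 0;
}
```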
@wtholliday Hey, I'm super interested in this. How's it looking? Do you think your latest implementation is ready to use? What kind of performance improvements are you finding?
@robclouth Well, I haven't really tested it adequately (your help is welcome!). After learning more about the nanovg code, I think I would go with option 2 above: cache the results of `nvgStroke` and `nvgFill` that are sent to the backend, unless someone indicates otherwise.
It's also worth looking at https://github.com/memononen/nanovg/issues/328 in which I've sped up bezier tessellation considerably (also requires more testing). The caching and faster tessellation approaches could be combined, of course.
Great! Totally up for trying these out. I'll report back.
Today I worked a bit on caching the results of bezier tessellation.

In the tiger example, `nvg__flattenPaths` was consuming 75% of CPU time in my test. After adding a cursory implementation of a tessellation cache, that is down to 57%. (All caching is done within `nvg__flattenPaths`, so this figure accounts for all caching overhead.)

This seems like a modest gain that could be improved upon with a better hashing scheme. Also, my implementation is very wasteful in terms of memory, just so I could get this done quickly.
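The shape of a hit inside the flattening step might look like this. All names here are illustrative placeholders; the real change is in the commits linked below, and the actual buffer management in `nvg__flattenPaths` is more involved.

```c
#include <stdint.h>
#include <string.h>

/* Hedged sketch of the cache-hit path described above: instead of re-running
   bezier subdivision, the previously flattened points are memcpy'd into the
   frame's point buffer. The entry is marked used so that unused entries can
   be evicted at end of frame, as described earlier in the thread. */
typedef struct TessEntry {
    uint32_t key;      /* hash of the source path */
    float* flatPts;    /* cached tessellation, x,y pairs */
    int nflat;         /* number of cached x,y pairs */
    int usedThisFrame; /* entries not used by end of frame get evicted */
} TessEntry;

static int flattenFromCache(TessEntry* e, float* outPts, int maxPts)
{
    if (e->nflat > maxPts) return 0; /* caller falls back to tessellating */
    memcpy(outPts, e->flatPts, sizeof(float) * 2 * (size_t)e->nflat);
    e->usedThisFrame = 1;
    return e->nflat;
}
```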
Here are the relevant changes:
https://github.com/wtholliday/nanovg/commit/94abcf5d63b250be22f5f3f4bdbed19a98154097
https://github.com/wtholliday/nanovg/commit/e36e00f36f46a4365169e9ccf983b775045175a6
Of course, this particular approach has the downside of a cache miss whenever a path is scaled or translated. If tessellation occurred in local coordinates (transforming the tolerance accordingly), this overhead would be eliminated, at least for rotations and translations.
Feedback appreciated!