Closed behdad closed 7 years ago
Thomas Sederberg, in "Computer Aided Geometric Design", describes a so-called "Horner's algorithm" for evaluating Bezier curves more efficiently than the De Casteljau algorithm, or the lerp
approach currently used in cu2qu.
One first has to convert the coefficients of the Bernstein-Bezier polynomial to power (or monomial) form. Then one can evaluate a power basis polynomial of degree n using only n multiplies and adds. This is described by the author as
the fastest algorithm for evaluating a power basis polynomial at a single point.
Sederberg also describes a variant of the same Horner algorithm which works directly on the Bernstein-Bezier polynomial, and hence doesn't require to convert coefficients to power basis.
In my tests, both Horner's schemes are between 2 to 3 times faster than De Casteljau's. I don't have code to show right now, but I might try to apply Horner's scheme to cu2qu and see if it speeds things up.
Of course, an alternative would be to have a C extension module do the expensive job. That would not be so bad. I recently discovered CFFI, and I find it relatively easier than using the CPython API.
BTW, Sederberg's book is available here for free: http://tom.cs.byu.edu/~557/text/cagd.pdf
BTW, Sederberg's book is available here for free: http://tom.cs.byu.edu/~557/text/cagd.pdf
Thanks for the link! I enjoyed browsing it during my flight.
My understanding, from trying to optimize this: https://github.com/rougier/gl-bezier/blob/master/behdad.py
was that it's the constant creation and destruction of objects that degrades performance. Open-coding the De Casteljau's algorithm will avoid that.
If we're sticking with De Casteljau's algorithm for now, what should be opened up? Right now the bezier_at functions each only create a single Point object, and the math is done with primitive number types. Do we want to spell out all the calls to lerp?
(For quick reference: https://github.com/googlei18n/cu2qu/blob/master/Lib/cu2qu/geometry.py#L54)
Well OK, I've been playing around with this and it seems we can actually cut the conversion time in half by doing away with the Point object entirely and just dealing with Python lists. I will pursue this a bit further.
Interesting. Sounds good to me.
Tuples should be even better.
Let's keep this issue open for general speedup chitchat since we all seem to enjoy the conversation :)
James and I talked about this. Currently we are taking ~around 200us per cubic curve, which is insane! Although the uses of this algorithm are offline and can take as long as they want, I think we can and should make it 10 to 100 times faster! I'll sketch a few ideas in further comments.
So, I did expand the lerp's and that gave us another 2x speedup.
In https://github.com/googlei18n/cu2qu/pull/32 I'm introducing a non-sampling error method which means we wouldn't need quadratic_bezier_at
and cubic_bezier_at
anymore. But we should move those to fontTools.misc.bezierTools
anyway, and at that point I'm interested to see if Horner's algorithm can make it any faster.
If anyone (Cosimo?) wants to work on that, I think the following functions will be useful (in each case, add Quadratic as well):
evaluateCubicAtT() # just one t evaluateCubicAtTs() # multiple t's; here converting to power form helps
not sure if we need both. The splitCubicAtT() function takes arbitrary number of t's. So maybe it's good enough to do the same here. Specially, since without a sampling-based error code, the use for evaluating bezier's at T is not that much.
and:
splitCubicAtN() evaluateCubicAtN()
not sure about the name "AtN", but these are similar to AtT, but split / evaluate into N pieces of equal T. For split it's clear what that means. For evaluate, probably should return N-1 points, possibly include the end points as well.
Anyway, the point of the AtN is to speed up t-related calculations by using forward differencing. We might measure and see there's no significant speedup though.
Two things that can possibly be used to speed up lots of bezier-related code, to varying degrees:
This was pretty much done in https://github.com/googlei18n/cu2qu/pull/33
I'm guessing that handcoding cubic_bezier_at() and quadratic_bezier_at() will give a good boost.