Speed up conversion - Githubissues

behdad commented 8 years ago

I'm guessing that handcoding cubic_bezier_at() and quadratic_bezier_at() will give a good boost.

anthrotype commented 8 years ago

Thomas Sederberg, in "Computer Aided Geometric Design", describes a so-called "Horner's algorithm" for evaluating Bezier curves more efficiently than the De Casteljau algorithm, or the lerp approach currently used in cu2qu.

One first has to convert the coefficients of the Bernstein-Bezier polynomial to power (or monomial) form. Then one can evaluate a power basis polynomial of degree n using only n multiplies and adds. This is described by the author as

the fastest algorithm for evaluating a power basis polynomial at a single point.

Sederberg also describes a variant of the same Horner algorithm which works directly on the Bernstein-Bezier polynomial, and hence doesn't require to convert coefficients to power basis.

In my tests, both Horner's schemes are between 2 to 3 times faster than De Casteljau's. I don't have code to show right now, but I might try to apply Horner's scheme to cu2qu and see if it speeds things up.

Of course, an alternative would be to have a C extension module do the expensive job. That would not be so bad. I recently discovered CFFI, and I find it relatively easier than using the CPython API.

anthrotype commented 8 years ago

BTW, Sederberg's book is available here for free: http://tom.cs.byu.edu/~557/text/cagd.pdf

behdad commented 8 years ago

BTW, Sederberg's book is available here for free: http://tom.cs.byu.edu/~557/text/cagd.pdf

Thanks for the link! I enjoyed browsing it during my flight.

behdad commented 8 years ago

My understanding, from trying to optimize this: https://github.com/rougier/gl-bezier/blob/master/behdad.py

was that it's the constant creation and destruction of objects that degrades performance. Open-coding the De Casteljau's algorithm will avoid that.

jamesgk commented 8 years ago

If we're sticking with De Casteljau's algorithm for now, what should be opened up? Right now the bezier_at functions each only create a single Point object, and the math is done with primitive number types. Do we want to spell out all the calls to lerp?

(For quick reference: https://github.com/googlei18n/cu2qu/blob/master/Lib/cu2qu/geometry.py#L54)

jamesgk commented 8 years ago

Well OK, I've been playing around with this and it seems we can actually cut the conversion time in half by doing away with the Point object entirely and just dealing with Python lists. I will pursue this a bit further.

behdad commented 8 years ago

Interesting. Sounds good to me.

behdad commented 8 years ago

Tuples should be even better.

behdad commented 8 years ago

Let's keep this issue open for general speedup chitchat since we all seem to enjoy the conversation :)

behdad commented 8 years ago

James and I talked about this. Currently we are taking ~around 200us per cubic curve, which is insane! Although the uses of this algorithm are offline and can take as long as they want, I think we can and should make it 10 to 100 times faster! I'll sketch a few ideas in further comments.

behdad commented 8 years ago

So, I did expand the lerp's and that gave us another 2x speedup.

In https://github.com/googlei18n/cu2qu/pull/32 I'm introducing a non-sampling error method which means we wouldn't need quadratic_bezier_at and cubic_bezier_at anymore. But we should move those to fontTools.misc.bezierTools anyway, and at that point I'm interested to see if Horner's algorithm can make it any faster.

If anyone (Cosimo?) wants to work on that, I think the following functions will be useful (in each case, add Quadratic as well):

evaluateCubicAtT() # just one t evaluateCubicAtTs() # multiple t's; here converting to power form helps

not sure if we need both. The splitCubicAtT() function takes arbitrary number of t's. So maybe it's good enough to do the same here. Specially, since without a sampling-based error code, the use for evaluating bezier's at T is not that much.

and:

splitCubicAtN() evaluateCubicAtN()

not sure about the name "AtN", but these are similar to AtT, but split / evaluate into N pieces of equal T. For split it's clear what that means. For evaluate, probably should return N-1 points, possibly include the end points as well.

Anyway, the point of the AtN is to speed up t-related calculations by using forward differencing. We might measure and see there's no significant speedup though.

behdad commented 8 years ago

Two things that can possibly be used to speed up lots of bezier-related code, to varying degrees:

Using Cython to convert tight functions into C code. cu2qu has some that can benefit from this; we can start testing here. If things look promising, we might want to define a C-array-based pen protocol at some point,
This is going to be controversial, but can easily speed things up 3x or more: use complex() numbers to represent points, instead of tuples. They can be added, subtracted, multiplied by a scalar, and get length of (abs()) natively, and we don't need to repeat code for x and y and we don't need to create tuples all the time... WDYT?

behdad commented 7 years ago

This was pretty much done in https://github.com/googlei18n/cu2qu/pull/33

googlefonts / cu2qu

Speed up conversion #9