serge-sans-paille / pythran

Ahead of Time compiler for numeric kernels
https://pythran.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2k stars 193 forks source link

Performance hit with Honer's algorithm #2219

Closed Dapid closed 4 months ago

Dapid commented 4 months ago

Consider this code to evaluate polynomials (and make pretty fractals):

import sys

import numpy as np
import matplotlib.pyplot as plt

import tqdm

accelerate = '--aot' in sys.argv

if accelerate:
    from extern import newton_step

else:
    def eval_poly(coeffs, x):
        # Taken from numpy.polynomial.polynomial._val
        c0 = coeffs[-1] + x*0
        for i in range(2, len(coeffs) + 1):
            c0 = coeffs[-i] + c0*x
        return c0

    def newton_step(z, poly_coeffs, der_coeffs):
        return z - eval_poly(poly_coeffs, z) / eval_poly(der_coeffs, z)

N = 1024
iter = 4

extent = 1.5
real = np.linspace(-extent, extent, N)
imag = np.linspace(-extent, extent, N)

xv, yv = np.meshgrid(real, imag)
z = xv + 1j * yv

roots = [1, np.exp(2j * np.pi / 3), np.exp(-2j * np.pi / 3)]

polynomial = np.polynomial.Polynomial.fromroots(roots)
deriv = polynomial.deriv()

poly_coeffs = polynomial.coef
der_coeffs = deriv.coef

for it in tqdm.trange(iter):
    z = newton_step(z, poly_coeffs, der_coeffs)

# -- This is just to make it pretty
frame = np.zeros((N, N, 3), dtype=np.float64)

for i, root in enumerate(roots):
    dist = np.abs(z - root)
    frame[:, :, i] = 1/(dist ** 2 + 1)

frame_norm = frame / np.linalg.norm(frame, axis=-1)[:, :, None]
plt.imshow(frame_norm)
plt.show()

with extern.py:

    c0 = coeffs[-1] + x*0
    for i in range(2, len(coeffs) + 1):
        c0 = coeffs[-i] + c0*x
    return c0

#pythran export newton_step(complex128[:, :], complex128[:], complex128[:])
def newton_step(z, poly_coeffs, der_coeffs):
    return z - eval_poly(poly_coeffs, z) / eval_poly(der_coeffs, z)

Running in pure Python mode, I get over 7 it/s, but with Pythran I get over 4 s/it. This is with both release and master, Python 3.11, on Fedora 40 (GCC 14.1.1).

I don't know what is going on, but this should be the kind of problems Pythran shines at.

Restricting ourselves to reals we get an even larger slow down, from 17 it/s in pure Python, down to 3.8 s/it with Pythran:

z = xv + yv

roots = [1, np.exp(2j * np.pi / 3), np.exp(-2j * np.pi / 3)]

polynomial = np.polynomial.Polynomial.fromroots(roots)
deriv = polynomial.deriv()

poly_coeffs = np.real(polynomial.coef).copy()
der_coeffs = np.real(deriv.coef).copy()

for it in tqdm.trange(iter):
    z = newton_step(z, poly_coeffs, der_coeffs)

and #pythran export newton_step(float64[:, :], float64[:], float64[:])

serge-sans-paille commented 4 months ago

I can reproduce the issue. Investigating...

serge-sans-paille commented 4 months ago

Can you confirm that passing -O2 to pythran fixes the issue? It looks like the python-config --cflags no longer includes any optimization flag o_O

serge-sans-paille commented 4 months ago

https://github.com/serge-sans-paille/pythran/pull/2222 may do the trick. @Dapid I'll wait for your confirmation to merge.

Dapid commented 4 months ago

It does work, now I have a 4.5x over pure Python, and a 6.5x on the real case.

Thanks!