`cumprod()` with quaternions crashes

jordens commented 9 months ago

Describe the bug

Calling numpy.cumprod() on an array of quaternions segfaults.

To Reproduce

import numpy as np
import quaternion

np.cumprod(quaternion.as_quat_array(np.zeros((1000, 4), dtype=np.float64)))

Result:

Segmentation fault (core dumped)

Expected behavior Runs without crashing

Environment (please complete the following information):

Ubuntu 23.10
Conda installation Python 3.9.18
Numpy 1.26.3
Quaternion 2023.0.2

Additional context

gdb backtrace

``` Thread 1 "python" received signal SIGSEGV, Segmentation fault. PyErr_Occurred () at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14 14 /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h: No such file or directory. ─── Assembly ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ~ ~ ~ ~ 0x00000000004d9500 PyErr_Occurred+0 mov 0x26fa51(%rip),%rax # 0x748f58 <_PyRuntime+568> 0x00000000004d9507 PyErr_Occurred+7 mov 0x58(%rax),%rax 0x00000000004d950b PyErr_Occurred+11 ret ~ ~ ~ ─── Breakpoints ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ─── Expressions ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ─── History ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ─── Memory ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ─── Registers ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── rax 0x0000000000000000 rbx 0x00007ffff495feb0 rcx 0x0000000001499b1f rdx 0x000000000148a100 rsi 0x0000000000000020 rdi 0x0000000000007cc0 rbp 0x0000000000000000 rsp 0x00007fffffffc6e8 r8 0x0000000000000020 r9 0x0000000000000020 r10 0x0000000001499b20 r11 0x00000000000003e7 r12 0x0000000000000000 r13 0x00007ffff3ba5b10 r14 0x00007ffff7e2e4f0 r15 0x0000000000000000 rip 0x00000000004d9507 eflags [ IF RF ] cs 0x00000033 ss 0x0000002b ds 0x00000000 es 0x00000000 fs 0x00000000 gs 0x00000000 fs_base 0x00007ffff7e9a500 gs_base 0x0000000000000000 ─── Source ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Cannot display "pycore_pyerrors.h" ─── Stack ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── [0] from 0x00000000004d9507 in PyErr_Occurred+7 at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14 [1] from 0x00007ffff6de8956 in generic_wrapped_legacy_loop [2] from 0x00007ffff6df1773 in PyUFunc_GenericReduction [3] from 0x00000000004ef64b in cfunction_vectorcall_FASTCALL_KEYWORDS+75 at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:446 [4] from 0x00007ffff6d8412f in PyArray_GenericAccumulateFunction [5] from 0x00007ffff6d1669f in PyArray_CumProd [6] from 0x00007ffff6d63746 in array_cumprod [7] from 0x0000000000507387 in cfunction_call+55 at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:543 [8] from 0x0000000000505878 in _PyObject_Call+302 at /usr/local/src/conda/python-3.9.18/Objects/call.c:281 [9] from 0x0000000000505878 in PyObject_Call+344 at /usr/local/src/conda/python-3.9.18/Objects/call.c:293 [+] ─── Threads ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── [1] id 2469270 name python from 0x00000000004d9507 in PyErr_Occurred+7 at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14 ─── Variables ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── loc tstate = 0x0: Cannot access memory at address 0x0 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── >>> bt #0 PyErr_Occurred () at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14 #1 0x00007ffff6de8956 in generic_wrapped_legacy_loop () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so #2 0x00007ffff6df1773 in PyUFunc_GenericReduction () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so #3 0x00000000004ef64b in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7537950, args=0x7ffff3ba6958, nargsf=, kwnames=) at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:446 #4 0x00007ffff6d8412f in PyArray_GenericAccumulateFunction () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so #5 0x00007ffff6d1669f in PyArray_CumProd () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so #6 0x00007ffff6d63746 in array_cumprod () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so #7 0x0000000000507387 in cfunction_call (func=0x7ffff7537b30, args=, kwargs=) at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:543 #8 0x0000000000505878 in _PyObject_Call (kwargs=, args=0x7ffff7e5b040, callable=0x7ffff7537b30, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Objects/call.c:281 #9 PyObject_Call (callable=0x7ffff7537b30, args=0x7ffff7e5b040, kwargs=) at /usr/local/src/conda/python-3.9.18/Objects/call.c:293 #10 0x00000000004ed746 in do_call_core (kwdict=0x7ffff75b84c0, callargs=0x7ffff7e5b040, func=0x7ffff7537b30, tstate=) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5097 #11 _PyEval_EvalFrameDefault (tstate=, f=0x7fff74418dd0, throwflag=) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3582 #12 0x00000000004e6b2a in _PyEval_EvalFrame (throwflag=0, f=0x7fff74418dd0, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40 #13 _PyEval_EvalCode (tstate=tstate@entry=0x76fb70, _co=, globals=, locals=locals@entry=0x0, args=, argcount=, kwnames=0x7ffff6b87c58, kwargs=0x7fff744d17b0, kwcount=, kwstep=1, defs=0x0, defcount=, kwdefs=0x0, closure=0x0, name=0x7ffff6b87bb0, qualname=0x7ffff6b87bb0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4329 #14 0x00000000004f7e54 in _PyFunction_Vectorcall (func=, stack=, nargsf=, kwnames=) at /usr/local/src/conda/python-3.9.18/Objects/call.c:396 #15 0x00000000004e8c61 in _PyObject_VectorcallTstate (kwnames=0x7ffff6b87c40, nargsf=, args=, callable=0x7ffff6bf7b80, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118 #16 PyObject_Vectorcall (kwnames=0x7ffff6b87c40, nargsf=, args=, callable=0x7ffff6bf7b80) at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127 #17 call_function (kwnames=0x7ffff6b87c40, oparg=, pp_stack=, tstate=) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077 #18 _PyEval_EvalFrameDefault (tstate=, f=0x7fff744d1610, throwflag=) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3537 #19 0x00000000004e6b2a in _PyEval_EvalFrame (throwflag=0, f=0x7fff744d1610, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40 #20 _PyEval_EvalCode (tstate=tstate@entry=0x76fb70, _co=, globals=, locals=locals@entry=0x0, args=, argcount=, kwnames=0x0, kwargs=0x7cb0d0, kwcount=, kwstep=1, defs=0x7ffff6b95318, defcount=, kwdefs=0x0, closure=0x0, name=0x7ffff74297f0, qualname=0x7ffff74297f0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4329 #21 0x00000000004f7e54 in _PyFunction_Vectorcall (func=, stack=, nargsf=, kwnames=) at /usr/local/src/conda/python-3.9.18/Objects/call.c:396 #22 0x00007ffff6d135e8 in dispatcher_vectorcall () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so #23 0x00000000004ec764 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, args=0x7cb0c8, callable=0x7ffff6baaab0, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118 #24 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x7cb0c8, callable=0x7ffff6baaab0) at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127 #25 call_function (kwnames=0x0, oparg=, pp_stack=, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077 #26 _PyEval_EvalFrameDefault (tstate=, f=0x7caf50, throwflag=) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3489 #27 0x00000000004e6b2a in _PyEval_EvalFrame (throwflag=0, f=0x7caf50, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40 #28 _PyEval_EvalCode (tstate=, _co=, globals=, locals=, args=, argcount=argcount@entry=0, kwnames=0x0, kwargs=0x0, kwcount=, kwstep=2, defs=0x0, defcount=, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4329 #29 0x00000000004e67b7 in _PyEval_EvalCodeWithName (_co=, globals=, locals=, args=, argcount=argcount@entry=0, kwnames=, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4361 #30 0x00000000004e6769 in PyEval_EvalCodeEx (_co=_co@entry=0x7ffff75bb9d0, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4377 #31 0x000000000059466b in PyEval_EvalCode (co=co@entry=0x7ffff75bb9d0, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:828 #32 0x00000000005c1dc7 in run_eval_code_obj (tstate=tstate@entry=0x76fb70, co=co@entry=0x7ffff75bb9d0, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:1221 #33 0x00000000005bddd0 in run_mod (mod=mod@entry=0x7eeb78, filename=filename@entry=0x7ffff75b6a30, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280, flags=flags@entry=0x7fffffffd9d8, arena=arena@entry=0x7ffff7e19950) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:1242 #34 0x000000000045674e in pyrun_file (fp=fp@entry=0x76d420, filename=filename@entry=0x7ffff75b6a30, start=start@entry=257, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280, closeit=closeit@entry=1, flags=0x7fffffffd9d8) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:1140 #35 0x00000000005b7ab2 in pyrun_simple_file (flags=0x7fffffffd9d8, closeit=1, filename=0x7ffff75b6a30, fp=0x76d420) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:450 #36 PyRun_SimpleFileExFlags (fp=0x76d420, filename=, closeit=1, flags=0x7fffffffd9d8) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:483 #37 0x00000000005b502e in pymain_run_file (cf=0x7fffffffd9d8, config=0x76e340) at /usr/local/src/conda/python-3.9.18/Modules/main.c:379 #38 pymain_run_python (exitcode=0x7fffffffd9d0) at /usr/local/src/conda/python-3.9.18/Modules/main.c:604 #39 Py_RunMain () at /usr/local/src/conda/python-3.9.18/Modules/main.c:683 #40 0x0000000000588719 in Py_BytesMain (argc=, argv=) at /usr/local/src/conda/python-3.9.18/Modules/main.c:1129 #41 0x00007ffff7c280d0 in __libc_start_call_main (main=main@entry=0x5886d0

, argc=argc@entry=2, argv=argv@entry=0x7fffffffdc08) at ../sysdeps/nptl/libc_start_call_main.h:58 #42 0x00007ffff7c28189 in __libc_start_main_impl (main=0x5886d0

, argc=2, argv=0x7fffffffdc08, init=, fini=, rtld_fini=, stack_end=0x7fffffffdbf8) at ../csu/libc-start.c:360 #43 0x00000000005885ce in _start () ```

moble commented 9 months ago

Thanks for the bug report. This is pretty weird. I can consistently get the segfault with a variety of combinations of numpy and/or quaternion — but only with python 3.9 or later, whether I fix the numpy and quaternion versions across python versions, or update to the latest in each case. Also, it consistently works with 501 or fewer quaternions, and fails with 502 or more, on both macOS ARM, and Linux x86_64. Also worth noting that this behavior seems to happen with cumsum as well.

It may take some time for me to figure out this bug, and I imagine I'll need some insight from numpy devs.

Meanwhile, a couple remarks. First, it's not entirely obvious to me that cumprod is guaranteed to always respect the ordering of the input array — which won't matter for the usual number types, but will matter for quaternions. I can imagine optimizations that might move across the input array out of order, like vectorization (though specifically vectorization presumably won't happen automatically). Anyway, I'd just be wary of cumsum for that reason; you can roll your own just as easily, and it probably won't make a huge difference in your overall time. Second, I guess you're probably using cumsum to build up a rotation from small rotations. You might be better off reformulating your problem, e.g., as a differential equation.

jordens commented 9 months ago

Thanks for the analysis and the nice library.

I'd be surprised if this is an issue with cumsum/cumprod. They probably hold up their end of the contract around dtypes. My guess is that poking around with GDB (potentially with more debugging symbols and less optimization) or valgrind will point to the issue.

A manual cumsum implementation is orders of magnitude slower than the numpy one. That does matter in my case.

Yes, this is an integration. No, I'm not doing first order. It's an implementation of a high order geometric integrator based the Magnus expansion. That systematically beats the RK of integrate_angular_velocity().

moble commented 9 months ago

I'd be surprised if this is an issue with cumsum/cumprod. They probably hold up their end of the contract around dtypes.

To be clear, I'm not blaming numpy for being buggy, but there is no contract; there's just a minimally documented array finterface that's been going through huge changes these past few years.

A manual cumsum implementation is orders of magnitude slower than the numpy one.

Have you tried it? For small arrays, they're almost identical; for ~1,000 elements and up it's one order of magnitude. If that really matters, you might want to take a look at numba and quaternionic, which avoid many of the performance penalties inherent in numpy. (Of course, you'd have to roll your own there because np.cumsum doesn't work with quaternionic either — apparently because of some raveling that cumsum does somewhere under the covers.)

It's an implementation of a high order geometric integrator based the Magnus expansion.

Ooh! I'd be interested to see it when you're done.

jordens commented 9 months ago

To be clear, I'm not blaming numpy for being buggy, but there is no contract; there's just a minimally documented array finterface that's been going through huge changes these past few years.

Argh.

Have you tried it? For small arrays, they're almost identical; for ~1,000 elements and up it's one order of magnitude. If that really matters, you might want to take a look at numba and quaternionic, which avoid many of the performance penalties inherent in numpy. (Of course, you'd have to roll your own there because np.cumsum doesn't work with quaternionic either — apparently because of some raveling that cumsum does somewhere under the covers.)

True. The manual impl vs np.cumsum is 85 µs vs 8 µs for 500 elements on my machine. I can live with that for now.

It's an implementation of a high order geometric integrator based the Magnus expansion.

Ooh! I'd be interested to see it when you're done.

Will certainly publish once done.

Basically, the core part is this.


def magnus6(w):
    """exponential map for 6th order magnus expansion for integration of `w`

    `q'(t) = q(t)*w(t)`
    `w` (3, n): pure quaternion samples (given as vector so that `np.cross` works)
      of w at the 3 nodes `_magnus6_nodes` in the n sample intervals `i`
      i.e. supply `w[j, i] = h*w(h*(i + magnus6_nodes[j]))`

    returns the optimal pure quaternion g[:n] such that
    q[i + 1] = q[i]*exp(g[i])

    Note the right multiplication for naturally ordered arrays where increasing index
    is increasing time

    Implementation of:
    S. Blanes, F. Casas, and J. Ros. High order optimized geometric integrators
    for linear differential equations. BIT, 42:262–284, 2002.
    """
    a1 = w[1]
    a2 = np.sqrt(5/3)*(w[0] - w[2])
    a3 = 5/3*(w[0] - 2*w[1] + w[2])
    s1 = np.cross(a1, a2)
    s2 = 1/15*np.cross(a1, 2*a3 + s1)
    g = a1 + 1/6*a3 + 1/60*np.cross(-10*a1 - a3 + s1, a2 - s2)
    return g

# nodes
_magnus6_nodes = 1/2 + np.array([-1, 0, 1])*np.sqrt(3/5)/2

def vec2q(p):
    """Convert R3 vector to pure quaternion"""
    shape = p.shape[:-1]
    q = np.empty(shape + (4,), np.float64)
    q[..., 0] = 0.
    q[..., 1:] = p
    return q.view(np.quaternion).reshape(shape)

# integrating

w0 = 1.  # splitting
f0 = 1.  # drive frequency
g0 = .8  # drive strength

# standard qubit drive (angular velocity)
def w(t):
    return np.array([
        g0*np.cos(f0*t),
        g0*np.sin(f0*t),
        w0/2*np.ones_like(t),
    ]).T

# t: time steps
# h: step size
h = 1e-2
wp = np.sqrt((w0 + f0)**2 + 4*g0**2)
t1 = 1000*2*np.pi/wp
t = np.arange(0, t1, h)

np.prod(np.exp(vec2q(magnus6(h*w((t[:, None] + h*(_magnus6_nodes - 1)).ravel()).reshape((t.shape[0], 3, 3)).swapaxes(0, 1)))))

# vs

ode = integrate.ode(lambda t, y: (quaternion.quaternion(*y)*quaternion.quaternion(*w(t))).components)
ode.set_initial_value([1, 0, 0, 0.], 0.)
ode.set_integrator("dop853", ...)

On the toy problem of Blanes et al. it's about two orders of magnitude faster to get to machine precision than dop853. I don't need adaptive step size in my use case and I haven't done any optimization or tweaking.

kohlerjl commented 8 months ago

I've encountered this issue as well, running with Python 3.11.7 and numpy 1.26.3.

Have you been able to make any progress locating the issue?

moble / quaternion

`cumprod()` with quaternions crashes #225