Open jordens opened 9 months ago
Thanks for the bug report. This is pretty weird. I can consistently get the segfault with a variety of combinations of numpy and/or quaternion — but only with python 3.9 or later, whether I fix the numpy and quaternion versions across python versions, or update to the latest in each case. Also, it consistently works with 501 or fewer quaternions, and fails with 502 or more, on both macOS ARM and Linux x86_64. Also worth noting that this behavior seems to happen with `cumsum` as well.
It may take some time for me to figure out this bug, and I imagine I'll need some insight from numpy devs.
Meanwhile, a couple of remarks. First, it's not entirely obvious to me that `cumprod` is guaranteed to always respect the ordering of the input array — which won't matter for the usual number types, but will matter for quaternions. I can imagine optimizations that might move across the input array out of order, like vectorization (though vectorization specifically presumably won't happen automatically). Anyway, I'd just be wary of `cumprod` for that reason; you can roll your own just as easily, and it probably won't make a huge difference in your overall time. Second, I guess you're probably using `cumprod` to build up a rotation from small rotations. You might be better off reformulating your problem, e.g., as a differential equation.
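For instance, a hand-rolled version (a minimal sketch, not code from this package) could fix the evaluation order explicitly:

```python
# Minimal sketch of a hand-rolled cumulative product with explicit ordering
# (illustrative example, not part of the quaternion package)
import numpy as np
import quaternion  # numpy-quaternion

def cumprod_ordered(q):
    """Left-to-right cumulative product of a 1-D quaternion array."""
    out = np.empty_like(q)
    acc = np.quaternion(1, 0, 0, 0)
    for i, qi in enumerate(q):
        acc = acc * qi  # right-multiplication, so index order is composition order
        out[i] = acc
    return out
```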
Thanks for the analysis and the nice library.
I'd be surprised if this is an issue with `cumsum`/`cumprod`. They probably hold up their end of the contract around dtypes. My guess is that poking around with GDB (potentially with more debugging symbols and less optimization) or valgrind will point to the issue.
A manual `cumsum` implementation is orders of magnitude slower than the numpy one. That does matter in my case.
Yes, this is an integration. No, I'm not doing first order. It's an implementation of a high-order geometric integrator based on the Magnus expansion. That systematically beats the RK of `integrate_angular_velocity()`.
> I'd be surprised if this is an issue with `cumsum`/`cumprod`. They probably hold up their end of the contract around dtypes.
To be clear, I'm not blaming numpy for being buggy, but there is no contract; there's just a minimally documented array interface that's been going through huge changes these past few years.
> A manual `cumsum` implementation is orders of magnitude slower than the numpy one.
Have you tried it? For small arrays, they're almost identical; for ~1,000 elements and up it's one order of magnitude. If that really matters, you might want to take a look at numba and quaternionic, which avoid many of the performance penalties inherent in numpy. (Of course, you'd have to roll your own there because `np.cumsum` doesn't work with quaternionic either — apparently because of some raveling that `cumsum` does somewhere under the covers.)
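For example, a numba-jitted loop over plain float components (the name and layout here are my own illustration, assuming the trailing-axis (n, 4) component layout that quaternionic uses) might look like:

```python
# Hypothetical numba-jitted cumulative quaternion product over (n, 4) float
# components ordered (w, x, y, z); illustrative only, not from either package.
import numpy as np
from numba import njit

@njit(cache=True)
def cumprod_components(q):
    out = np.empty_like(q)
    aw, ax, ay, az = 1.0, 0.0, 0.0, 0.0
    for i in range(q.shape[0]):
        bw, bx, by, bz = q[i, 0], q[i, 1], q[i, 2], q[i, 3]
        # Hamilton product a*b, keeping the input ordering
        cw = aw*bw - ax*bx - ay*by - az*bz
        cx = aw*bx + ax*bw + ay*bz - az*by
        cy = aw*by - ax*bz + ay*bw + az*bx
        cz = aw*bz + ax*by - ay*bx + az*bw
        out[i, 0] = cw
        out[i, 1] = cx
        out[i, 2] = cy
        out[i, 3] = cz
        aw, ax, ay, az = cw, cx, cy, cz
    return out
```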
> It's an implementation of a high-order geometric integrator based on the Magnus expansion.
Ooh! I'd be interested to see it when you're done.
> To be clear, I'm not blaming numpy for being buggy, but there is no contract; there's just a minimally documented array interface that's been going through huge changes these past few years.
Argh.
> Have you tried it? For small arrays, they're almost identical; for ~1,000 elements and up it's one order of magnitude. If that really matters, you might want to take a look at numba and quaternionic, which avoid many of the performance penalties inherent in numpy. (Of course, you'd have to roll your own there because `np.cumsum` doesn't work with quaternionic either — apparently because of some raveling that `cumsum` does somewhere under the covers.)
True. The manual impl vs `np.cumsum` is 85 µs vs 8 µs for 500 elements on my machine. I can live with that for now.
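A hypothetical way to reproduce that kind of comparison (the array contents and repeat count are illustrative, not the script behind the numbers above):

```python
# Hypothetical timing comparison between a manual loop and np.cumsum
# (array contents and repeat count are illustrative)
import timeit
import numpy as np
import quaternion  # numpy-quaternion

q = quaternion.as_quat_array(np.random.rand(500, 4))  # 500 elements, below the reported failure threshold

def cumsum_manual(a):
    out = np.empty_like(a)
    acc = np.quaternion(0, 0, 0, 0)
    for i, ai in enumerate(a):
        acc = acc + ai
        out[i] = acc
    return out

print(timeit.timeit(lambda: cumsum_manual(q), number=1000))  # manual loop
print(timeit.timeit(lambda: np.cumsum(q), number=1000))      # numpy built-in
```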
> > It's an implementation of a high-order geometric integrator based on the Magnus expansion.
>
> Ooh! I'd be interested to see it when you're done.
Will certainly publish once done.
Basically, the core part is this.
```python
import numpy as np
import quaternion  # numpy-quaternion
from scipy import integrate


def magnus6(w):
    """Exponential map for the 6th-order Magnus expansion integrating `q'(t) = q(t)*w(t)`.

    `w` (3, n): pure quaternion samples (given as vectors so that `np.cross` works)
    of w at the 3 nodes `_magnus6_nodes` in the n sample intervals `i`,
    i.e. supply `w[j, i] = h*w(h*(i + _magnus6_nodes[j]))`.

    Returns the optimal pure quaternion g[:n] such that
    `q[i + 1] = q[i]*exp(g[i])`.

    Note the right multiplication for naturally ordered arrays where increasing
    index is increasing time.

    Implementation of:
    S. Blanes, F. Casas, and J. Ros. High order optimized geometric integrators
    for linear differential equations. BIT, 42:262–284, 2002.
    """
    a1 = w[1]
    a2 = np.sqrt(5/3)*(w[0] - w[2])
    a3 = 5/3*(w[0] - 2*w[1] + w[2])
    s1 = np.cross(a1, a2)
    s2 = 1/15*np.cross(a1, 2*a3 + s1)
    g = a1 + 1/6*a3 + 1/60*np.cross(-10*a1 - a3 + s1, a2 - s2)
    return g


# nodes (three-point Gauss-Legendre nodes mapped to [0, 1])
_magnus6_nodes = 1/2 + np.array([-1, 0, 1])*np.sqrt(3/5)/2


def vec2q(p):
    """Convert R3 vectors (last axis of length 3) to pure quaternions."""
    shape = p.shape[:-1]
    q = np.empty(shape + (4,), np.float64)
    q[..., 0] = 0.
    q[..., 1:] = p
    return q.view(np.quaternion).reshape(shape)


# integrating
w0 = 1.  # splitting
f0 = 1.  # drive frequency
g0 = .8  # drive strength


# standard qubit drive (angular velocity)
def w(t):
    return np.array([
        g0*np.cos(f0*t),
        g0*np.sin(f0*t),
        w0/2*np.ones_like(t),
    ]).T


# t: time steps
# h: step size
h = 1e-2
wp = np.sqrt((w0 + f0)**2 + 4*g0**2)
t1 = 1000*2*np.pi/wp
t = np.arange(0, t1, h)

# angular velocity sampled at the three Magnus nodes of each step,
# shape (3 nodes, n steps, 3 vector components)
wn = h*w((t[:, None] + h*(_magnus6_nodes - 1)).ravel()).reshape((t.shape[0], 3, 3)).swapaxes(0, 1)
# accumulate the per-step exponentials into the total rotation
np.prod(np.exp(vec2q(magnus6(wn))))

# vs
ode = integrate.ode(lambda t, y: (quaternion.quaternion(*y)*quaternion.quaternion(*w(t))).components)
ode.set_initial_value([1, 0, 0, 0.], 0.)
ode.set_integrator("dop853", ...)
```
On the toy problem of Blanes et al. it's about two orders of magnitude faster to get to machine precision than dop853. I don't need adaptive step size in my use case and I haven't done any optimization or tweaking.
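A sketch of one way such a comparison could be driven, reusing the definitions from the snippet above (the right-hand side mirrors the lambda there, the pure-quaternion construction, tolerances, and final-time check are my own illustrations, not the elided options):

```python
# Hypothetical comparison driver (illustrative; reuses w, t, wn, magnus6,
# and vec2q from the snippet above)
rhs = lambda t, y: (quaternion.quaternion(*y)*np.quaternion(0., *w(t))).components
r = integrate.ode(rhs)
r.set_integrator("dop853", atol=1e-12, rtol=1e-12)
r.set_initial_value([1., 0., 0., 0.], 0.)
r.integrate(t[-1])                               # propagate the quaternion ODE to the end
q_dop853 = quaternion.quaternion(*r.y)

q_magnus = np.prod(np.exp(vec2q(magnus6(wn))))   # total Magnus rotation, as above
print(abs(q_dop853 - q_magnus))                  # difference between the two propagators
```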
I've encountered this issue as well, running with Python 3.11.7 and numpy 1.26.3.
Have you been able to make any progress locating the issue?
**Describe the bug**
Calling `numpy.cumprod()` on an array of quaternions segfaults.

**To Reproduce**
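A minimal reproduction consistent with the failure threshold reported in the comments above (array contents are illustrative):

```python
# Minimal reproduction sketch (array contents illustrative)
import numpy as np
import quaternion  # numpy-quaternion

q = quaternion.as_quat_array(np.random.rand(502, 4))  # 501 elements reportedly works, 502 segfaults
np.cumprod(q)
```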
Result: segmentation fault (see the gdb backtrace under Additional context).

**Expected behavior**
Runs without crashing.

**Environment (please complete the following information):**

**Additional context**
gdb backtrace
```
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
PyErr_Occurred () at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14
14      /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h: No such file or directory.

[gdb dashboard panes (assembly, breakpoints, expressions, history, memory, registers, threads) omitted]

─── Stack ───
[0] from 0x00000000004d9507 in PyErr_Occurred+7 at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14
[1] from 0x00007ffff6de8956 in generic_wrapped_legacy_loop
[2] from 0x00007ffff6df1773 in PyUFunc_GenericReduction
[3] from 0x00000000004ef64b in cfunction_vectorcall_FASTCALL_KEYWORDS+75 at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:446
[4] from 0x00007ffff6d8412f in PyArray_GenericAccumulateFunction
[5] from 0x00007ffff6d1669f in PyArray_CumProd
[6] from 0x00007ffff6d63746 in array_cumprod
[7] from 0x0000000000507387 in cfunction_call+55 at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:543
[8] from 0x0000000000505878 in _PyObject_Call+302 at /usr/local/src/conda/python-3.9.18/Objects/call.c:281
[9] from 0x0000000000505878 in PyObject_Call+344 at /usr/local/src/conda/python-3.9.18/Objects/call.c:293 [+]

─── Variables ───
loc tstate = 0x0: Cannot access memory at address 0x0

>>> bt
#0  PyErr_Occurred () at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14
#1  0x00007ffff6de8956 in generic_wrapped_legacy_loop () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#2  0x00007ffff6df1773 in PyUFunc_GenericReduction () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#3  0x00000000004ef64b in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7537950, args=0x7ffff3ba6958, nargsf=
```