Closed MoiseRousseau closed 1 year ago
Hi!
This error seems very strange indeed - I have never run across this problem in the wild. Unfortunately, cluster setups are fiendishly difficult to debug, particularly from afar.
Can you do a pip freeze? In particular, which numpy version are you using? Can you try with different python/numpy versions? Anaconda?
Also, I really don't understand which symbols this should bind to ... can you try running ldd on the xprec.(...).so files and try to match its symbols using nm? My best guess is that some library exports functions named ****q, but that'd be quite strange.
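To make the nm step concrete, here is a minimal sketch of how the dynamic symbol table could be filtered for names ending in q. The helper names q_symbols and scan_shared_object are made up for illustration and are not part of xprec; the only external tool assumed is nm -D from binutils.

```python
import subprocess


def q_symbols(nm_output):
    """Return (type, name) pairs for symbols ending in 'q' in `nm -D` output."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            # Defined symbol: "<address> <type> <name>"
            _, kind, name = parts
        elif len(parts) == 2:
            # Undefined symbol: "<type> <name>" (no address column)
            kind, name = parts
        else:
            continue
        # Drop a symbol-version suffix such as "@GLIBC_2.14" if present.
        name = name.split("@")[0]
        if name.endswith("q"):
            found.append((kind, name))
    return found


def scan_shared_object(path):
    """Run `nm -D` on a shared object (hypothetical helper) and filter its output."""
    result = subprocess.run(
        ["nm", "-D", path], check=True, capture_output=True, text=True
    )
    return q_symbols(result.stdout)
```

Running scan_shared_object on both the extension's .so files and on any suspect library, then comparing the names, would show whether the same ...q symbol is exported by more than one of them.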
Also, the method we use to register the ddouble type is basically identical to the way the tensorflow package registers its minifloat type - can you check if tensorflow works on this machine?
I tried different numpy versions and found that the xprec tests pass with numpy 1.19.5 and below, while they fail with 1.21.0 (not sure about 1.20 because it is not installed on our systems).
pip freeze for numpy==1.24.2 returns:
attrs==22.2.0+computecanada
exceptiongroup==1.1.0+computecanada
iniconfig==2.0.0+computecanada
numpy==1.24.2+computecanada
packaging==23.0+computecanada
pluggy==1.0.0+computecanada
pytest==7.2.2
tomli==2.0.1+computecanada
xprec==1.3.7
The +computecanada suffix just indicates that the wheels were built by us.
For your other questions, a colleague will help me answer them. I will give you some news tomorrow.
Moise
Interestingly, the one test in the whole suite that does not fail is the one for hypot, which might be because hypotqq() is defined static inline. Quick googling reveals that sqrtq etc. are defined in libquadmath, shipped with gcc. This is just a theory, but maybe the extension for some reason links to the libquadmath functions instead of the xprec ones?
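One way to probe this theory, sketched below under the assumption of a Linux system, is to check whether libquadmath is mapped into the Python process at all once numpy and xprec are imported. The helper mapped_libraries is a made-up name for illustration; it only parses the text of /proc/self/maps.

```python
def mapped_libraries(maps_text, needle):
    """Return the sorted, unique file paths in /proc/<pid>/maps text containing `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address range, permissions, offset, device, inode, [path]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


if __name__ == "__main__":
    # Import the modules of interest first so that their dependencies get loaded.
    import numpy  # noqa: F401
    import xprec  # noqa: F401

    with open("/proc/self/maps") as maps_file:
        print(mapped_libraries(maps_file.read(), "libquadmath"))
```

A non-empty result only shows that libquadmath is loaded in the same process, which is a precondition for its sqrtq being picked up; it does not by itself prove which definition the extension binds to.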
Indeed, I think this might be the case, as the backtrace of GDB shows it is using libquadmath:
[user@machine test]$ gdb python
GNU gdb (Gentoo 9.1 vanilla) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(gdb) run moise.py
Starting program: /home/user/xprec/env_bad/bin/python moise.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00007ffff607fe42 in ?? ()
from /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libquadmath.so.0
(gdb) bt
#0 0x00007ffff607fe42 in ?? ()
from /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libquadmath.so.0
#1 0x00007ffff6068d36 in sqrtq ()
from /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libquadmath.so.0
#2 0x00007fffed55a1e1 in u_sqrtq (args=<optimized out>,
dimensions=<optimized out>, steps=<optimized out>, data=<optimized out>)
at csrc/_dd_ufunc.c:833
#3 0x00007ffff6a1b815 in generic_wrapped_legacy_loop (
__NPY_UNUSED_TAGGEDcontext=<optimized out>, data=<optimized out>,
dimensions=<optimized out>, strides=<optimized out>,
auxdata=0x7fffedd96db0) at numpy/core/src/umath/legacy_array_method.c:87
#4 0x00007ffff6a1dc78 in try_trivial_single_output_loop (
context=context@entry=0x7fffffff8c10, op=op@entry=0x7fffffff8610,
order=order@entry=NPY_KEEPORDER, arr_prep=arr_prep@entry=0x7fffffff8910,
full_args=..., errormask=521, extobj=0x0)
at numpy/core/src/umath/ufunc_object.c:1368
#5 0x00007ffff6a25acf in PyUFunc_GenericFunctionInternal (
wheremask=<optimized out>, full_args=...,
output_array_prepare=0x7fffffff8910, order=NPY_KEEPORDER,
casting=NPY_SAME_KIND_CASTING, extobj=0x0, op=0x7fffffff8610,
operation_descrs=0x7fffffff8810, ufuncimpl=<optimized out>,
--Type <RET> for more, q to quit, c to continue without paging--c
ufunc=<optimized out>) at numpy/core/src/umath/ufunc_object.c:2687
#6 ufunc_generic_fastcall (ufunc=<optimized out>, args=<optimized out>, len_args=<optimized out>, kwnames=<optimized out>, outer=<optimized out>) at numpy/core/src/umath/ufunc_object.c:4989
#7 0x00007ffff7dbf807 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x45c390, callable=0x7fffee01ba40, tstate=0x405780) at ./Include/cpython/abstract.h:127
#8 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x45c390, callable=0x7fffee01ba40) at ./Include/cpython/abstract.h:127
#9 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x405780) at Python/ceval.c:5072
#10 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
#11 0x00007ffff7d911f9 in _PyEval_EvalFrame (throwflag=0, f=0x45c210, tstate=0x405780) at ./Include/internal/pycore_ceval.h:40
#12 _PyEval_EvalCode (tstate=0x405780, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=<optimized out>, kwcount=<optimized out>, kwstep=<optimized out>, defs=<optimized out>, defcount=<optimized out>, kwdefs=<optimized out>, closure=<optimized out>, name=<optimized out>, qualname=<optimized out>) at Python/ceval.c:4327
#13 0x00007ffff7d91fc1 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=0x7ffff6e4bf40, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4359
#14 0x00007ffff7d92009 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4375
#15 0x00007ffff7e2d24b in PyEval_EvalCode (co=co@entry=0x7ffff6de8870, globals=globals@entry=0x7ffff6e4bf40, locals=locals@entry=0x7ffff6e4bf40) at Python/ceval.c:826
#16 0x00007ffff7e2d2ed in run_eval_code_obj (tstate=0x405780, co=0x7ffff6de8870, globals=0x7ffff6e4bf40, locals=0x7ffff6e4bf40) at Python/pythonrun.c:1219
#17 0x00007ffff7e5606b in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff6e4bf40, locals=0x7ffff6e4bf40, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1240
#18 0x00007ffff7d129ab in pyrun_file (fp=fp@entry=0x406210, filename=filename@entry=0x7ffff6ca0c90, start=start@entry=257, globals=globals@entry=0x7ffff6e4bf40, locals=locals@entry=0x7ffff6e4bf40, closeit=closeit@entry=1, flags=0x7fffffffa188) at Python/pythonrun.c:1138
#19 0x00007ffff7d12e13 in pyrun_simple_file (flags=0x7fffffffa188, closeit=1, filename=0x7ffff6ca0c90, fp=0x406210) at Python/pythonrun.c:449
#20 PyRun_SimpleFileExFlags (fp=fp@entry=0x406210, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffa188) at Python/pythonrun.c:482
#21 0x00007ffff7d13417 in PyRun_AnyFileExFlags (fp=fp@entry=0x406210, filename=0x7ffff6ca0c90 "\004", closeit=closeit@entry=1, flags=flags@entry=0x7fffffffa188) at Python/pythonrun.c:91
#22 0x00007ffff7e56905 in pymain_run_file (cf=0x7fffffffa188, config=0x407a80) at Modules/main.c:373
#23 pymain_run_python (exitcode=0x7fffffffa180) at Modules/main.c:598
#24 Py_RunMain () at Modules/main.c:677
#25 0x00007ffff7e59c09 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#26 0x00007ffff788de1b in __libc_start_main (main=0x400670 <main>, argc=2, argv=0x7fffffffa358, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffa348) at ../csu/libc-start.c:308
#27 0x00000000004006aa in _start ()
with moise.py:
import xprec
import numpy as np
x = np.geomspace(1e-1, 10, 10000000)
#print(np.sqrt(x))
print(np.sqrt(x.astype(xprec.ddouble)))
Ah! Can you try the current master?
All good with current master and NumPy 1.24.2:
[user@machine xprec]$ git checkout mainline
Switched to branch 'mainline'
Your branch is up to date with 'origin/mainline'.
[user@machine xprec]$ git fetch
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 18 (delta 13), reused 18 (delta 13), pack-reused 0
Unpacking objects: 100% (18/18), 6.21 KiB | 636.00 KiB/s, done.
From https://github.com/tuwien-cms/xprec
f63e06c..9eb44f1 mainline -> origin/mainline
[user@machine xprec]$ git pull
Updating f63e06c..9eb44f1
Fast-forward
csrc/_dd_linalg.c | 36 ++---
csrc/_dd_ufunc.c | 320 ++++++++++++++++++++++----------------------
csrc/dd_arith.c | 388 +++++++++++++++++++++++++++---------------------------
csrc/dd_arith.h | 254 +++++++++++++++++------------------
csrc/dd_linalg.c | 128 +++++++++---------
csrc/dd_linalg.h | 10 +-
6 files changed, 568 insertions(+), 568 deletions(-)
[user@machine xprec]$ source env_bad/bin/activate
(env_bad) [user@machine xprec]$ ls
csrc env_good pysrc README.md test
env_bad LICENSE.txt QD-LICENSE.txt setup.py
(env_bad) [user@machine xprec]$ pip install .
Ignoring pip: markers 'python_version < "3"' don't match your environment
Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/avx512, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
Processing /home/user/xprec
Requirement already satisfied: numpy>=1.16 in ./env_bad/lib/python3.9/site-packages (from xprec==1.3.7) (1.24.2+computecanada)
Building wheels for collected packages: xprec
Building wheel for xprec (setup.py) ... done
Created wheel for xprec: filename=xprec-1.3.7-cp39-cp39-linux_x86_64.whl size=340986 sha256=3204c9f3c792fefb6eb12da0e6d91e7a9bf9342ce5a372c6c728430c4d6ce38b
Stored in directory: /tmp/pip-ephem-wheel-cache-k470r9xp/wheels/d8/10/bc/62d4801c0cbce15e62ee1e0471580d3212e438bfe65c370ed8
Successfully built xprec
Installing collected packages: xprec
Attempting uninstall: xprec
Found existing installation: xprec 1.3.7
Uninstalling xprec-1.3.7:
Successfully uninstalled xprec-1.3.7
Successfully installed xprec-1.3.7
(env_bad) [user@machine xprec]$ cd test
(env_bad) [user@machine test]$ ls
moise.py test_dtype.py test_mpmath.py test_whitespace.py
__pycache__ test_linalg.py test_ufunc.py
(env_bad) [user@machine test]$ pytest
============================= test session starts ==============================
platform linux -- Python 3.9.6, pytest-7.2.2, pluggy-1.0.0
rootdir: /home/user/xprec
collected 48 items / 1 skipped
test_dtype.py ....................... [ 47%]
test_linalg.py ....... [ 62%]
test_ufunc.py .............. [ 91%]
test_whitespace.py .... [100%]
======================== 48 passed, 1 skipped in 14.47s ========================
Can you create a new release on PyPI or GitHub with these modifications so we can add xprec to our wheelhouse?
Done! Thanks for reporting this!
Hi!
I am an HPC analyst at Calcul Quebec and we are trying to install the xprec library on our system. However, we ran into trouble when running the xprec test suite, with 24 failures (see test_error.txt for the full output):
test_error.txt
I am working with xprec 1.3.7 installed as follows: installation_xprec.txt
After some debugging, I found with GDB that NumPy still uses its own library with the xprec.ddouble datatype. Adding the -Wl,-Bsymbolic-functions flag to the GCC linker makes things work (i.e. NumPy calls the xprec functions for the xprec.ddouble datatype). Yet, we do not want to use this flag for libraries in our software stack. Also, on my personal laptop, xprec does not require this flag, so the problem must be elsewhere. Do you have an idea where this error could come from?
Thanks for your time, Moise