tuwien-cms / xprec

Full quadruple precision (double-double) data type for numpy
MIT License

Wrong results and failing tests #16

Closed MoiseRousseau closed 1 year ago

MoiseRousseau commented 1 year ago

Hi!

I am an HPC analyst at Calcul Quebec, and we are trying to install the xprec library on our system. However, running the xprec test suite gives 24 failures (see test_error.txt for the full output):

[user@machine test]$ pytest
============================= test session starts ==============================
platform linux -- Python 3.9.6, pytest-7.2.1, pluggy-1.0.0
rootdir: /home/moroub/xprec
plugins: anyio-3.6.2+computecanada
collected 52 items                                                             

test_dtype.py .......................                                    [ 44%]
test_linalg.py FFFFFFF                                                   [ 57%]
test_mpmath.py FFFF                                                      [ 65%]
test_ufunc.py FFFFF.FFFFFFFF                                             [ 92%]
test_whitespace.py ....                                                  [100%]

=================================== FAILURES ===================================

test_error.txt

I am working with xprec 1.3.7, installed as follows: installation_xprec.txt

After some debugging with GDB, I found that NumPy still uses its own library for the xprec.ddouble datatype. Adding the -Wl,-Bsymbolic-functions flag to the GCC linker makes things work (i.e., NumPy then calls the xprec functions for the xprec.ddouble datatype). However, we would rather not use this flag for libraries in our software stack. Also, xprec does not require this flag on my personal laptop, so the problem must lie elsewhere.

Do you have any idea where this error could come from?

Thanks for your time, Moise

mwallerb commented 1 year ago

Hi!

This error seems very strange indeed - I have never run across this problem in the wild. Unfortunately, cluster setups are fiendishly difficult to debug, particularly from afar.

Can you run pip freeze? In particular, which numpy version are you using? Can you try with different python/numpy versions? Anaconda?

Also, I really don't understand which symbols this should bind to ... can you try running ldd on the xprec.(...).so files and try to match its symbols using nm? My best guess is that some library exports functions named ****q but that'd be quite strange.
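For the ldd part, here is a minimal Python sketch that turns `ldd` output into a dictionary of dependency names and resolved paths. The function names `parse_ldd` and `ldd_dependencies` are made up for this example, and the path to the extension module is left to the caller. Note that `ldd` only lists the direct and transitive dependencies of the file itself; libraries pulled into the same process by other modules will not show up here.

```python
import subprocess


def parse_ldd(ldd_output: str) -> dict:
    """Map each dependency name to its resolved path ('' if not found)."""
    deps = {}
    for line in ldd_output.splitlines():
        line = line.strip()
        if "=>" not in line:
            # Lines without "=>" are the vDSO or the dynamic loader itself.
            continue
        name, _, target = line.partition("=>")
        # The target looks like "/lib64/libm.so.6 (0x00007f...)" or "not found".
        path = target.split("(")[0].strip()
        deps[name.strip()] = "" if path == "not found" else path
    return deps


def ldd_dependencies(shared_object: str) -> dict:
    """Run `ldd` on a shared object (Linux only) and parse its output."""
    result = subprocess.run(
        ["ldd", shared_object], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)
```

Calling `ldd_dependencies()` with the path of the compiled xprec extension module should then list every library the dynamic loader resolves for it, which makes an unexpected entry easier to spot.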

mwallerb commented 1 year ago

Also, the method we use to register the ddouble type is basically identical to the way the tensorflow package registers its minifloat type - can you check if tensorflow works on this machine?

MoiseRousseau commented 1 year ago

I tried different numpy versions and found that the xprec tests work with numpy 1.19.5 and below, while they fail with 1.21.0 (I am not sure about 1.20 because it is not installed on our systems).

pip freeze for numpy==1.24.2 returns:

attrs==22.2.0+computecanada
exceptiongroup==1.1.0+computecanada
iniconfig==2.0.0+computecanada
numpy==1.24.2+computecanada
packaging==23.0+computecanada
pluggy==1.0.0+computecanada
pytest==7.2.2
tomli==2.0.1+computecanada
xprec==1.3.7

The +computecanada suffix just indicates that the wheels were built by us.

For your other questions, a colleague will help me answer them. I will give you some news tomorrow.

Moise

mwallerb commented 1 year ago

Interestingly, one test that does not fail in the whole suite is the one for hypot, which might be because hypotqq( ) is defined static inline. Quick googling reveals that sqrtq etc. are defined in libquadmath, shipped with gcc. This is just a theory, but maybe the extension for some reason links to the libquadmath functions instead of the xprec ones?
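One way to test this theory would be to compare the function names exported by the two shared objects. The sketch below is only an illustration: `exported_functions`, `nm_dynamic` and `clashing_symbols` are names invented here, and it assumes GNU `nm` is on the PATH. Any name in the intersection is a candidate for being resolved to the wrong library at run time.

```python
import subprocess


def exported_functions(nm_output: str) -> set:
    """Names of defined global or weak function symbols in `nm -D` output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three columns: address, type, name.
        # "T" is a global text (code) symbol, "W" a weak one.
        if len(parts) == 3 and parts[1] in ("T", "W"):
            # Drop a symbol-version suffix such as "@@QUADMATH_1.0".
            names.add(parts[2].split("@")[0])
    return names


def nm_dynamic(shared_object: str) -> str:
    """Return the dynamic symbol table of a shared object as text."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", shared_object],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def clashing_symbols(lib_a: str, lib_b: str) -> set:
    """Function names that both shared objects export."""
    return exported_functions(nm_dynamic(lib_a)) & exported_functions(nm_dynamic(lib_b))
```

Passing the path of the xprec extension module and the path of libquadmath to `clashing_symbols()` would list the overlapping names, if there are any; both paths are system-specific and have to be filled in by hand.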

MoiseRousseau commented 1 year ago

Indeed, I think this might be the case, as the backtrace of GDB shows it is using libquadmath:

[user@machine test]$ gdb python
GNU gdb (Gentoo 9.1 vanilla) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(gdb) run moise.py
Starting program: /home/user/xprec/env_bad/bin/python moise.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00007ffff607fe42 in ?? ()
   from /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libquadmath.so.0
(gdb) bt
#0  0x00007ffff607fe42 in ?? ()
   from /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libquadmath.so.0
#1  0x00007ffff6068d36 in sqrtq ()
   from /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libquadmath.so.0
#2  0x00007fffed55a1e1 in u_sqrtq (args=<optimized out>, 
    dimensions=<optimized out>, steps=<optimized out>, data=<optimized out>)
    at csrc/_dd_ufunc.c:833
#3  0x00007ffff6a1b815 in generic_wrapped_legacy_loop (
    __NPY_UNUSED_TAGGEDcontext=<optimized out>, data=<optimized out>, 
    dimensions=<optimized out>, strides=<optimized out>, 
    auxdata=0x7fffedd96db0) at numpy/core/src/umath/legacy_array_method.c:87
#4  0x00007ffff6a1dc78 in try_trivial_single_output_loop (
    context=context@entry=0x7fffffff8c10, op=op@entry=0x7fffffff8610, 
    order=order@entry=NPY_KEEPORDER, arr_prep=arr_prep@entry=0x7fffffff8910, 
    full_args=..., errormask=521, extobj=0x0)
    at numpy/core/src/umath/ufunc_object.c:1368
#5  0x00007ffff6a25acf in PyUFunc_GenericFunctionInternal (
    wheremask=<optimized out>, full_args=..., 
    output_array_prepare=0x7fffffff8910, order=NPY_KEEPORDER, 
    casting=NPY_SAME_KIND_CASTING, extobj=0x0, op=0x7fffffff8610, 
    operation_descrs=0x7fffffff8810, ufuncimpl=<optimized out>, 
    ufunc=<optimized out>) at numpy/core/src/umath/ufunc_object.c:2687
#6  ufunc_generic_fastcall (ufunc=<optimized out>, args=<optimized out>, len_args=<optimized out>, kwnames=<optimized out>, outer=<optimized out>) at numpy/core/src/umath/ufunc_object.c:4989
#7  0x00007ffff7dbf807 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x45c390, callable=0x7fffee01ba40, tstate=0x405780) at ./Include/cpython/abstract.h:127
#8  PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x45c390, callable=0x7fffee01ba40) at ./Include/cpython/abstract.h:127
#9  call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x405780) at Python/ceval.c:5072
#10 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3487
#11 0x00007ffff7d911f9 in _PyEval_EvalFrame (throwflag=0, f=0x45c210, tstate=0x405780) at ./Include/internal/pycore_ceval.h:40
#12 _PyEval_EvalCode (tstate=0x405780, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=<optimized out>, kwcount=<optimized out>, kwstep=<optimized out>, defs=<optimized out>, defcount=<optimized out>, kwdefs=<optimized out>, closure=<optimized out>, name=<optimized out>, qualname=<optimized out>) at Python/ceval.c:4327
#13 0x00007ffff7d91fc1 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=0x7ffff6e4bf40, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4359
#14 0x00007ffff7d92009 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4375
#15 0x00007ffff7e2d24b in PyEval_EvalCode (co=co@entry=0x7ffff6de8870, globals=globals@entry=0x7ffff6e4bf40, locals=locals@entry=0x7ffff6e4bf40) at Python/ceval.c:826
#16 0x00007ffff7e2d2ed in run_eval_code_obj (tstate=0x405780, co=0x7ffff6de8870, globals=0x7ffff6e4bf40, locals=0x7ffff6e4bf40) at Python/pythonrun.c:1219
#17 0x00007ffff7e5606b in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff6e4bf40, locals=0x7ffff6e4bf40, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1240
#18 0x00007ffff7d129ab in pyrun_file (fp=fp@entry=0x406210, filename=filename@entry=0x7ffff6ca0c90, start=start@entry=257, globals=globals@entry=0x7ffff6e4bf40, locals=locals@entry=0x7ffff6e4bf40, closeit=closeit@entry=1, flags=0x7fffffffa188) at Python/pythonrun.c:1138
#19 0x00007ffff7d12e13 in pyrun_simple_file (flags=0x7fffffffa188, closeit=1, filename=0x7ffff6ca0c90, fp=0x406210) at Python/pythonrun.c:449
#20 PyRun_SimpleFileExFlags (fp=fp@entry=0x406210, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffa188) at Python/pythonrun.c:482
#21 0x00007ffff7d13417 in PyRun_AnyFileExFlags (fp=fp@entry=0x406210, filename=0x7ffff6ca0c90 "\004", closeit=closeit@entry=1, flags=flags@entry=0x7fffffffa188) at Python/pythonrun.c:91
#22 0x00007ffff7e56905 in pymain_run_file (cf=0x7fffffffa188, config=0x407a80) at Modules/main.c:373
#23 pymain_run_python (exitcode=0x7fffffffa180) at Modules/main.c:598
#24 Py_RunMain () at Modules/main.c:677
#25 0x00007ffff7e59c09 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#26 0x00007ffff788de1b in __libc_start_main (main=0x400670 <main>, argc=2, argv=0x7fffffffa358, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffa348) at ../csu/libc-start.c:308
#27 0x00000000004006aa in _start ()

with moise.py:

import xprec
import numpy as np

x = np.geomspace(1e-1, 10, 10000000)
#print(np.sqrt(x))
print(np.sqrt(x.astype(xprec.ddouble)))

mwallerb commented 1 year ago

Ah! Can you try the current master?

MoiseRousseau commented 1 year ago

All good with current master and NumPy 1.24.2:

[user@machine xprec]$ git checkout mainline
Switched to branch 'mainline'
Your branch is up to date with 'origin/mainline'.
[user@machine xprec]$ git fetch
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 18 (delta 13), reused 18 (delta 13), pack-reused 0
Unpacking objects: 100% (18/18), 6.21 KiB | 636.00 KiB/s, done.
From https://github.com/tuwien-cms/xprec
   f63e06c..9eb44f1  mainline   -> origin/mainline
[user@machine xprec]$ git pull
Updating f63e06c..9eb44f1
Fast-forward
 csrc/_dd_linalg.c |  36 ++---
 csrc/_dd_ufunc.c  | 320 ++++++++++++++++++++++----------------------
 csrc/dd_arith.c   | 388 +++++++++++++++++++++++++++---------------------------
 csrc/dd_arith.h   | 254 +++++++++++++++++------------------
 csrc/dd_linalg.c  | 128 +++++++++---------
 csrc/dd_linalg.h  |  10 +-
 6 files changed, 568 insertions(+), 568 deletions(-)
[user@machine xprec]$ source env_bad/bin/activate
(env_bad) [user@machine xprec]$ ls
csrc     env_good     pysrc           README.md  test
env_bad  LICENSE.txt  QD-LICENSE.txt  setup.py
(env_bad) [user@machine xprec]$ pip install .
Ignoring pip: markers 'python_version < "3"' don't match your environment
Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/avx512, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
Processing /home/user/xprec
Requirement already satisfied: numpy>=1.16 in ./env_bad/lib/python3.9/site-packages (from xprec==1.3.7) (1.24.2+computecanada)
Building wheels for collected packages: xprec
  Building wheel for xprec (setup.py) ... done
  Created wheel for xprec: filename=xprec-1.3.7-cp39-cp39-linux_x86_64.whl size=340986 sha256=3204c9f3c792fefb6eb12da0e6d91e7a9bf9342ce5a372c6c728430c4d6ce38b
  Stored in directory: /tmp/pip-ephem-wheel-cache-k470r9xp/wheels/d8/10/bc/62d4801c0cbce15e62ee1e0471580d3212e438bfe65c370ed8
Successfully built xprec
Installing collected packages: xprec
  Attempting uninstall: xprec
    Found existing installation: xprec 1.3.7
    Uninstalling xprec-1.3.7:
      Successfully uninstalled xprec-1.3.7
Successfully installed xprec-1.3.7
(env_bad) [user@machine xprec]$ cd test
(env_bad) [user@machine test]$ ls
moise.py     test_dtype.py   test_mpmath.py  test_whitespace.py
__pycache__  test_linalg.py  test_ufunc.py
(env_bad) [user@machine test]$ pytest
============================= test session starts ==============================
platform linux -- Python 3.9.6, pytest-7.2.2, pluggy-1.0.0
rootdir: /home/user/xprec
collected 48 items / 1 skipped                                                 

test_dtype.py .......................                                    [ 47%]
test_linalg.py .......                                                   [ 62%]
test_ufunc.py ..............                                             [ 91%]
test_whitespace.py ....                                                  [100%]

======================== 48 passed, 1 skipped in 14.47s ========================

Can you create a new release on PyPI or GitHub with these modifications so we can add xprec to our wheelhouse?

mwallerb commented 1 year ago

Done! Thanks for reporting this!