hoffmangroup / genomedata

The Genomedata format for storing large-scale functional genomics data.
https://genomedata.hoffmanlab.org/
GNU General Public License v2.0

PyTables >= 3.4.1 causes a core dump when reading a continuous section from a supercontig #38

Closed EricR86 closed 7 years ago

EricR86 commented 7 years ago

Original report (archived issue) by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).

The original report had attachments: core.8314.gz


The following message is printed after updating dependencies: "Illegal instruction (core dumped)"

This error is printed on the latest changeset (3820f6017c6197e6e241413599819c25fca74c9d).

Here is the log of the upgrade that introduced the breakage (it did not occur previously):

#!bash

$ pip install . --upgrade --user
Processing /mnt/work1/users/home2/eric.roberts/genomedata
Requirement already up-to-date: numpy in /mnt/work1/users/home2/eric.roberts/.local/lib/python2.7/site-packages (from genomedata==1.3.6.dev0)
Requirement already up-to-date: forked-path in /mnt/work1/software/python/2.7/lib/python2.7/site-packages (from genomedata==1.3.6.dev0)
Collecting tables>=3.0 (from genomedata==1.3.6.dev0)
  Using cached tables-3.4.2-cp27-cp27m-manylinux1_x86_64.whl
Requirement already up-to-date: textinput in /mnt/work1/users/home2/eric.roberts/.local/lib/python2.7/site-packages (from genomedata==1.3.6.dev0)
Collecting numexpr>=2.5.2 (from tables>=3.0->genomedata==1.3.6.dev0)
  Using cached numexpr-2.6.2-cp27-cp27m-manylinux1_x86_64.whl
Requirement already up-to-date: six>=1.9.0 in /mnt/work1/software/python/2.7/lib/python2.7/site-packages (from tables>=3.0->genomedata==1.3.6.dev0)
Installing collected packages: numexpr, tables, genomedata
  Found existing installation: tables 3.3.0
    Uninstalling tables-3.3.0:
      Successfully uninstalled tables-3.3.0
  Found existing installation: genomedata 1.3.6.dev0
    Uninstalling genomedata-1.3.6.dev0:
      Successfully uninstalled genomedata-1.3.6.dev0
  Running setup.py install for genomedata ... done
Successfully installed genomedata-1.3.6.dev0 numexpr-2.6.2 tables-3.4.2
$ cd test/
$ ./run_tests.py
Illegal instruction (core dumped)

This bug also occurs when trying to read from any valid Genomedata archive:

#!python

from genomedata import Genome
genome = Genome("valid.genomedata")
chr1 = genome["chr1"]
supercontig = chr1.supercontigs[0][0]
continuous = supercontig.continuous
continuous[0]
EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


The wheels are tagged according to PEP 425. Notably, the cp27m in cp27-cp27m is the ABI (Application Binary Interface) tag, which here denotes a CPython built with narrow Unicode. According to the manylinux docs, there are two ways to build CPython that are not ABI compatible: --enable-unicode=ucs2 and --enable-unicode=ucs4. Supposedly cp27-cp27mu is a far more common ABI, but it is not being fetched here.

Unfortunately, attempting a forced install of a wheel built for a different CPython ABI reported: "tables-3.4.2-cp27-cp27mu-manylinux1_x86_64.whl is not a supported wheel on this platform."
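
For reference, the tags discussed above can be read straight off the wheel filename, which follows the PEP 427 layout `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The following is only a minimal sketch; `parse_wheel_filename` and `WheelName` are hypothetical names made up for illustration and are not part of pip or PyTables.

```python
# Sketch: split a wheel filename into its PEP 427 components.
# parse_wheel_filename and WheelName are hypothetical helpers for illustration.
from collections import namedtuple

WheelName = namedtuple(
    "WheelName", "distribution version build python_tag abi_tag platform_tag"
)


def parse_wheel_filename(filename):
    """Return the components of a wheel filename as a WheelName tuple."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: {}".format(filename))
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        # The optional build tag is absent.
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename: {}".format(filename))
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


for name in (
    "tables-3.4.2-cp27-cp27m-manylinux1_x86_64.whl",
    "tables-3.4.2-cp27-cp27mu-manylinux1_x86_64.whl",
):
    wheel = parse_wheel_filename(name)
    print("{}: ABI tag {}".format(name, wheel.abi_tag))
```

The two filenames differ only in the ABI tag (cp27m versus cp27mu), which is why pip refuses one of them on a given interpreter.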

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


I can confirm that tables-3.1.1-py2.7-linux-x86_64.egg works correctly and it does seem to be that PyTables is the culprit.

tables-3.4.0-cp27-cp27m-manylinux1_x86_64.whl also works correctly. This is the version information:

#!bash

tables.print_versions()
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:    3.4.0
HDF5 version:        1.8.18
NumPy version:       1.12.1
Numexpr version:     2.6.2 (not using Intel's VML/MKL)
Zlib version:        1.2.7 (in Python interpreter)
LZO version:         2.09 (Feb 04 2015)
BZIP2 version:       1.0.6 (6-Sept-2010)
Blosc version:       1.10.2 (2016-07-30)
Blosc compressors:   blosclz (1.0.5), lz4 (1.7.2), lz4hc (1.7.2), snappy (1.1.1), zlib (1.2.8), zstd (0.7.4)
Blosc filters:       shuffle, bitshuffle
Cython version:      0.24.1
Python version:      2.7.11 (default, Mar  8 2016, 15:09:28)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]
Platform:            Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-centos-7.3.1611-Core
Byte-ordering:       little
Detected cores:      8
Default encoding:    ascii
Default FS encoding: UTF-8
Default locale:      (en_US, UTF-8)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

tables-3.4.1-1-cp27-cp27m-manylinux1_x86_64.whl does not work. This is the version information:

#!bash

tables.print_versions()
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:    3.4.1
HDF5 version:        1.8.18
NumPy version:       1.12.1
Numexpr version:     2.6.2 (not using Intel's VML/MKL)
Zlib version:        1.2.7 (in Python interpreter)
LZO version:         2.09 (Feb 04 2015)
BZIP2 version:       1.0.6 (6-Sept-2010)
Blosc version:       1.11.3 (2017-03-09)
Blosc compressors:   blosclz (1.0.5), lz4 (1.7.5), lz4hc (1.7.5), snappy (1.1.1), zlib (1.2.8), zstd (1.1.3)
Blosc filters:       shuffle, bitshuffle
Cython version:      0.24.1
Python version:      2.7.11 (default, Mar  8 2016, 15:09:28)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]
Platform:            Linux-3.10.0-514.6.1.el7.x86_64-x86_64-with-centos-7.3.1611-Core
Byte-ordering:       little
Detected cores:      8
Default encoding:    ascii
Default FS encoding: UTF-8
Default locale:      (en_US, UTF-8)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
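
Comparing the two dumps field by field shows that only the PyTables version, the Blosc version and the Blosc compressor versions differ. A rough sketch of that comparison is below; `parse_versions` and `differing_fields` are hypothetical helpers, and the two sample strings are short excerpts of the dumps above rather than the full output.

```python
# Sketch: report which fields differ between two tables.print_versions() dumps.
# parse_versions and differing_fields are hypothetical helpers for illustration.

def parse_versions(text):
    """Map each 'Name: value' line of a print_versions() dump to its value."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def differing_fields(old_text, new_text):
    """Return {field: (old value, new value)} for fields whose values differ."""
    old, new = parse_versions(old_text), parse_versions(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Excerpts of the dumps above (not the complete output).
working = """\
PyTables version:    3.4.0
HDF5 version:        1.8.18
Blosc version:       1.10.2 (2016-07-30)
"""
failing = """\
PyTables version:    3.4.1
HDF5 version:        1.8.18
Blosc version:       1.11.3 (2017-03-09)
"""

for name, (old, new) in sorted(differing_fields(working, failing).items()):
    print("{}: {} -> {}".format(name, old, new))
```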
EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


The current python line that causes the illegal instruction is line 104 in genomedata/_close_data.py inside write_metadata:

#!python

            col = continuous[:, col_index]

continuous is of type tables.earray.EArray

In the test case that breaks, the array has the following properties:

#!python

/supercontig_0/continuous (EArray(24950, 3), shuffle, zlib(1)) ''
  atom := Float32Atom(shape=(), dflt=nan)
  maindim := 1
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (10000, 1)
EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Notably all the shared files installed by the PyTables wheel report the same architecture:

#!bash

$ objdump -x *.so.* | grep architecture
architecture: i386:x86-64, flags 0x00000150:
architecture: i386:x86-64, flags 0x00000150:
architecture: i386:x86-64, flags 0x00000150:
architecture: i386:x86-64, flags 0x00000150:
architecture: i386:x86-64, flags 0x00000150:

The only thing of note is that 3.4.1 onwards is missing libblosc from the shared libraries compared to 3.4.0 and previous builds.

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Here is the top of the stack, truncated to the first 10 frames, from the backtrace of the core dump (#0 is the illegal instruction):

#!bash

#0  0x00007fcdf2ff94bc in inflate_fast (strm=strm@entry=0x7ffd4cafc4c0, start=start@entry=989)
    at c-blosc/internal-complibs/zlib-1.2.8/inffast.c:251
#1  0x00007fcdf2ff3291 in inflate (strm=0x7ffd4cafc4c0, flush=2)
    at c-blosc/internal-complibs/zlib-1.2.8/inflate.c:1024
#2  0x00007fcdf2cbd55d in H5Z_filter_deflate ()
   from /scratch/arch/Linux-x86_64/opt/python-2.7.11/lib/python2.7/site-packages/tables/.libs/libhdf5-6f436a33.so.10.2.1
#3  0x00007fcdf2cbc7b1 in H5Z_pipeline ()
   from /scratch/arch/Linux-x86_64/opt/python-2.7.11/lib/python2.7/site-packages/tables/.libs/libhdf5-6f436a33.so.10.2.1
#4  0x00007fcdf2af0172 in H5D__chunk_lock ()
   from /scratch/arch/Linux-x86_64/opt/python-2.7.11/lib/python2.7/site-packages/tables/.libs/libhdf5-6f436a33.so.10.2.1
#5  0x00007fcdf2af1518 in H5D__chunk_read ()
   from /scratch/arch/Linux-x86_64/opt/python-2.7.11/lib/python2.7/site-packages/tables/.libs/libhdf5-6f436a33.so.10.2.1
#6  0x00007fcdf2b0469b in H5D__read ()
   from /scratch/arch/Linux-x86_64/opt/python-2.7.11/lib/python2.7/site-packages/tables/.libs/libhdf5-6f436a33.so.10.2.1
#7  0x00007fcdf2b04d0c in H5Dread ()
   from /scratch/arch/Linux-x86_64/opt/python-2.7.11/lib/python2.7/site-packages/tables/.libs/libhdf5-6f436a33.so.10.2.1
#8  0x00007fce04d13a19 in H5ARRAYreadSlice (dataset_id=83886080, type_id=50331743, start=<optimized out>,
    stop=<optimized out>, step=0x1647d60, data=0x17757e0) at src/H5ARRAY.c:544
#9  0x00007fce04ce3f90 in __pyx_pf_6tables_13hdf5extension_5Array_10_g_read_slice (__pyx_v_startl=<optimized out>,
    __pyx_v_stopl=<optimized out>, __pyx_v_stepl=<optimized out>, __pyx_v_nparr=0x7fcdeed8f620,
    __pyx_v_self=0x7fcdeede45a0) at tables/hdf5extension.c:18536
#10 __pyx_pw_6tables_13hdf5extension_5Array_11_g_read_slice (__pyx_v_self=0x7fcdeede45a0, __pyx_args=<optimized out>,
    __pyx_kwds=<optimized out>) at tables/hdf5extension.c:18365

Here is a link to the offending file and version: https://github.com/madler/zlib/blob/v1.2.8/inffast.c

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Attached a gzipped core dump.

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


The following script crashes when indexing into a PyTables array in an hdf5 file:

#!python

#!/usr/bin/env python

# Opens an hdf5 genomedata file for testing
from tables import open_file
import sys

# Require both arguments: the hdf5 file and the path to the node inside it
if len(sys.argv) < 3:
    print("Usage: tables_test.py hdf5file hdf5path_to_table")
    sys.exit(1)

hdf5_filename = sys.argv[1]
hdf5_path_to_table = sys.argv[2]

# Open the h5file
print("Opening hdf5 file: {}".format(hdf5_filename))
h5file = open_file(hdf5_filename)

print("Getting table from hdf5 path: {}".format(hdf5_path_to_table))
pytable = h5file.get_node(hdf5_path_to_table)
print("PyTables info: {}".format(repr(pytable)))

print("Attempting to index into table at 0...")
print(pytable[0])

# Close the h5file
h5file.close()

e.g. in the genomedata test source path:

#!bash

$ python tables_test.py "data/v1.genomedata" "/chr1/supercontig_0/continuous"
Opening hdf5 file: data/v1.genomedata
Getting table from hdf5 path: /chr1/supercontig_0/continuous
PyTables info: /chr1/supercontig_0/continuous (EArray(24950, 3), shuffle, zlib(1)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 1
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (10000, 1)
Attempting to index into table at 0...
Illegal instruction (core dumped)

This error could not be reproduced using the example hdf5 file h5ex_t_float.h5 from the HDF Group:

#!bash

$ python tables_test.py h5ex_t_float.h5 "/DS1"
Opening hdf5 file: h5ex_t_float.h5
Getting table from hdf5 path: /DS1
PyTables info: /DS1 (Array(4, 7)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
Attempting to index into table at 0...
[ 0.  1.  2.  3.  4.  5.  6.]
EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


From the PyTables gitter:

"Tom Kooij @tomkooij 10:57

@EricR86 : Thanks for reporting. The SIGILL (illegal instruction) is caused by the wheels now using the internal BLOSC (hence the missing blosc .so). However, this (erroneously) has triggered the wheels compiling with AVX2 instructions enabled.

I'm currently travelling, but I plan to have fixed wheels up on PyPI by the end of the weekend (may 7th)."
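
A binary compiled with AVX2 instructions raises SIGILL on a CPU that lacks them, so it can help to confirm whether the affected machine advertises AVX2 at all. The sketch below assumes an x86 Linux host where `/proc/cpuinfo` contains a `flags` line; `cpu_flags` and `has_cpu_flag` are hypothetical helper names.

```python
# Sketch: check whether a Linux x86 CPU advertises AVX2.
# Assumes the /proc/cpuinfo "flags : ..." line format; cpu_flags and
# has_cpu_flag are hypothetical helpers for illustration.

def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            flags.update(value.split())
    return flags


def has_cpu_flag(flag, cpuinfo_text):
    """Return True if the given feature flag is listed for any processor."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            text = cpuinfo.read()
    except IOError:  # not Linux, or /proc is unavailable
        print("could not read /proc/cpuinfo")
    else:
        print("AVX2 supported: {}".format(has_cpu_flag("avx2", text)))
```

If this prints False on the machine that crashed, that would be consistent with the explanation quoted above.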

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Resolved in wheel version tables-3.4.2-3 (onwards)

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Should tables-3.4.2-3 be the minimum version going forward?

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Some discussion about moving dependency versions upwards might be necessary

EricR86 commented 7 years ago

Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).


Can you have a complex dependency? <3.4.1 or >=3.4.2-3?

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


From: https://setuptools.readthedocs.io/en/latest/pkg_resources.html#requirement-objects

It looks like you can. Something exactly like <3.4.1,>=3.4.2-3

EricR86 commented 7 years ago

Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman).


Please do that.

EricR86 commented 7 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Version updated in changeset 2d7973a2d7e8907f3da3a9a3b34434e34331b519