hoffmangroup / genomedata

The Genomedata format for storing large-scale functional genomics data.
https://genomedata.hoffmanlab.org/
GNU General Public License v2.0

h5repack segfaults (in rare cases of unknown etiology) #32

Closed EricR86 closed 5 years ago

EricR86 commented 7 years ago

Original report (archived issue) by Coby Viner (Bitbucket: cviner2, GitHub: cviner).

The original report had attachments: core.19535.gz


One Genomedata run, out of multiple similar parallel runs, segfaulted and produced a core dump. This has also happened twice before, but I misattributed those crashes to system-specific aberrations.

Inspecting the resulting core file (attached, gzipped) with file yields:

core.19535: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'h5repack -f GZIP=1 /mnt/work1/users/hoffmangroup/cviner/temp/genomedata.MfLZzl/', real uid: 1060, effective uid: 1060, real gid: 1037, effective gid: 1037, execfn: '/usr/bin/h5repack', platform: 'x86_64'
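
For reference, the "ELF 64-bit LSB core file" part of that output comes from a few fixed fields at the start of the ELF header: the magic bytes, the class (32- or 64-bit), the data encoding (byte order), and the e_type field, where 4 denotes a core file. The following is a minimal Python sketch of reading those fields, not part of Genomedata or of file itself; the function names are hypothetical, and it assumes the core may still be gzipped, as the attached core.19535.gz is.

```python
import gzip
import struct

ET_CORE = 4  # ELF e_type value for core files

def describe_elf_header(header: bytes) -> str:
    """Summarize class, byte order, and type from the first bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    bits = {1: "32-bit", 2: "64-bit"}.get(ei_class, "unknown class")
    order = {1: "LSB", 2: "MSB"}.get(ei_data, "unknown byte order")
    # e_type is a 2-byte field at offset 16, stored in the file's own byte order
    (e_type,) = struct.unpack_from("<H" if ei_data == 1 else ">H", header, 16)
    kind = "core file" if e_type == ET_CORE else f"e_type={e_type}"
    return f"ELF {bits} {order} {kind}"

def describe_elf_file(path: str) -> str:
    """Read the ELF header from a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return describe_elf_header(f.read(64))
```

Unlike file, this sketch does not extract the originating command line (the 'h5repack -f GZIP=1 ...' part), which file reads from the core's note segment.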

/usr/bin/h5repack --version yielded: h5repack: (Version 1.8.7).

Running a gdb backtrace on h5repack with this core file yields the following stack trace:

Core was generated by `h5repack -f GZIP=1 /mnt/work1/users/hoffmangroup/cviner/temp/genomedata.MfLZzl/'.
Program terminated with signal 11, Segmentation fault.
#0  H5O_dec_rc (oh=0x0) at H5O.c:3416
3416    H5O.c: No such file or directory.
    in H5O.c
Missing separate debuginfos, use: debuginfo-install hdf5-1.8.7-1.el6.rf.x86_64
(gdb) bt
#0  H5O_dec_rc (oh=0x0) at H5O.c:3416
#1  0x0000003cff127ff7 in H5O_chunk_proxy_dest (f=0x267a2a0, dxpl_id=167772168, addr=<value optimized out>, _udata=<value optimized out>) at H5Ocache.c:1443
#2  H5O_cache_chk_load (f=0x267a2a0, dxpl_id=167772168, addr=<value optimized out>, _udata=<value optimized out>) at H5Ocache.c:758
#3  0x0000003cff067ea3 in H5C_load_entry (f=0x267a2a0, primary_dxpl_id=167772168, secondary_dxpl_id=<value optimized out>, type=0x3cff491f60, addr=<value optimized out>, udata=<value optimized out>, flags=512) at H5C.c:7955
#4  H5C_protect (f=0x267a2a0, primary_dxpl_id=167772168, secondary_dxpl_id=<value optimized out>, type=0x3cff491f60, addr=<value optimized out>, udata=<value optimized out>, flags=512) at H5C.c:3563
#5  0x0000003cff04fd56 in H5AC_protect (f=<value optimized out>, dxpl_id=<value optimized out>, type=<value optimized out>, addr=<value optimized out>, udata=<value optimized out>, rw=<value optimized out>) at H5AC.c:1312
#6  0x0000003cff117ab5 in H5O_protect (loc=0x7ffef0830c80, dxpl_id=167772168, prot=H5AC_READ) at H5O.c:1718
#7  0x0000003cff139dcc in H5O_msg_exists (loc=0x7ffef0830c80, type_id=2, dxpl_id=167772168) at H5Omessage.c:885
#8  0x0000003cff0d8eaa in H5G_obj_get_linfo (grp_oloc=0x7ffef0830c80, linfo=0x7ffef0830a20, dxpl_id=167772168) at H5Gobj.c:329
#9  0x0000003cff0d93a7 in H5G_obj_lookup (grp_oloc=0x7ffef0830c80, name=0x7ffef0830d20 "seq", lnk=0x7ffef0830bd0, dxpl_id=167772168) at H5Gobj.c:1135
#10 0x0000003cff0e0780 in H5G_traverse_real (_loc=<value optimized out>, name=0x35006bf "seq", target=0, nlinks=0x7ffef0831228, op=0x3cff10a430 <H5L_link_cb>, op_data=0x7ffef08312a0, lapl_id=167772167, dxpl_id=167772168)
    at H5Gtraverse.c:642
#11 0x0000003cff0e133b in H5G_traverse (loc=0x7ffef0831440, name=0x35006b0 "/supercontig_1/seq", target=0, op=0x3cff10a430 <H5L_link_cb>, op_data=0x7ffef08312a0, lapl_id=167772167, dxpl_id=167772168) at H5Gtraverse.c:904
#12 0x0000003cff10991c in H5L_create_real (link_loc=0x7ffef0831440, link_name=0x35006b0 "/supercontig_1/seq", obj_path=0x0, obj_file=0x0, lnk=0x7ffef0831340, ocrt_info=0x7ffef08313d0, lcpl_id=167772173, lapl_id=167772167, 
    dxpl_id=167772168) at H5L.c:1883
#13 0x0000003cff109aea in H5L_link_object (new_loc=<value optimized out>, new_name=<value optimized out>, ocrt_info=<value optimized out>, lcpl_id=<value optimized out>, lapl_id=<value optimized out>, dxpl_id=<value optimized out>)
    at H5L.c:1639
#14 0x0000003cff085a3b in H5D_create_named (loc=<value optimized out>, name=<value optimized out>, type_id=50350561, space=0x4a0c030, lcpl_id=167772173, dcpl_id=<value optimized out>, dapl_id=167772167, dxpl_id=167772168) at H5Dint.c:430
#15 0x0000003cff06edbb in H5Dcreate2 (loc_id=<value optimized out>, name=0x35006b0 "/supercontig_1/seq", type_id=50350561, space_id=67108866, lcpl_id=167772173, dcpl_id=167772202, dapl_id=167772167) at H5D.c:169
#16 0x00000000004078aa in do_copy_objects ()
#17 0x00000000004082f0 in copy_objects ()
#18 0x000000000040697d in h5repack ()
#19 0x000000000040e425 in main ()

The segfault occurs in H5O_dec_rc in H5O.c and appears to be caused by dereferencing the oh pointer, whose value is 0x0 (i.e. functionally NULL, whether or not it was explicitly set to NULL), as frame #0 shows.

I am currently attempting to determine whether I can reconstruct the Genomedata run in question and whether I can find any pertinent log information.

I was not able to find any sample that was clearly affected by this segmentation fault. All of the outputs appeared similar, yet only 1 of the 32 runs produced a core dump. I may be able to comment further after continuing with downstream analyses, but that seems unlikely to yield further insight into this error.
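
Because only one of the parallel runs left a core file, it may help to record how each h5repack invocation ended. On POSIX systems, Python's subprocess module reports a child killed by signal N as a return code of -N, so a segfault appears as -11 (SIGSEGV). The following is a minimal sketch, assuming the runs are launched from a Python script; the function names are hypothetical and not part of Genomedata.

```python
import signal
import subprocess
from typing import Sequence

def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a human-readable description.

    On POSIX, subprocess reports a child killed by signal N as returncode -N.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit status {returncode}"

def run_and_report(cmd: Sequence[str]) -> str:
    """Run one command (e.g. one h5repack invocation) and describe how it ended."""
    result = subprocess.run(list(cmd))
    return describe_exit(result.returncode)
```

A batch script could call run_and_report on each h5repack -f GZIP=1 command (with hypothetical input and output paths) and log the result next to the sample name, so that a "killed by SIGSEGV" entry identifies the affected run even if no core file is written.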

EricR86 commented 7 years ago

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


EricR86 commented 7 years ago

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


I will install the debuginfo packages (as gdb suggests) and try to report this upstream.

EricR86 commented 5 years ago

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


This was likely due to system-specific idiosyncrasies. It can no longer be tested directly in its original computing environment, which is now obsolete. It has not been re-encountered elsewhere, although no deliberate attempt has been made to reproduce it.