yayahjb / cbflib

CBFlib repository cloned from SF CBFlib repository as of 1 Dec 15

nexus2cbf test optimization-dependent crashes #15

Closed jcbollinger closed 4 years ago

jcbollinger commented 4 years ago

I am building CBFlib 0.9.6 from the distribution package available at SourceForge, on CentOS Linux 8 (x86_64), using the distribution's GCC toolchain, version 8.3.1. The library, wrappers, and example programs all built successfully, but when I ran the tests, nexus2cbf crashed:

LD_LIBRARY_PATH=/home/jbolling/tmp/cbflib-CBFlib-0.9.6/solib:$LD_LIBRARY_PATH;export LD_LIBRARY_PATH; cd /home/jbolling/tmp/cbflib-CBFlib-0.9.6/minicbf_test; time /home/jbolling/tmp/cbflib-CBFlib-0.9.6/bin/nexus2cbf \
-o i19-1.cbf i19-1.h5
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: H5D.c line 342 in H5Dopen2(): no name
    major: Invalid arguments to routine
    minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: H5L.c line 1183 in H5Literate(): link iteration failed
    major: Symbol table
    minor: Iteration failed
  #001: H5Gint.c line 844 in H5G_iterate(): error iterating over links
    major: Symbol table
    minor: Iteration failed
  #002: H5Gobj.c line 708 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: H5Gstab.c line 566 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: H5B.c line 1221 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #005: H5B.c line 1177 in H5B_iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #006: H5Gnode.c line 1039 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: H5L.c line 1183 in H5Literate(): link iteration failed
    major: Symbol table
    minor: Iteration failed
  #001: H5Gint.c line 844 in H5G_iterate(): error iterating over links
    major: Symbol table
    minor: Iteration failed
  #002: H5Gobj.c line 708 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: H5Gstab.c line 566 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: H5B.c line 1221 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #005: H5B.c line 1177 in H5B_iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #006: H5Gnode.c line 1039 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
  #000: H5L.c line 1183 in H5Literate(): link iteration failed
    major: Symbol table
    minor: Iteration failed
  #001: H5Gint.c line 844 in H5G_iterate(): error iterating over links
    major: Symbol table
    minor: Iteration failed
  #002: H5Gobj.c line 708 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: H5Gstab.c line 566 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: H5B.c line 1221 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #005: H5B.c line 1177 in H5B_iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #006: H5Gnode.c line 1039 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
An error occured, will not try to write the file 'i19-1.cbf'
Time to convert 'i19-1.cbf': 0.006s

real    0m0.008s
user    0m0.006s
sys     0m0.002s
make: *** [Makefile_LINUX_64:2230: extra] Error 1

I found, however, that if I used the default Makefile instead of Makefile_LINUX_64, then the same test ran successfully. Ultimately, I traced the issue to optimization level. nexus2cbf works if everything is built with optimization -O3 (default Makefile), but not if everything is built with optimization -O2 (Makefile_LINUX_64 and Makefile_LINUX).

There is a clear workaround here, of course, but optimization-dependent misbehavior is usually a sign of a deeper problem, especially when it is the more aggressive optimization level that is required for (apparently) correct behavior.
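
To illustrate the kind of defect that typically behaves this way, here is a minimal sketch in C. It is not taken from CBFlib and is not a diagnosis of this crash: `find_dataset_name` and `resolve_dataset` are hypothetical names invented for the example. The pattern is a local variable that a helper fills in only on success. If the caller declares it without an initializer and then reads it after a failure, the read is undefined behavior, and whether it appears to work can change between -O2 and -O3. The version below is the corrected form, with the initializer in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a CBFlib or HDF5 function: it stores a dataset
 * path in *name only when the key is recognized, and otherwise leaves
 * *name untouched. */
static int find_dataset_name(const char *key, const char **name)
{
    if (strcmp(key, "data") == 0) {
        *name = "/entry/data";
        return 0;
    }
    return -1;
}

/* Returns 0 and sets *out on success; returns -1 and leaves *out alone
 * on failure. */
int resolve_dataset(const char *key, const char **out)
{
    /* Initializing to NULL is the fix. Declared as `const char *name;`,
     * the variable would hold an indeterminate value whenever the helper
     * fails, and any later read of it would be undefined behavior whose
     * visible effect can differ from one optimization level to another. */
    const char *name = NULL;

    assert(key != NULL && out != NULL);

    if (find_dataset_name(key, &name) != 0 || name == NULL)
        return -1;

    *out = name;
    return 0;
}
```

Compiling with GCC's `-Wall -Wextra` warnings, or running the failing test under Valgrind, are common ways to look for this class of problem.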

yayahjb commented 4 years ago

Dear John,

The hdf5 library is, unfortunately, fragile. Some combinations work on some systems and not on others. Things got worse with hdf5-1.10.1 through hdf5-1.10.4, then a little better with hdf5-1.10.5, and still better with hdf5-1.10.6. Now hdf5-1.12 is out, so I'll be investigating that next. Other destabilizing issues are use or no use of dlfnc, gcc and gfortran versions, or use of Intel compilers. Yes, there is a deep problem here. Got a student who likes a challenge?

Regards,
Herbert

On Wed, Apr 29, 2020 at 6:29 PM John Bollinger notifications@github.com wrote:

jcbollinger commented 4 years ago

Dear Herbert,

I don't have a student to assign to this, but it turns out that I had some time to look into it personally. Expect a PR soon.

John

yayahjb commented 4 years ago

Thank you.