szcompressor / SZ

Error-bounded Lossy Data Compressor (for floating-point/integer datasets)
http://szcompressor.org

segmentation fault in H5Z-SZ #72

Closed: BinDong314 closed this issue 3 years ago

BinDong314 commented 3 years ago

Hello, SZ Community, I hit the following segmentation fault while running the szToHDF5 code. Could anyone please help identify the cause and suggest a solution?

Thanks, Bin

System: macOS 11.2.3 (20D91), hdf5-1.10.7. mpicc --version reports: Apple clang version 12.0.0 (clang-1200.0.32.29), Target: x86_64-apple-darwin20.3.0, Thread model: posix, InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Compilation succeeds, but running the szToHDF5 code produces a segmentation fault.

    MBP H5Z-SZ % ./test/szToHDF5 -u16 sz.config ../../example/testdata/x86/testint16_8x8x8.dat 8 8 8
    config file = sz.config
    cfgFile=sz.config
    outputfile=../../example/testdata/x86/testint16_8x8x8.dat.sz.h5
    Dimension sizes: n5=0, n4=0, n3=8, n2=8, n1=8
    sz filter is available for encoding and decoding.
    ....Writing SZ compressed data.............
    original data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ....
    zsh: segmentation fault  ./test/szToHDF5 -u16 sz.config ../../example/testdata/x86/testint16_8x8x8.dat

Running the same command under lldb shows the following trace and backtrace.

    MBP H5Z-SZ % lldb ./test/szToHDF5
    (lldb) target create "./test/szToHDF5"
    Current executable set to '/Users/dbin/work/soft/SZ/hdf5-filter/H5Z-SZ/test/szToHDF5' (x86_64).
    (lldb) run -u16 sz.config ../../example/testdata/x86/testint16_8x8x8.dat 8 8 8
    Process 37002 launched: '/Users/dbin/work/soft/SZ/hdf5-filter/H5Z-SZ/test/szToHDF5' (x86_64)
    config file = sz.config
    cfgFile=sz.config
    outputfile=../../example/testdata/x86/testint16_8x8x8.dat.sz.h5
    Dimension sizes: n5=0, n4=0, n3=8, n2=8, n1=8
    sz filter is available for encoding and decoding.
    ....Writing SZ compressed data.............
    original data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ....
    Process 37002 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xc)
        frame #0: 0x00000001002d89b7 libSZ.1.dylib`SZ_Init + 23
    libSZ.1.dylib`SZ_Init:
    ->  0x1002d89b7 <+23>: movl   $0x8, 0xc(%rax)
        0x1002d89be <+30>: movq   0x28eeb(%rip), %rax   ; confparams_cpr
        0x1002d89c5 <+37>: xorl   %ebx, %ebx
        0x1002d89c7 <+39>: cmpl   $0x3, 0x20(%rax)
    Target 0: (szToHDF5) stopped.
    (lldb) bt
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xc)
      * frame #0: 0x00000001002d89b7 libSZ.1.dylib`SZ_Init + 23
        frame #1: 0x00000001018cf318 libhdf5sz.so`H5Z_sz_set_local(dcpl_id=720575940379279377, type_id=216172782113783878, chunk_space_id=288230376151711748) at H5Z_SZ.c:0 [opt]
        frame #2: 0x00000001005e8d5a libhdf5.103.dylib`H5Z__prelude_callback(pline=0x00007ffeefbfd688, dcpl_id=720575940379279377, type_id=216172782113783878, space_id=288230376151711748, prelude_type=) at H5Z.c:779:29 [opt]
        frame #3: 0x00000001005e890e libhdf5.103.dylib`H5Z__prepare_prelude_callback_dcpl(dcpl_id=720575940379279377, type_id=216172782113783878, prelude_type=H5Z_PRELUDE_SET_LOCAL) at H5Z.c:865:21 [opt]
        frame #4: 0x00000001005e8ac3 libhdf5.103.dylib`H5Z_set_local(dcpl_id=720575940379279377, type_id=216172782113783878) at H5Z.c:936:9 [opt]
        frame #5: 0x00000001003c8712 libhdf5.103.dylib`H5D__create(file=, type_id=, space=, dcpl_id=, dapl_id=) at H5Dint.c:1238:16 [opt]
        frame #6: 0x00000001003d5410 libhdf5.103.dylib`H5O__dset_create(f=, _crt_info=, obj_loc=0x00007ffeefbfd9f0) at H5Doh.c:299:24 [opt]
        frame #7: 0x00000001004bcd38 libhdf5.103.dylib`H5O_obj_create(f=, obj_type=, crt_info=, obj_loc=0x00007ffeefbfd9f0) at H5Oint.c:2495:37 [opt]
        frame #8: 0x000000010048854d libhdf5.103.dylib`H5L__link_cb(grp_loc=0x00007ffeefbfdcd0, name="testdata_compressed", lnk=, obj_loc=, _udata=0x00007ffeefbfe1a8, own_loc=0x00007ffeefbfdcec) at H5L.c:1651:53 [opt]
        frame #9: 0x00000001004594fc libhdf5.103.dylib`H5G__traverse_real(_loc=, name="testdata_compressed", target=, op=(libhdf5.103.dylib`H5L__link_cb at H5L.c:1627), op_data=) at H5Gtraverse.c:623:16 [opt]
        frame #10: 0x0000000100458631 libhdf5.103.dylib`H5G_traverse(loc=0x00007ffeefbfe2f8, name="testdata_compressed", target=0, op=(libhdf5.103.dylib`H5L__link_cb at H5L.c:1627), op_data=) at H5Gtraverse.c:847:8 [opt]
        frame #11: 0x0000000100487422 libhdf5.103.dylib`H5L__create_real(link_loc=, link_name="testdata_compressed", obj_path=0x0000000000000000, obj_file=0x0000000000000000, lnk=, ocrt_info=, lcpl_id=720575940379279374) at H5L.c:1845:8 [opt]
        frame #12: 0x0000000100487574 libhdf5.103.dylib`H5L_link_object(new_loc=, new_name=, ocrt_info=, lcpl_id=) at H5L.c:1604:8 [opt]
        frame #13: 0x00000001003c7a0f libhdf5.103.dylib`H5D__create_named(loc=0x00007ffeefbfe2f8, name="testdata_compressed", type_id=216172782113783878, space=0x0000000101b41210, lcpl_id=720575940379279374, dcpl_id=720575940379279376, dapl_id=720575940379279367) at H5Dint.c:337:8 [opt]
        frame #14: 0x00000001003a49bf libhdf5.103.dylib`H5Dcreate2(loc_id=72057594037927936, name="testdata_compressed", type_id=, space_id=, lcpl_id=720575940379279374, dcpl_id=720575940379279376, dapl_id=) at H5D.c:151:24 [opt]
        frame #15: 0x0000000100004c78 szToHDF5`main(argc=7, argv=0x00007ffeefbff738) at szToHDF5.c:269:21
        frame #16: 0x00007fff2036b621 libdyld.dylib`start + 1
        frame #17: 0x00007fff2036b621 libdyld.dylib`start + 1

disheng222 commented 3 years ago

I tested it on my Linux Fedora system but couldn't reproduce the issue. I used valgrind to check for possible memory problems, and it reported a 'Syscall param write(buf) points to uninitialised byte(s)' warning. The cause was that, at one point, memory was allocated without being initialized (i.e., without memset()), which triggered many errors in valgrind's report. After adding a memset() after the malloc() at that point, the error messages disappeared; valgrind now reports a completely clean run (no errors) for that command. Could you try again to see whether the problem has been solved on your side?
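For anyone who wants to repeat the check, a run along the following lines should reproduce it (the command line is the one from the report; the extra valgrind flag is optional but points at the allocation behind the uninitialised bytes):

```sh
# Sketch of the valgrind check described above (run on Linux, from the H5Z-SZ directory).
# --track-origins=yes traces uninitialised bytes back to their allocation site,
# which is how a malloc() that lacks a matching memset() shows up.
valgrind --track-origins=yes \
    ./test/szToHDF5 -u16 sz.config ../../example/testdata/x86/testint16_8x8x8.dat 8 8 8
```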

disheng222 commented 3 years ago

Hi Bin, I can't reproduce the segmentation fault issue on my laptop, but I did find some error messages reported by Valgrind, which have now been fixed. Could you check whether it works well on your machine now?

Best, Sheng


BinDong314 commented 3 years ago

Thanks Sheng, it seems to work now in a simple test.

    dbin@Bins-MBP H5Z-SZ % ./test/szToHDF5 -u16 sz.config ../../example/testdata/x86/testint16_8x8x8.dat 8 8 8
    config file = sz.config
    cfgFile=sz.config
    Dimension sizes: n5=0, n4=0, n3=8, n2=8, n1=8
    sz filter is available for encoding and decoding.
    ....Writing SZ compressed data.............
    original data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ....
    Output hdf5 file: ../../example/testdata/x86/testint16_8x8x8.dat.sz.h5

    dbin@Bins-MBP H5Z-SZ % ./test/dszFromHDF5 ../../example/testdata/x86/testint16_8x8x8.dat.sz.h5
    sz filter is available.
    ....Reading SZ compressed data .....................
    data type: unsigned short
    reconstructed data = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

However, I have some minor issues running the test script, as shown below. It looks like the library name is hard-coded as "lib/libhdf5sz.so" (rather than "libhdf5sz.so") in the Makefile.

    dbin@Bins-MBP test % ./test_compress.sh szToHDF5 -f sz.config ../../../example/testdata/x86/testfloat_8_8_128.dat 8 8 128
    dyld: Library not loaded: lib/libhdf5sz.so
      Referenced from: /Users/dbin/work/soft/SZ/hdf5-filter/H5Z-SZ/test/./szToHDF5
      Reason: image not found

Also, in test_compress.sh, you may want to use "./szToHDF5" instead of "szToHDF5".

Lastly, when I run h5dump on the compressed file, it reports the error below, even though the file size is correct. Any hint on how to make it work? I understand this might be an issue with the HDF5 library rather than with H5Z-SZ. Maybe an h5dump-like tool for H5Z-SZ would be worth a try?

    MBP H5Z-SZ % h5dump ../../example/testdata/x86/testint16_8x8x8.dat.sz.h5
    HDF5 "../../example/testdata/x86/testint16_8x8x8.dat.sz.h5" {
    GROUP "/" {
       DATASET "testdata_compressed" {
          DATATYPE  H5T_STD_U16LE
          DATASPACE  SIMPLE { ( 8, 8, 8 ) / ( 8, 8, 8 ) }
    h5dump(6169,0x10ea4fe00) malloc: error for object 0x7fcdb8575300: pointer being freed was not allocated
    h5dump(6169,0x10ea4fe00) malloc: set a breakpoint in malloc_error_break to debug
          DATA {
    zsh: abort      h5dump ../../example/testdata/x86/testint16_8x8x8.dat.sz.h5
    MBP H5Z-SZ %

disheng222 commented 3 years ago

Hi Bin, the reason for the 'Library not loaded' error is that you need to add SZ's installation library directory to LD_LIBRARY_PATH. Alternatively, inside the Makefile, you can add '-Wl,-rpath,$(SZPATH)/lib' to SZ_PATH. I have revised the Makefile, so the 'Library not loaded' issue should be gone. To check again, please git pull and recompile the hdf5-filter and the examples (i.e., run 'make' in hdf5-filter/H5Z-SZ and then 'make' in hdf5-filter/H5Z-SZ/test). After that, you should be able to run test_compress.sh and test_decompress.sh without errors. I also added './' in the .sh files.
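For reference, the same effect can be had from the shell without editing anything; a minimal sketch, assuming $SZPATH is the SZ install prefix used by the Makefile:

```sh
# Make libhdf5sz.so (and libSZ) findable by the dynamic loader at run time.
export LD_LIBRARY_PATH="$SZPATH/lib:$LD_LIBRARY_PATH"        # Linux
export DYLD_LIBRARY_PATH="$SZPATH/lib:$DYLD_LIBRARY_PATH"    # macOS

# Or bake the search path into the binaries at link time, as mentioned above:
#   LDFLAGS += -Wl,-rpath,$(SZPATH)/lib
```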

Please note that testint16_8x8x8.dat.sz.h5 is the compressed data file, so you should not use h5dump to read it. FYI, testfloat_8_8_128.sz.out.h5, stored in ../../../example/testdata/x86, is the corresponding decompressed/reconstructed data file. For testint16_8x8x8.dat.sz.h5, you can use the compiled executable dszFromHDF5, or HDF5's h5repack command, to read it.

I suggest you take a look at docs/H5Z-SZ-Guide.pdf, which describes the two execution modes. I think the h5repack execution mode (i.e., use-case B) is more convenient. With the h5repack method, you can compress a .h5 file directly, storing the compressed data in another .h5 file, and also decompress a compressed .h5 file transparently; see the sketch below.
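Roughly, the h5repack route looks like this, under the assumption that the H5Z-SZ filter library has been built as an HDF5 dynamic plugin (the plugin path below is only a placeholder):

```sh
# Tell HDF5 where to find the SZ filter plugin (placeholder path).
export HDF5_PLUGIN_PATH=/path/to/hdf5-filter/H5Z-SZ/lib

# Strip the SZ filter from the compressed file, producing a plain .h5 that
# h5dump can read directly (decompression happens transparently during repack).
h5repack -f NONE ../../example/testdata/x86/testint16_8x8x8.dat.sz.h5 restored.h5
h5dump restored.h5

# Compression goes the other way with "-f UD=32017,..." (32017 is SZ's registered
# HDF5 filter id); see docs/H5Z-SZ-Guide.pdf for the exact UD parameter list.
```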

Best, Sheng


BinDong314 commented 3 years ago

Hi Disheng, just FYI, the error "dyld: Library not loaded: lib/libhdf5sz.so" is still there.

disheng222 commented 3 years ago

Hi, I tested it again and it worked well on my machine. Did you modify SZPATH and HDF5PATH in the Makefile? They should point to the correct locations on your system. Moreover, after executing 'make', you need to run 'make install' so that libhdf5sz.so ends up in the correct directory (i.e., under SZPATH). The full sequence is sketched below.
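Spelled out (directories as in this repository; the install prefixes are whatever you set in the Makefile):

```sh
# Build-and-install sequence described above, run from the SZ source tree.
cd hdf5-filter/H5Z-SZ
#   first edit Makefile: point SZPATH and HDF5PATH at your SZ and HDF5 install prefixes
make
make install     # installs libhdf5sz.so where the tests expect it (under SZPATH)
cd test
make
./test_compress.sh szToHDF5 -f sz.config ../../../example/testdata/x86/testfloat_8_8_128.dat 8 8 128
```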

Best, Sheng


BinDong314 commented 3 years ago

Hi Sheng, thanks for looking into this. It looks like the produced executable has a hard-coded path for libhdf5sz.so:

    % otool -L szToHDF5
    szToHDF5:
        ....
        lib/libhdf5sz.so (compatibility version 0.0.0, current version 0.0.0)

After running the line below, the code works:

    install_name_tool -change lib/libhdf5sz.so $H5SZPATH/libhdf5sz.so szToHDF5

Here $H5SZPATH is the directory containing libhdf5sz.so.

BinDong314 commented 3 years ago

Found that setting DYLD_LIBRARY_PATH on macOS works as well.

Just putting it here in case someone hits the same issue.
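For example (with $H5SZPATH the directory that contains libhdf5sz.so, as in the earlier comment):

```sh
# Point the macOS dynamic loader at the H5Z-SZ library before running the tests.
export DYLD_LIBRARY_PATH="$H5SZPATH:$DYLD_LIBRARY_PATH"
./test/szToHDF5 -u16 sz.config ../../example/testdata/x86/testint16_8x8x8.dat 8 8 8
```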