Open ax3l opened 6 years ago
ping @pnorbert do you have an idea what's uninitialised here?
Sorry I dropped the ball on this. Can you give me an example code? I can't recreate this valgrind error with a simple C example writing variables and attributes.
On Wed, Aug 8, 2018 at 5:01 AM, Axel Huebl notifications@github.com wrote:
ping @pnorbert https://github.com/pnorbert do you have an idea what's uninitialised here?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ornladios/ADIOS/issues/184#issuecomment-411337711, or mute the thread https://github.com/notifications/unsubscribe-auth/ADGMLTxLibRMf17dKioUn08KwmMqrqukks5uOqjvgaJpZM4UqxyQ .
If you have CMake 3.10.0+ and ADIOS 1.13.1 in your CMAKE_PREFIX_PATH
and adios_config
in your PATH
, you can copy paste these lines:
# sources
git clone https://github.com/openPMD/openPMD-api.git
mkdir -p openPMD-api-build
cd openPMD-api-build
# configure & build
cmake ../openPMD-api -DopenPMD_USE_HDF5=OFF -DopenPMD_USE_PYTHON=OFF
cmake --build .
# find uninitialised bytes
valgrind bin/SerialIOTests
Corresponding unit test file: https://github.com/openPMD/openPMD-api/blob/c0ccefb25503b49a10938e2096098c6e3ea5c13b/test/SerialIOTest.cpp
Full output:
$ valgrind bin/SerialIOTests
==30204== Memcheck, a memory error detector
==30204== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30204== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==30204== Command: bin/SerialIOTests
==30204==
==30204== Syscall param write(buf) points to uninitialised byte(s)
==30204== at 0x61DD190: __write_nocancel (syscall-template.S:84)
==30204== by 0x4EB602F: adios_posix_write_pg (adios_posix.c:733)
==30204== by 0x4EB6730: adios_posix_close (adios_posix.c:945)
==30204== by 0x4E5E337: common_adios_close (common_adios.c:1264)
==30204== by 0x4E5A0FB: adios_close (adios.c:210)
==30204== by 0x4E4BC53: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x130DF5: Catch::RunContext::invokeActiveTestCase() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x160867: Catch::(anonymous namespace)::runTests(std::shared_ptr<Catch::Config> const&) (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== Address 0x83c13be is 894 bytes inside a block of size 16,777,223 alloc'd
==30204== at 0x4C2BADF: malloc (vg_replace_malloc.c:298)
==30204== by 0x4C2DE5F: realloc (vg_replace_malloc.c:785)
==30204== by 0x4EC11F8: adios_databuffer_resize (buffer.c:68)
==30204== by 0x4E5BFA1: common_adios_open (common_adios.c:417)
==30204== by 0x4E59DD5: adios_open (adios.c:90)
==30204== by 0x4E4BBE7: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x130DF5: Catch::RunContext::invokeActiveTestCase() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x160867: Catch::(anonymous namespace)::runTests(std::shared_ptr<Catch::Config> const&) (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==
==30204== Syscall param write(buf) points to uninitialised byte(s)
==30204== at 0x61DD190: __write_nocancel (syscall-template.S:84)
==30204== by 0x4EB6177: adios_posix_write_index (adios_posix.c:776)
==30204== by 0x4EB6EDB: adios_posix_close (adios_posix.c:1087)
==30204== by 0x4E5E337: common_adios_close (common_adios.c:1264)
==30204== by 0x4E5A0FB: adios_close (adios.c:210)
==30204== by 0x4E4BC53: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x130DF5: Catch::RunContext::invokeActiveTestCase() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x160867: Catch::(anonymous namespace)::runTests(std::shared_ptr<Catch::Config> const&) (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== Address 0x807e4f1 is 1,185 bytes inside a block of size 1,000,020 alloc'd
==30204== at 0x4C2BADF: malloc (vg_replace_malloc.c:298)
==30204== by 0x4C2DE5F: realloc (vg_replace_malloc.c:785)
==30204== by 0x4E62A75: buffer_write (adios_internals.c:2124)
==30204== by 0x4E671B3: adios_write_index_v1 (adios_internals.c:4086)
==30204== by 0x4EB67D7: adios_posix_close (adios_posix.c:960)
==30204== by 0x4E5E337: common_adios_close (common_adios.c:1264)
==30204== by 0x4E5A0FB: adios_close (adios.c:210)
==30204== by 0x4E4BC53: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204== by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==
HZDR sample not accessible. (Supplied directory is not valid: ../samples/hzdr-sample/bp/)
===============================================================================
All tests passed (26 assertions in 5 test cases)
==30204==
==30204== HEAP SUMMARY:
==30204== in use at exit: 2,641,333 bytes in 26,714 blocks
==30204== total heap usage: 77,532 allocs, 50,818 frees, 93,048,116 bytes allocated
==30204==
==30204== LEAK SUMMARY:
==30204== definitely lost: 3,480 bytes in 29 blocks
==30204== indirectly lost: 2,636,176 bytes in 26,682 blocks
==30204== possibly lost: 0 bytes in 0 blocks
==30204== still reachable: 1,677 bytes in 3 blocks
==30204== suppressed: 0 bytes in 0 blocks
==30204== Rerun with --leak-check=full to see details of leaked memory
==30204==
==30204== For counts of detected and suppressed errors, rerun with: -v
==30204== Use --track-origins=yes to see where uninitialised values come from
==30204== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
@pnorbert I guess the variables for your test need to be large enough to trigger the ADIOS realloc
. So > 16MB if I remember the defaults right ;-)
That's not it. My example works fine with 80MB variable. I can see the valgrind error in the SerialIOTests. How can I figure which actual test is causing this error? Where are the output files from this test? I need to see what is so special about your test case.
On Wed, Aug 8, 2018 at 11:04 AM, Axel Huebl notifications@github.com wrote:
@pnorbert https://github.com/pnorbert I guess the variables for your test need to be large enough, to trigger the ADIOS realloc. So > 16MB if I remember the defaults right ;-)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ornladios/ADIOS/issues/184#issuecomment-411438739, or mute the thread https://github.com/notifications/unsubscribe-auth/ADGMLUoIvRmpn2BWi3y-apKC4dyEL0d7ks5uOv4BgaJpZM4UqxyQ .
Never mind, I found them. The error is related to long double attributes, not to buffer size.
On Wed, Aug 8, 2018 at 5:41 PM, Norbert Podhorszki < norbert.podhorszki@gmail.com> wrote:
That's not it. My example works fine with 80MB variable. I can see the valgrind error in the SerialIOTests. How can I figure which actual test is causing this error? Where are the output files from this test? I need to see what is so special about your test case.
On Wed, Aug 8, 2018 at 11:04 AM, Axel Huebl notifications@github.com wrote:
@pnorbert https://github.com/pnorbert I guess the variables for your test need to be large enough, to trigger the ADIOS realloc. So > 16MB if I remember the defaults right ;-)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ornladios/ADIOS/issues/184#issuecomment-411438739, or mute the thread https://github.com/notifications/unsubscribe-auth/ADGMLUoIvRmpn2BWi3y-apKC4dyEL0d7ks5uOv4BgaJpZM4UqxyQ .
Thanks, great news! Just ping me if I shall check again with updates in master
.
This simple example exhibits the valgrind problem with long doubles.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main (int argc, char ** argv)
{
char filename[] = "test_longdouble_valgrind.data";
long double ld1 = 1.2345e+80;
long double *bufm = (long double *) malloc (sizeof(long double));
long double *bufc = (long double *) calloc (1, sizeof(long double));
memcpy (bufm, &ld1, sizeof(long double));
memcpy (bufc, &ld1, sizeof(long double));
int fd = creat(filename, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
write(fd, bufm, sizeof(long double));
write(fd, bufc, sizeof(long double));
close(fd);
free(bufm);
free(bufc);
return 0;
}
Valgrind complains about both the malloc'd and calloc'd buffers.
$ valgrind ./test_longdouble_valgrind
==15574== Memcheck, a memory error detector
==15574== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15574== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==15574== Command: ./test_longdouble_valgrind
==15574==
==15574== Syscall param write(buf) points to uninitialised byte(s)
==15574== at 0x4F312C0: __write_nocancel (syscall-template.S:84)
==15574== by 0x40083F: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)
==15574== Address 0x520404a is 10 bytes inside a block of size 16 alloc'd
==15574== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15574== by 0x4007D8: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)
==15574==
==15574== Syscall param write(buf) points to uninitialised byte(s)
==15574== at 0x4F312C0: __write_nocancel (syscall-template.S:84)
==15574== by 0x400855: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)
==15574== Address 0x520409a is 10 bytes inside a block of size 16 alloc'd
==15574== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15574== by 0x4007EB: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)
Do you know how to fix it?
Oh dang, it's a stdlib or valgrind issue!
Hm, maybe that's relevant: http://valgrind.org/docs/manual/manual-core.html
Precision: There is no support for 80 bit arithmetic. Internally, Valgrind represents all such "long double" numbers in 64 bits, and so there may be some differences in results. Whether or not this is critical remains to be seen. Note, the x86/amd64 fldt/fstpt instructions (read/write 80-bit numbers) are correctly simulated, using conversions to/from 64 bits, so that in-memory images of 80-bit numbers look correct if anyone wants to see.
10 bytes inside a block of size 16 alloc'd
This is a valgrind issue. Filling the buffers with some value (255) before the memcpy does not help.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main (int argc, char ** argv)
{
char filename[] = "test_longdouble_valgrind.data";
long double ld1 = 1.2345e+80;
long double *bufm = (long double *) malloc (sizeof(long double));
long double *bufc = (long double *) calloc (1, sizeof(long double));
memset (bufm, 255, sizeof(long double));
memset (bufc, 255, sizeof(long double));
printf("bufm[15]=%3.3hhu\n", ((char*)bufm)[15]);
memcpy (bufm, &ld1, sizeof(long double));
memcpy (bufc, &ld1, sizeof(long double));
printf("bufm[15]=%3.3hhu\n", ((char*)bufm)[15]);
int fd = creat(filename, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
write(fd, bufm, sizeof(long double));
write(fd, bufc, sizeof(long double));
close(fd);
free(bufm);
free(bufc);
return 0;
}
The file content shows that all 16 bytes of the buffer has been filled. The last 6 bytes are zeros.
$ ./test_longdouble_valgrind
bufm[15]=255
bufm[15]=000
$ od -bv test_longdouble_valgrind.data
0000000 000 350 363 217 374 121 104 205 011 101 100 000 000 000 000 000
0000020 000 350 363 217 374 121 104 205 011 101 100 000 000 000 000 000
Ok, I'll report that upstream with your example.
I reported your example upstream in https://bugs.kde.org/show_bug.cgi?id=397313
ADIOS 1.13.1 is adding noise to valgrind debugging with uninitialised byte(s) in the syscall to
write(buf)
inadios_posix_write_pg1
(adios_posix.c:733
).The issue is triggered when previously a
adios_databuffer_resize
(buffer.c:68
) is called - I guess a few unimportant but still uninitialized bytes slip into the finalwrite
call from it.Detailed valgrind warning for ADIOS 1.13.1 with
_nompi
library write API:@pnorbert do you have an idea what's uninitialised here? Maybe you can limit the
write()
to the initialized bytes or simply zero-init the reallocated buffer's hanging bytes? Even if it might not be a bug, it makes downstream analysis of memory leaks very noisy as soon as we use ADIOS.