Closed rlorigro closed 9 months ago
You don't say what platform you're using.
Elsewhere, you say you are using Ubuntu 22.04. I was able to reproduce this on ARM Ubuntu 22.04; using the original 22.04 libcurl3-gnutls 7.81.0-1 (rather than the current 7.81.0-1ubuntu1.13) the other symbols in the stack trace are present:
==153381== 1,402,880 bytes in 1 blocks are definitely lost in loss record 1 of 1
==153381== at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==153381== by 0x494B9E3: dyn_nappend.lto_priv.0 (dynbuf.c:98)
==153381== by 0x49752F3: UnknownInlinedFun (dynbuf.c:164)
==153381== by 0x49752F3: pausewrite (sendf.c:508)
==153381== by 0x49897FF: UnknownInlinedFun (transfer.c:842)
==153381== by 0x49897FF: Curl_readwrite (transfer.c:1226)
==153381== by 0x497161B: multi_runsingle (multi.c:2360)
==153381== by 0x49740DF: curl_multi_perform (multi.c:2638)
==153381== by 0x19A55B: wait_perform (hfile_libcurl.c:735)
==153381== by 0x19BA17: restart_from_position (hfile_libcurl.c:1061)
==153381== by 0x19CA5B: libcurl_seek (hfile_libcurl.c:979)
==153381== by 0x18E10B: hseek (hfile.c:469)
==153381== by 0x17D483: bgzf_check_EOF_common (bgzf.c:1542)
==153381== by 0x17EECB: bgzf_check_EOF (bgzf.c:2155)
This localises the unfreed allocation within libcurl. (Note also that the size of the problematic allocation changes each time you run it!)
Running with vanilla upstream libcurl 7.81.0 or with current upstream master, and without any other changes, there is no leak reported. (I was also unable to reproduce any leak on macOS.)
So the problem would appear to be caused by Ubuntu's patches to their libcurl package.
Here's the full (non-inlined) trace from an x86_86 version of libcurl 7.81.0 with the Ubuntu patches from 7.81.0-1ubuntu1.13
:
==179979== HEAP SUMMARY:
==179979== in use at exit: 6,312,960 bytes in 2 blocks
==179979== total heap usage: 253,242 allocs, 253,240 frees, 117,189,900 bytes allocated
==179979==
==179979== 6,312,960 bytes in 2 blocks are definitely lost in loss record 1 of 1
==179979== at 0x484DCD3: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==179979== by 0x49CA5AD: dyn_nappend (dynbuf.c:98)
==179979== by 0x49CA6CC: Curl_dyn_addn (dynbuf.c:164)
==179979== by 0x4A0F1F2: pausewrite (sendf.c:508)
==179979== by 0x4A0F297: chop_write (sendf.c:539)
==179979== by 0x4A0F591: Curl_client_write (sendf.c:645)
==179979== by 0x4A2D746: readwrite_data (transfer.c:842)
==179979== by 0x4A2E209: Curl_readwrite (transfer.c:1226)
==179979== by 0x4A02F86: multi_runsingle (multi.c:2360)
==179979== by 0x4A03796: curl_multi_perform (multi.c:2638)
==179979== by 0x216894: wait_perform (hfile_libcurl.c:733)
==179979== by 0x217738: restart_from_position (hfile_libcurl.c:1059)
==179979==
==179979== LEAK SUMMARY
==179979== definitely lost: 6,312,960 bytes in 2 blocks
==179979== indirectly lost: 0 bytes in 0 blocks
==179979== possibly lost: 0 bytes in 0 blocks
==179979== still reachable: 0 bytes in 0 blocks
==179979== suppressed: 0 bytes in 0 blocks
This was built with the Ubuntu configure options, notably --with-nghttp2
. The issue actually also affects the non-patched 7.81.0, but not libcurl master. The problem is that when using HTTP/2, libcurl buffers up a bit of data while we're trying to seek to the end of the file for the EOF block check, and that data doesn't get cleaned up.
After hunting around a bit, I found issue curl/curl#10971 (Memory leak when cancelling paused http 2 transfer) which was fixed by the commit https://github.com/curl/curl/commit/81b2b577df. The scenario in that issue is pretty much exactly what HTSlib's restart_from_position()
does, and the GCS server is dishing out HTTP/2.
So it's actually a libcurl problem, which has been fixed and will eventually make it into Ubuntu (although probably not 22.04).
I see, thank you for finding this so quickly. I should have known the leak wasn't from htslib :)
Is there a way to configure htslib to use a specific curl path (that I have compiled from source)? My build instructions are in CMake so it looks like this:
# HTSLIB configure
add_custom_target(
BUILD_HTS
ALL
WORKING_DIRECTORY ${HTS_DIR}
COMMAND pwd
COMMAND autoreconf -i
COMMAND ./configure --disable-lzma --without-libdeflate
COMMAND $(MAKE) print-config
COMMAND $(MAKE) prefix=${CMAKE_SOURCE_DIR}/external/htslib/ install
)
Nice spotting, Rob — I didn't think to look at the configuration options. One advantage of RPMs is the single spec file makes it much easier for third parties to recreate distributions' packages…
Hi,
I am attempting to use the htslib C interface in my project and I find a large leak from htslib using both AddressSanitizer and Valgrind. Here is the more verbose output from valgrind:
For this test I have used this public GS URI: gs://genomics-public-data/platinum-genomes/bam/NA12889_S1.bam
When I run the same code on a local BAM path (stored on my disk) I get a clean result from valgrind:
It appears to be coming from libcurl, so I asked the curl authors and they are pretty confident that it is a result of a misuse of the library. However, it could easily be something I have done incorrectly because I am not an expert on C style memory management. My code is here:
Could you let me know if this is a simple mistake on my end or something more serious? I need to run a function like this on many windows repeatedly so this leak would get out of hand pretty quickly.
Thanks