opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Address memory leak in test cases #1777

Open heemin32 opened 3 months ago

heemin32 commented 3 months ago

Valgrind currently detects memory leaks in the JNI test cases. These leaks need to be addressed.

==27211== 
==27211== HEAP SUMMARY:
==27211==     in use at exit: 394,772 bytes in 41 blocks
==27211==   total heap usage: 591,179 allocs, 591,138 frees, 980,880,405 bytes allocated
==27211== 
==27211== 72 bytes in 1 blocks are definitely lost in loss record 8 of 33
==27211==    at 0x48657B8: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==27211==    by 0x1E9347: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x779B5B: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x767AAB: testing::Test::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x76807F: testing::TestInfo::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768F8F: testing::TestSuite::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x77055F: testing::internal::UnitTestImpl::RunAllTests() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768197: testing::UnitTest::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x1E6A3F: main (in /workspace/k-NN/jni/bin/jni_test)
==27211== 
==27211== 72 bytes in 1 blocks are definitely lost in loss record 9 of 33
==27211==    at 0x48657B8: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==27211==    by 0x1E9387: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x779B5B: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x767AAB: testing::Test::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x76807F: testing::TestInfo::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768F8F: testing::TestSuite::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x77055F: testing::internal::UnitTestImpl::RunAllTests() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768197: testing::UnitTest::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x1E6A3F: main (in /workspace/k-NN/jni/bin/jni_test)
==27211== 
==27211== 84 (24 direct, 60 indirect) bytes in 1 blocks are definitely lost in loss record 12 of 33
==27211==    at 0x48657B8: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==27211==    by 0x5A6A907: knn_jni::commons::storeVectorData(knn_jni::JNIUtilInterface*, JNIEnv_*, long, _jobjectArray*, long) (in /workspace/k-NN/jni/release/libopensearchknn_util.so)
==27211==    by 0x27962F: CommonsTests_BasicAssertions_Test::TestBody() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x779B5B: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x767AAB: testing::Test::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x76807F: testing::TestInfo::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768F8F: testing::TestSuite::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x77055F: testing::internal::UnitTestImpl::RunAllTests() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768197: testing::UnitTest::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x1E6A3F: main (in /workspace/k-NN/jni/bin/jni_test)
==27211== 
==27211== 3,312 bytes in 9 blocks are possibly lost in loss record 22 of 33
==27211==    at 0x4869F34: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==27211==    by 0x4010F83: calloc (rtld-malloc.h:44)
==27211==    by 0x4010F83: allocate_dtv (dl-tls.c:375)
==27211==    by 0x4011983: _dl_allocate_tls (dl-tls.c:634)
==27211==    by 0x5FEE087: allocate_stack (allocatestack.c:430)
==27211==    by 0x5FEE087: pthread_create@@GLIBC_2.34 (pthread_create.c:647)
==27211==    by 0x5C2E873: ??? (in /usr/lib/aarch64-linux-gnu/libgomp.so.1.0.0)
==27211==    by 0x5C25E9B: GOMP_parallel (in /usr/lib/aarch64-linux-gnu/libgomp.so.1.0.0)
==27211==    by 0x285B23: faiss::(anonymous namespace)::hnsw_add_vertices(faiss::IndexHNSW&, unsigned long, unsigned long, float const*, bool, bool) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x28C047: faiss::IndexIDMapTemplate<faiss::Index>::add_with_ids(long, float const*, long const*) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x498CD13: knn_jni::faiss_wrapper::CreateIndex(knn_jni::JNIUtilInterface*, JNIEnv_*, _jintArray*, long, int, _jstring*, _jobject*) (in /workspace/k-NN/jni/release/libopensearchknn_faiss.so)
==27211==    by 0x1EAF6B: FaissCreateIndexTest_BasicAssertions_Test::TestBody() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x779B5B: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x767AAB: testing::Test::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211== 
==27211== 131,960 (520 direct, 131,440 indirect) bytes in 1 blocks are definitely lost in loss record 32 of 33
==27211==    at 0x48657B8: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==27211==    by 0x1E93DF: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x779B5B: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x767AAB: testing::Test::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x76807F: testing::TestInfo::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768F8F: testing::TestSuite::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x77055F: testing::internal::UnitTestImpl::RunAllTests() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768197: testing::UnitTest::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x1E6A3F: main (in /workspace/k-NN/jni/bin/jni_test)
==27211== 
==27211== 131,960 (520 direct, 131,440 indirect) bytes in 1 blocks are definitely lost in loss record 33 of 33
==27211==    at 0x48657B8: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==27211==    by 0x1E9437: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x779B5B: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x767AAB: testing::Test::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x76807F: testing::TestInfo::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768F8F: testing::TestSuite::Run() [clone .part.0] (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x77055F: testing::internal::UnitTestImpl::RunAllTests() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x768197: testing::UnitTest::Run() (in /workspace/k-NN/jni/bin/jni_test)
==27211==    by 0x1E6A3F: main (in /workspace/k-NN/jni/bin/jni_test)
==27211== 
==27211== LEAK SUMMARY:
==27211==    definitely lost: 1,208 bytes in 5 blocks
==27211==    indirectly lost: 262,940 bytes in 13 blocks
==27211==      possibly lost: 3,312 bytes in 9 blocks
==27211==    still reachable: 127,312 bytes in 14 blocks
==27211==         suppressed: 0 bytes in 0 blocks
==27211== Reachable blocks (those to which a pointer was found) are not shown.
==27211== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==27211== 
==27211== Use --track-origins=yes to see where uninitialised values come from
==27211== For lists of detected and suppressed errors, rerun with: -s
==27211== ERROR SUMMARY: 109 errors from 9 contexts (suppressed: 0 from 0)
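
For reading records such as `84 (24 direct, 60 indirect)` in the report above, here is a minimal sketch of how one missing `delete` can produce both kinds of loss. It is not taken from the k-NN code: `Owner`, `leakOwner`, and `demoLeak` are hypothetical names, and the sketch assumes 4-byte floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical owner type; not part of k-NN or faiss.
struct Owner {
    std::vector<float> buffer;
};

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Allocates an Owner with new and never deletes it.
// Under valgrind, the Owner block itself would count as "definitely lost"
// (direct bytes), and the vector's heap buffer, reachable only through the
// lost Owner, would count as "indirectly lost".
std::size_t leakOwner(std::size_t count) {
    Owner* owner = new Owner{std::vector<float>(count)};
    return owner->buffer.size() * sizeof(float);  // payload bytes held by the buffer
}

// Intentionally leaks one Owner holding 15 floats (60 payload bytes).
void demoLeak() {
    assert(leakOwner(15) == 60);
}
```

Freeing the owning block is normally enough to clear both the direct and the indirect bytes, because the indirect blocks are only reachable through it.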
jmazanec15 commented 3 months ago

In the test code, I see a couple of instances of allocation without a matching free:

  1. https://github.com/opensearch-project/k-NN/blob/main/jni/tests/nmslib_wrapper_unit_test.cpp#L54
  2. https://github.com/opensearch-project/k-NN/blob/main/jni/tests/nmslib_wrapper_unit_test.cpp#L70-L72
  3. https://github.com/opensearch-project/k-NN/blob/main/jni/tests/faiss_wrapper_test.cpp#L723-L724
  4. https://github.com/opensearch-project/k-NN/blob/main/jni/tests/faiss_wrapper_test.cpp#L120
  5. https://github.com/opensearch-project/k-NN/blob/main/jni/tests/faiss_wrapper_test.cpp#L697 (I'm guessing this one has a lot to do with the issue.)
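
One common way to remove this kind of leak from a test body is to hold the allocation in a `std::unique_ptr` and pass a raw, non-owning pointer to the code under test. The sketch below only illustrates that pattern. `FakeIndex`, `needsSharedState`, and `testBodyWithOwnership` are hypothetical stand-ins, not the actual types or functions at the linked lines.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a heap-allocated index used inside one test.
struct FakeIndex {
    explicit FakeIndex(int dim) : dim(dim), data(1024, 0.0f) {}
    int dim;
    std::vector<float> data;
};

// Hypothetical function under test that takes a non-owning raw pointer.
bool needsSharedState(const FakeIndex* index) {
    return index != nullptr && index->dim > 0;
}

// Leak-free variant of a test body: the unique_ptr frees the index on every
// exit path, including early returns after a failed assertion.
bool testBodyWithOwnership() {
    auto index = std::make_unique<FakeIndex>(8);
    return needsSharedState(index.get());
}

// Example call; the index is already destroyed when this returns.
void runExample() {
    assert(testBodyWithOwnership());
}
```

If the code under test takes ownership of the pointer instead, `release()` would be the matching call, and the test should not free the object a second time.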
jmazanec15 commented 2 months ago

I ran valgrind separately for each test group using gtest_filter:

Nmslib*

==2702==
==2702== HEAP SUMMARY:
==2702==     in use at exit: 122,968 bytes in 9 blocks
==2702==   total heap usage: 31,541 allocs, 31,532 frees, 4,561,675 bytes allocated
==2702==
==2702== LEAK SUMMARY:
==2702==    definitely lost: 0 bytes in 0 blocks
==2702==    indirectly lost: 0 bytes in 0 blocks
==2702==      possibly lost: 0 bytes in 0 blocks
==2702==    still reachable: 122,968 bytes in 9 blocks
==2702==         suppressed: 0 bytes in 0 blocks

CommonTests*

==2774== HEAP SUMMARY:
==2774==     in use at exit: 88 bytes in 3 blocks
==2774==   total heap usage: 2,197 allocs, 2,194 frees, 506,303 bytes allocated
==2774==
==2774== LEAK SUMMARY:
==2774==    definitely lost: 0 bytes in 0 blocks
==2774==    indirectly lost: 0 bytes in 0 blocks
==2774==      possibly lost: 0 bytes in 0 blocks
==2774==    still reachable: 88 bytes in 3 blocks
==2774==         suppressed: 0 bytes in 0 blocks

CreateIndexTest*

==2798==
==2798== HEAP SUMMARY:
==2798==     in use at exit: 296 bytes in 4 blocks
==2798==   total heap usage: 2,351 allocs, 2,347 frees, 531,825 bytes allocated
==2798==
==2798== LEAK SUMMARY:
==2798==    definitely lost: 0 bytes in 0 blocks
==2798==    indirectly lost: 0 bytes in 0 blocks
==2798==      possibly lost: 0 bytes in 0 blocks
==2798==    still reachable: 296 bytes in 4 blocks
==2798==         suppressed: 0 bytes in 0 blocks

CreateBinaryIndexTest*

==2808== HEAP SUMMARY:
==2808==     in use at exit: 296 bytes in 4 blocks
==2808==   total heap usage: 2,358 allocs, 2,354 frees, 536,572 bytes allocated
==2808==
==2808== LEAK SUMMARY:
==2808==    definitely lost: 0 bytes in 0 blocks
==2808==    indirectly lost: 0 bytes in 0 blocks
==2808==      possibly lost: 0 bytes in 0 blocks
==2808==    still reachable: 296 bytes in 4 blocks
==2808==         suppressed: 0 bytes in 0 blocks

IDGrouperBitMapTest*

==2818== HEAP SUMMARY:
==2818==     in use at exit: 8 bytes in 1 blocks
==2818==   total heap usage: 1,755 allocs, 1,754 frees, 453,533 bytes allocated
==2818==
==2818== LEAK SUMMARY:
==2818==    definitely lost: 0 bytes in 0 blocks
==2818==    indirectly lost: 0 bytes in 0 blocks
==2818==      possibly lost: 0 bytes in 0 blocks
==2818==    still reachable: 8 bytes in 1 blocks
==2818==         suppressed: 0 bytes in 0 blocks

Faiss*

==2828== LEAK SUMMARY:
==2828==    definitely lost: 1,184 bytes in 4 blocks
==2828==    indirectly lost: 262,880 bytes in 12 blocks
==2828==      possibly lost: 3,696 bytes in 11 blocks
==2828==    still reachable: 4,528 bytes in 7 blocks

FaissCreateIndexTest*

==2851== LEAK SUMMARY:
==2851==    definitely lost: 0 bytes in 0 blocks
==2851==    indirectly lost: 0 bytes in 0 blocks
==2851==      possibly lost: 0 bytes in 0 blocks
==2851==    still reachable: 88 bytes in 3 blocks
==2851==         suppressed: 0 bytes in 0 blocks

FaissCreateBinaryIndexTest*

==2865== HEAP SUMMARY:
==2865==     in use at exit: 88 bytes in 3 blocks
==2865==   total heap usage: 2,317 allocs, 2,314 frees, 532,634 bytes allocated
==2865==
==2865== LEAK SUMMARY:
==2865==    definitely lost: 0 bytes in 0 blocks
==2865==    indirectly lost: 0 bytes in 0 blocks
==2865==      possibly lost: 0 bytes in 0 blocks
==2865==    still reachable: 88 bytes in 3 blocks
==2865==         suppressed: 0 bytes in 0 blocks

FaissCreateIndexFromTemplateTest

==2875== HEAP SUMMARY:
==2875==     in use at exit: 280 bytes in 4 blocks
==2875==   total heap usage: 22,094 allocs, 22,090 frees, 1,211,747 bytes allocated
==2875==
==2875== LEAK SUMMARY:
==2875==    definitely lost: 0 bytes in 0 blocks
==2875==    indirectly lost: 0 bytes in 0 blocks
==2875==      possibly lost: 0 bytes in 0 blocks
==2875==    still reachable: 280 bytes in 4 blocks
==2875==         suppressed: 0 bytes in 0 blocks

FaissLoad*

==2926== HEAP SUMMARY:
==2926==     in use at exit: 8,016 bytes in 17 blocks
==2926==   total heap usage: 75,376 allocs, 75,359 frees, 1,044,926,065 bytes allocated
==2926==
==2926== 3,696 bytes in 11 blocks are possibly lost in loss record 6 of 7
==2926==    at 0x4C40963: calloc (vg_replace_malloc.c:1595)
==2926==    by 0x4015522: UnknownInlinedFun (rtld-malloc.h:44)
==2926==    by 0x4015522: allocate_dtv (dl-tls.c:372)
==2926==    by 0x4015F51: _dl_allocate_tls (dl-tls.c:630)
==2926==    by 0x790CE32: allocate_stack (allocatestack.c:623)
==2926==    by 0x790CE32: pthread_create@@GLIBC_2.2.5 (pthread_create.c:662)
==2926==    by 0x7D43AF2: ??? (in /usr/lib64/libgomp.so.1.0.0)
==2926==    by 0x7D39720: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==2926==    by 0xAD408C: void faiss::(anonymous namespace)::exhaustive_L2sqr_blas_default_impl<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*) (distances.cpp:321)
==2926==    by 0xAD286B: void faiss::(anonymous namespace)::exhaustive_L2sqr_blas<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*) (distances.cpp:599)
==2926==    by 0xAD449B: void faiss::(anonymous namespace)::knn_L2sqr_select<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*, faiss::IDSelector const*) (distances.cpp:620)
==2926==    by 0xAD2FD6: faiss::knn_L2sqr(float const*, float const*, unsigned long, unsigned long, unsigned long, unsigned long, float*, long*, float const*, faiss::IDSelector const*) (distances.cpp:735)
==2926==    by 0xAD332F: faiss::knn_L2sqr(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::HeapArray<faiss::CMax<float, long> >*, float const*, faiss::IDSelector const*) (distances.cpp:762)
==2926==    by 0x9509CD: faiss::IndexFlat::search(long, float const*, long, float*, long*, faiss::SearchParameters const*) const (IndexFlat.cpp:43)
==2926==
==2926== LEAK SUMMARY:
==2926==    definitely lost: 0 bytes in 0 blocks
==2926==    indirectly lost: 0 bytes in 0 blocks
==2926==      possibly lost: 3,696 bytes in 11 blocks
==2926==    still reachable: 4,320 bytes in 6 blocks
==2926==         suppressed: 0 bytes in 0 blocks

FaissQuery*

==2916== HEAP SUMMARY:
==2916==     in use at exit: 488 bytes in 5 blocks
==2916==   total heap usage: 147,973 allocs, 147,968 frees, 10,422,555 bytes allocated
==2916==
==2916== LEAK SUMMARY:
==2916==    definitely lost: 0 bytes in 0 blocks
==2916==    indirectly lost: 0 bytes in 0 blocks
==2916==      possibly lost: 0 bytes in 0 blocks
==2916==    still reachable: 488 bytes in 5 blocks
==2916==         suppressed: 0 bytes in 0 blocks

FaissFree*

==2947== HEAP SUMMARY:
==2947==     in use at exit: 8 bytes in 1 blocks
==2947==   total heap usage: 19,099 allocs, 19,098 frees, 666,886 bytes allocated
==2947==
==2947== LEAK SUMMARY:
==2947==    definitely lost: 0 bytes in 0 blocks
==2947==    indirectly lost: 0 bytes in 0 blocks
==2947==      possibly lost: 0 bytes in 0 blocks
==2947==    still reachable: 8 bytes in 1 blocks
==2947==         suppressed: 0 bytes in 0 blocks

FaissInit*

==2961== HEAP SUMMARY:
==2961==     in use at exit: 8,016 bytes in 17 blocks
==2961==   total heap usage: 23,239 allocs, 23,222 frees, 621,777,685 bytes allocated
==2961==
==2961== 3,696 bytes in 11 blocks are possibly lost in loss record 6 of 7
==2961==    at 0x4C40963: calloc (vg_replace_malloc.c:1595)
==2961==    by 0x4015522: UnknownInlinedFun (rtld-malloc.h:44)
==2961==    by 0x4015522: allocate_dtv (dl-tls.c:372)
==2961==    by 0x4015F51: _dl_allocate_tls (dl-tls.c:630)
==2961==    by 0x790CE32: allocate_stack (allocatestack.c:623)
==2961==    by 0x790CE32: pthread_create@@GLIBC_2.2.5 (pthread_create.c:662)
==2961==    by 0x7D43AF2: ??? (in /usr/lib64/libgomp.so.1.0.0)
==2961==    by 0x7D39720: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==2961==    by 0xAD408C: void faiss::(anonymous namespace)::exhaustive_L2sqr_blas_default_impl<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*) (distances.cpp:321)
==2961==    by 0xAD286B: void faiss::(anonymous namespace)::exhaustive_L2sqr_blas<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*) (distances.cpp:599)
==2961==    by 0xAD449B: void faiss::(anonymous namespace)::knn_L2sqr_select<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*, faiss::IDSelector const*) (distances.cpp:620)
==2961==    by 0xAD2FD6: faiss::knn_L2sqr(float const*, float const*, unsigned long, unsigned long, unsigned long, unsigned long, float*, long*, float const*, faiss::IDSelector const*) (distances.cpp:735)
==2961==    by 0xAD332F: faiss::knn_L2sqr(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::HeapArray<faiss::CMax<float, long> >*, float const*, faiss::IDSelector const*) (distances.cpp:762)
==2961==    by 0x9509CD: faiss::IndexFlat::search(long, float const*, long, float*, long*, faiss::SearchParameters const*) const (IndexFlat.cpp:43)
==2961==
==2961== LEAK SUMMARY:
==2961==    definitely lost: 0 bytes in 0 blocks
==2961==    indirectly lost: 0 bytes in 0 blocks
==2961==      possibly lost: 3,696 bytes in 11 blocks
==2961==    still reachable: 4,320 bytes in 6 blocks
==2961==         suppressed: 0 bytes in 0 blocks

FaissTrain*

==2982== HEAP SUMMARY:
==2982==     in use at exit: 8,016 bytes in 17 blocks
==2982==   total heap usage: 21,351 allocs, 21,334 frees, 168,558,659 bytes allocated
==2982==
==2982== 3,696 bytes in 11 blocks are possibly lost in loss record 6 of 7
==2982==    at 0x4C40963: calloc (vg_replace_malloc.c:1595)
==2982==    by 0x4015522: UnknownInlinedFun (rtld-malloc.h:44)
==2982==    by 0x4015522: allocate_dtv (dl-tls.c:372)
==2982==    by 0x4015F51: _dl_allocate_tls (dl-tls.c:630)
==2982==    by 0x790CE32: allocate_stack (allocatestack.c:623)
==2982==    by 0x790CE32: pthread_create@@GLIBC_2.2.5 (pthread_create.c:662)
==2982==    by 0x7D43AF2: ??? (in /usr/lib64/libgomp.so.1.0.0)
==2982==    by 0x7D39720: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==2982==    by 0xAD408C: void faiss::(anonymous namespace)::exhaustive_L2sqr_blas_default_impl<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*) (distances.cpp:321)
==2982==    by 0xAD286B: void faiss::(anonymous namespace)::exhaustive_L2sqr_blas<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*) (distances.cpp:599)
==2982==    by 0xAD449B: void faiss::(anonymous namespace)::knn_L2sqr_select<faiss::Top1BlockResultHandler<faiss::CMax<float, long> > >(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::Top1BlockResultHandler<faiss::CMax<float, long> >&, float const*, faiss::IDSelector const*) (distances.cpp:620)
==2982==    by 0xAD2FD6: faiss::knn_L2sqr(float const*, float const*, unsigned long, unsigned long, unsigned long, unsigned long, float*, long*, float const*, faiss::IDSelector const*) (distances.cpp:735)
==2982==    by 0xAD332F: faiss::knn_L2sqr(float const*, float const*, unsigned long, unsigned long, unsigned long, faiss::HeapArray<faiss::CMax<float, long> >*, float const*, faiss::IDSelector const*) (distances.cpp:762)
==2982==    by 0x9509CD: faiss::IndexFlat::search(long, float const*, long, float*, long*, faiss::SearchParameters const*) const (IndexFlat.cpp:43)
==2982==
==2982== LEAK SUMMARY:
==2982==    definitely lost: 0 bytes in 0 blocks
==2982==    indirectly lost: 0 bytes in 0 blocks
==2982==      possibly lost: 3,696 bytes in 11 blocks
==2982==    still reachable: 4,320 bytes in 6 blocks
==2982==         suppressed: 0 bytes in 0 blocks

FaissCreate*

==3003== HEAP SUMMARY:
==3003==     in use at exit: 8,016 bytes in 17 blocks
==3003==   total heap usage: 47,049 allocs, 47,032 frees, 2,679,663 bytes allocated
==3003==
==3003== 3,696 bytes in 11 blocks are possibly lost in loss record 6 of 7
==3003==    at 0x4C40963: calloc (vg_replace_malloc.c:1595)
==3003==    by 0x4015522: UnknownInlinedFun (rtld-malloc.h:44)
==3003==    by 0x4015522: allocate_dtv (dl-tls.c:372)
==3003==    by 0x4015F51: _dl_allocate_tls (dl-tls.c:630)
==3003==    by 0x790CE32: allocate_stack (allocatestack.c:623)
==3003==    by 0x790CE32: pthread_create@@GLIBC_2.2.5 (pthread_create.c:662)
==3003==    by 0x7D43AF2: ??? (in /usr/lib64/libgomp.so.1.0.0)
==3003==    by 0x7D39720: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==3003==    by 0xA463FA: faiss::ScalarQuantizer::compute_codes(float const*, unsigned char*, unsigned long) const (ScalarQuantizer.cpp:1650)
==3003==    by 0x9EED43: faiss::IndexScalarQuantizer::sa_encode(long, float const*, unsigned char*) const (IndexScalarQuantizer.cpp:107)
==3003==    by 0x953A3C: faiss::IndexFlatCodes::add(long, float const*) (IndexFlatCodes.cpp:29)
==3003==    by 0x955E19: faiss::IndexHNSW::add(long, float const*) (IndexHNSW.cpp:368)
==3003==    by 0x9662A0: faiss::IndexIDMapTemplate<faiss::Index>::add_with_ids(long, float const*, long const*) (IndexIDMap.cpp:80)
==3003==    by 0x563AF5E: knn_jni::faiss_wrapper::IndexService::createIndex(knn_jni::JNIUtilInterface*, JNIEnv_*, faiss::MetricType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, int, int, long, std::vector<long, std::allocator<long> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, _jobject*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, _jobject*> > >) (faiss_index_service.cpp:103)
==3003==
==3003== LEAK SUMMARY:
==3003==    definitely lost: 0 bytes in 0 blocks
==3003==    indirectly lost: 0 bytes in 0 blocks
==3003==      possibly lost: 3,696 bytes in 11 blocks
==3003==    still reachable: 4,320 bytes in 6 blocks
==3003==         suppressed: 0 bytes in 0 blocks

FaissIsShared*

==3024== HEAP SUMMARY:
==3024==     in use at exit: 264,072 bytes in 17 blocks
==3024==   total heap usage: 1,788 allocs, 1,771 frees, 986,637 bytes allocated
==3024==
==3024== 72 bytes in 1 blocks are definitely lost in loss record 6 of 17
==3024==    at 0x4C39913: operator new(unsigned long) (vg_replace_malloc.c:483)
==3024==    by 0x808205: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (faiss_wrapper_test.cpp:697)
==3024==    by 0xF75F6A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF702D2: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF50117: testing::Test::Run() (gtest.cc:2713)
==3024==    by 0xF50A26: testing::TestInfo::Run() (gtest.cc:2859)
==3024==    by 0xF51298: testing::TestSuite::Run() (gtest.cc:3037)
==3024==    by 0xF605F2: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5967)
==3024==    by 0xF77038: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF711D2: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF5EF2E: testing::UnitTest::Run() (gtest.cc:5546)
==3024==    by 0x94D974: RUN_ALL_TESTS() (gtest.h:2334)
==3024==
==3024== 72 bytes in 1 blocks are definitely lost in loss record 7 of 17
==3024==    at 0x4C39913: operator new(unsigned long) (vg_replace_malloc.c:483)
==3024==    by 0x80828E: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (faiss_wrapper_test.cpp:705)
==3024==    by 0xF75F6A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF702D2: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF50117: testing::Test::Run() (gtest.cc:2713)
==3024==    by 0xF50A26: testing::TestInfo::Run() (gtest.cc:2859)
==3024==    by 0xF51298: testing::TestSuite::Run() (gtest.cc:3037)
==3024==    by 0xF605F2: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5967)
==3024==    by 0xF77038: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF711D2: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF5EF2E: testing::UnitTest::Run() (gtest.cc:5546)
==3024==    by 0x94D974: RUN_ALL_TESTS() (gtest.h:2334)
==3024==
==3024== 131,960 (520 direct, 131,440 indirect) bytes in 1 blocks are definitely lost in loss record 16 of 17
==3024==    at 0x4C39913: operator new(unsigned long) (vg_replace_malloc.c:483)
==3024==    by 0x80833D: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (faiss_wrapper_test.cpp:720)
==3024==    by 0xF75F6A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF702D2: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF50117: testing::Test::Run() (gtest.cc:2713)
==3024==    by 0xF50A26: testing::TestInfo::Run() (gtest.cc:2859)
==3024==    by 0xF51298: testing::TestSuite::Run() (gtest.cc:3037)
==3024==    by 0xF605F2: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5967)
==3024==    by 0xF77038: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF711D2: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF5EF2E: testing::UnitTest::Run() (gtest.cc:5546)
==3024==    by 0x94D974: RUN_ALL_TESTS() (gtest.h:2334)
==3024==
==3024== 131,960 (520 direct, 131,440 indirect) bytes in 1 blocks are definitely lost in loss record 17 of 17
==3024==    at 0x4C39913: operator new(unsigned long) (vg_replace_malloc.c:483)
==3024==    by 0x8083E6: FaissIsSharedIndexStateRequired_BasicAssertions_Test::TestBody() (faiss_wrapper_test.cpp:730)
==3024==    by 0xF75F6A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF702D2: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF50117: testing::Test::Run() (gtest.cc:2713)
==3024==    by 0xF50A26: testing::TestInfo::Run() (gtest.cc:2859)
==3024==    by 0xF51298: testing::TestSuite::Run() (gtest.cc:3037)
==3024==    by 0xF605F2: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5967)
==3024==    by 0xF77038: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2638)
==3024==    by 0xF711D2: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2674)
==3024==    by 0xF5EF2E: testing::UnitTest::Run() (gtest.cc:5546)
==3024==    by 0x94D974: RUN_ALL_TESTS() (gtest.h:2334)
==3024== LEAK SUMMARY:
==3024==    definitely lost: 1,184 bytes in 4 blocks
==3024==    indirectly lost: 262,880 bytes in 12 blocks
==3024==      possibly lost: 0 bytes in 0 blocks
==3024==    still reachable: 8 bytes in 1 blocks
==3024==         suppressed: 0 bytes in 0 blocks
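
As a consistency check on the FaissIsShared* numbers above, the four loss records should add up to the LEAK SUMMARY totals. The following is a small compile-time sketch of that arithmetic. The array and function names are illustrative, and the values are copied from the log.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes per loss record in the FaissIsShared* run (records 6, 7, 16, 17).
constexpr std::int64_t kDirect[]   = {72, 72, 520, 520};
constexpr std::int64_t kIndirect[] = {0, 0, 131440, 131440};
constexpr std::int64_t kStillReachable = 8;

// Sums a fixed-size array at compile time.
template <std::size_t N>
constexpr std::int64_t sum(const std::int64_t (&values)[N]) {
    std::int64_t total = 0;
    for (std::int64_t v : values) total += v;
    return total;
}

// "definitely lost: 1,184 bytes in 4 blocks"
static_assert(sum(kDirect) == 1184, "direct bytes match the summary");
// "indirectly lost: 262,880 bytes in 12 blocks"
static_assert(sum(kIndirect) == 262880, "indirect bytes match the summary");
// "in use at exit: 264,072 bytes in 17 blocks"
static_assert(sum(kDirect) + sum(kIndirect) + kStillReachable == 264072,
              "lost plus still-reachable bytes match the heap summary");
```

All four records point at `faiss_wrapper_test.cpp` lines 697, 705, 720, and 730, so this one test accounts for every definitely-lost byte in the run.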

FaissRange*

==3034== HEAP SUMMARY:
==3034==     in use at exit: 8,016 bytes in 17 blocks
==3034==   total heap usage: 253,623 allocs, 253,606 frees, 961,470,489 bytes allocated
==3034==
==3034== 3,696 bytes in 11 blocks are possibly lost in loss record 6 of 7
==3034==    at 0x4C40963: calloc (vg_replace_malloc.c:1595)
==3034==    by 0x4015522: UnknownInlinedFun (rtld-malloc.h:44)
==3034==    by 0x4015522: allocate_dtv (dl-tls.c:372)
==3034==    by 0x4015F51: _dl_allocate_tls (dl-tls.c:630)
==3034==    by 0x790CE32: allocate_stack (allocatestack.c:623)
==3034==    by 0x790CE32: pthread_create@@GLIBC_2.2.5 (pthread_create.c:662)
==3034==    by 0x7D43AF2: ??? (in /usr/lib64/libgomp.so.1.0.0)
==3034==    by 0x7D39720: GOMP_parallel (in /usr/lib64/libgomp.so.1.0.0)
==3034==    by 0x95516F: faiss::(anonymous namespace)::hnsw_add_vertices(faiss::IndexHNSW&, unsigned long, unsigned long, float const*, bool, bool) (IndexHNSW.cpp:165)
==3034==    by 0x955E7D: faiss::IndexHNSW::add(long, float const*) (IndexHNSW.cpp:371)
==3034==    by 0x9662A0: faiss::IndexIDMapTemplate<faiss::Index>::add_with_ids(long, float const*, long const*) (IndexIDMap.cpp:80)
==3034==    by 0x8E18B9: test_util::FaissAddData(faiss::Index*, std::vector<long, std::allocator<long> >, std::vector<float, std::allocator<float> >) (test_util.cpp:301)
==3034==    by 0x809F72: FaissRangeSearchQueryIndexTest_BasicAssertions_Test::TestBody() (faiss_wrapper_test.cpp:812)
==3034==    by 0xF75F6A: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2638)
==3034==
==3034== LEAK SUMMARY:
==3034==    definitely lost: 0 bytes in 0 blocks
==3034==    indirectly lost: 0 bytes in 0 blocks
==3034==      possibly lost: 3,696 bytes in 11 blocks
==3034==    still reachable: 4,320 bytes in 6 blocks
==3034==         suppressed: 0 bytes in 0 blocks

QueryIndexHNSWTests*

==3055== LEAK SUMMARY:
==3055==    definitely lost: 236 bytes in 13 blocks
==3055==    indirectly lost: 1,664 bytes in 88 blocks
==3055==      possibly lost: 0 bytes in 0 blocks
==3055==    still reachable: 455,533 bytes in 1,015 blocks
==3055==         suppressed: 0 bytes in 0 blocks

QueryIndex*

==3079== LEAK SUMMARY:
==3079==    definitely lost: 792 bytes in 42 blocks
==3079==    indirectly lost: 5,024 bytes in 268 blocks
==3079==      possibly lost: 0 bytes in 0 blocks
==3079==    still reachable: 123,176 bytes in 10 blocks
==3079==         suppressed: 0 bytes in 0 blocks
jmazanec15 commented 2 months ago

As an update, I spent some time digging into this. Overall, I haven't found any production issues. The issue I'm running into, though, is that mocks are interfering with the leak check. I created #1822 to make this more maintainable.

Anyway, to run this, I'm going to leave the steps here:

docker run -u 0 -it opensearchstaging/ci-runner:ci-runner-almalinux8-opensearch-build-v1 /bin/bash
yum install gcc-toolset-11-gcc-gfortran -y
git clone https://github.com/opensearch-project/k-NN.git
cd k-NN
git submodule update --init -- jni/external/nmslib
git submodule update --init -- jni/external/faiss
cd jni
# Update test to depend on gfortran
cmake -Bbuild -DCMAKE_BUILD_TYPE=Debug -DCOMMIT_LIB_PATCHES=false -DSIMD_ENABLED=false .
make -Cbuild

valgrind --leak-check=full ./bin/jni_test --gtest_filter="CommonTests*"
valgrind ./bin/jni_test