kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License
1.42k stars 99 forks source link

Bug: Memory Leak in DuckDB Postgres Extension #4321

Open royi-luo opened 1 month ago

royi-luo commented 1 month ago

Kùzu version

master branch

What operating system are you using?

Ubuntu 22.04.5 LTS

What happened?

The DuckDB postgres extension can leak memory as it does not call PQclear on PGresult pointers in certain cases (such as whenever it is used to execute a COPY FROM). This can be caught in our CI when we run the test postgres~test~test_files~postgres.ScanPostgresTable with LeakSanitizer enabled. It can also be caught if we run the same test with Valgrind, below is the error message:

==401623== HEAP SUMMARY:
==401623==     in use at exit: 1,912,987 bytes in 24,775 blocks
==401623==   total heap usage: 2,868,613 allocs, 2,843,838 frees, 2,111,961,701 bytes allocated
==401623== 
==401623== 648 bytes in 3 blocks are definitely lost in loss record 1,036 of 1,225
==401623==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==401623==    by 0x26E280B3: PQmakeEmptyPGresult (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E344EE: pqParseInput3 (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E2B5A8: PQgetResult (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DEA5FB: duckdb::PostgresConnection::ExecuteQueries(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DD63CA: duckdb::PostgresTransaction::ExecuteQueries(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DC0A04: duckdb::PostgresSchemaSet::LoadEntries(duckdb::ClientContext&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DAA6B9: duckdb::PostgresCatalogSet::TryLoadEntries(duckdb::ClientContext&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DAACFA: duckdb::PostgresCatalogSet::Scan(duckdb::ClientContext&, std::function<void (duckdb::CatalogEntry&)> const&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DA86D3: duckdb::PostgresCatalog::ScanSchemas(duckdb::ClientContext&, std::function<void (duckdb::SchemaCatalogEntry&)>) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x1D3FF082: duckdb::Catalog::GetAllSchemas(duckdb::ClientContext&) (in /home/kuzu/duckdb/build/src/libduckdb.so)
==401623==    by 0x1D35875F: duckdb::DuckDBTablesInit(duckdb::ClientContext&, duckdb::TableFunctionInitInput&) (in /home/kuzu/duckdb/build/src/libduckdb.so)
==401623== 
==401623== 6,792 (648 direct, 6,144 indirect) bytes in 3 blocks are definitely lost in loss record 1,186 of 1,225
==401623==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==401623==    by 0x26E280B3: PQmakeEmptyPGresult (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E32C65: getCopyStart (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E33FDC: pqParseInput3 (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E2B5A8: PQgetResult (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E2BBC2: PQexec (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DE964A: duckdb::PostgresConnection::PQExecute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26DEBB24: duckdb::PostgresConnection::BeginCopyFrom(duckdb::PostgresBinaryReader&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E0C00D: duckdb::PostgresLocalState::ScanChunk(duckdb::ClientContext&, duckdb::PostgresBindData const&, duckdb::PostgresGlobalState&, duckdb::DataChunk&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x26E0DFB3: duckdb::PostgresScan(duckdb::ClientContext&, duckdb::TableFunctionInput&, duckdb::DataChunk&) (in /home/kuzu/.duckdb/extensions/ff3d6071e0/linux_amd64/postgres_scanner.duckdb_extension)
==401623==    by 0x1DA08D1C: duckdb::PhysicalTableScan::GetData(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSourceInput&) const (in /home/kuzu/duckdb/build/src/libduckdb.so)
==401623==    by 0x1DBD67EB: duckdb::PipelineExecutor::FetchFromSource(duckdb::DataChunk&) (in /home/kuzu/duckdb/build/src/libduckdb.so)
==401623== 
==401623== LEAK SUMMARY:
==401623==    definitely lost: 1,296 bytes in 6 blocks
==401623==    indirectly lost: 6,144 bytes in 3 blocks
==401623==      possibly lost: 0 bytes in 0 blocks
==401623==    still reachable: 1,894,651 bytes in 24,765 blocks
==401623==         suppressed: 10,896 bytes in 1 blocks
==401623== Reachable blocks (those to which a pointer was found) are not shown.
==401623== To see them, rerun with: --leak-check=full --show-leak-kinds=all

There is an issue for this open in duckdb's repo, we should keep track of this.

Are there known steps to reproduce?

Run the test postgres~test~test_files~postgres.ScanPostgresTable with LeakSanitizer enabled or with valgrind.

acquamarin commented 1 month ago

We use duckdb as an intermediate when sccanning from postgres. Duckdb leaks memory since it doesn't clean the PGResult. To fix the memory issue, we may have to change the duckdb's postgres extension.