lanterndata / lantern

PostgreSQL vector database extension for building AI applications
https://lantern.dev
Other
755 stars 53 forks source link

add machinery to remove unecessary dynamic symbols at link time #222

Open ezra-varady opened 10 months ago

ezra-varady commented 10 months ago

This attempts to address #202. It adds a cmake target for an ld version script that places all symbols not explicitly marked with PGDLL_EXPORT in the local section. It finds the functions that need to be marked as global automatically so it shouldn't require developers to mark their functions in a suppression script. I believe that --version-script should behave the same way in lld though there isn't any documentation so this may only work with gnu tools. I would have preferred to set all visibility to hidden then manually mark functions to be exported in source since this is more transparent to developers, however libstdc++ doesn't allow this

example symbol exports with suppression script

nm -D --defined-only lantern.so
000000000000acf2 T cos_dist@@LANTERN
000000000000ad5a T hamming_dist@@LANTERN
000000000000a77e T hnsw_handler@@LANTERN
000000000000ac8a T l2sq_dist@@LANTERN
0000000000000000 A LANTERN
000000000000ae71 T lantern_internal_failure_point_enable@@LANTERN
000000000000ae25 T lantern_internal_validate_index@@LANTERN
000000000000ac66 T ldb_generic_dist@@LANTERN
000000000000ace5 T pg_finfo_cos_dist@@LANTERN
000000000000ad4d T pg_finfo_hamming_dist@@LANTERN
000000000000a771 T pg_finfo_hnsw_handler@@LANTERN
000000000000ac7d T pg_finfo_l2sq_dist@@LANTERN
000000000000ae64 T pg_finfo_lantern_internal_failure_point_enable@@LANTERN
000000000000ae18 T pg_finfo_lantern_internal_validate_index@@LANTERN
000000000000ac59 T pg_finfo_ldb_generic_dist@@LANTERN
000000000000adae T pg_finfo_vector_l2sq_dist@@LANTERN
00000000000101ab T _PG_fini@@LANTERN
000000000000ff6e T _PG_init@@LANTERN
0000000000009afa T Pg_magic_func@@LANTERN
000000000000adbb T vector_l2sq_dist@@LANTERN

EDIT: apparently Darwin uses its own linker so this isn't compatible with osx builds. Since this will be a relatively small percentage of deployments long term I think it's acceptable to just export more symbols on mac

EDIT2: sanitizer failure is unrelated to changes. It passed on the previous run, seems there's some minor non-determinism in the index that fails in that test

ezra-varady commented 9 months ago

It seems the reason for namespace std:: always being visible is that various C++ language features require this to work properly across shared library boundaries. Since lantern will never be linked against, and postgres, as well as the extension itself are both written in pure C, I believe for our specific use case it's safe to hide these symbols. I think the C++ components should have the correct symbols available when they're linked into usearch, and because usearch will, in this case, only ever be called by pure C, we can safely throw out the symbols. Nothing from inside usearch will ever interact with a C++ runtime outside of the library. The descriptions I've found are terse and ambiguous though. I'll try to figure out how to trigger an exception in usearch to sanity check this

I changed the version name to LANTERN_V1. Since nothing actually links against lantern, we shouldn't have to have multiple ABI versions, but in case we do this name will be safe