Duplicated nodes in symbol name trie

We store symbol names for SCIP documents in a table called codeintel_scip_symbol_names. To save space, we encode them as a prefix trie inside the database. When storing larger SCIP indices, we break the work into chunks, compute the optimal trie for each chunk individually, and flush it to that table. Because we don't analyze overlaps across chunks, any part of the trie that appears in multiple chunks gets duplicated.

There's a script to check the potential savings in #60703. For example, the scip-go generated index for sourcegraph/sourcegraph ends up using 20% extra rows in the database.

Duplicated nodes in symbol name trie #60704