Closed JamyDev closed 1 week ago
I've created a PR here with design docs. https://github.com/sourcegraph/scip/pull/289
That includes rationale on why we've avoided integer IDs as well as other kinds of redundancy (that would push more work onto indexer authors).
Please leave comments on the PR if you have follow-up questions.
We did get a request for SCIP to SQLite conversion: https://github.com/sourcegraph/scip/issues/233. We'd be open to brainstorming a design and/or accepting support for that as part of the SCIP CLI if that's something you would find useful, but we don't have the bandwidth to add it ourselves.
We've been playing around and working with SCIP indices for quite a while now, and one thing has become clear: they take up a lot of space, especially at the scale of our codebase.
One thought was that monikers take up a large chunk of the space, and much of that is redundant information. For example, every symbol within `some/package/path` has `scip-go gomod some/package/path v0.0.4` as a preamble, and then the actual descriptors are `some/package/path/Struct#`, which duplicates the package path provided in the preamble. In our use case we only need the descriptors, and maybe the package version, so one idea was to just strip the prefix from the indices. Before doing that, I wanted to ask whether any other considerations were made around the size of the indices.
We considered compression, as these strings would be very compressible, but our index reader would still need to scan through the uncompressed index regardless.
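To make the prefix-stripping idea concrete, here is a minimal sketch in Python. It assumes a global symbol string has the shape "scheme manager package-name version descriptors", separated by single spaces, as in the example above. The names `SplitSymbol` and `split_symbol` are hypothetical and not part of SCIP or its tooling, and the sketch does not handle escaped (doubled) spaces inside the preamble fields.

```python
from typing import NamedTuple, Optional


class SplitSymbol(NamedTuple):
    preamble: str     # "<scheme> <manager> <package-name> <version>"
    descriptors: str  # everything after the preamble


def split_symbol(symbol: str) -> Optional[SplitSymbol]:
    """Split a global symbol string into its preamble and descriptors.

    Returns None for local symbols ("local <id>") and for strings that
    do not have the four preamble fields. Assumption: none of the four
    preamble fields contains an escaped (doubled) space.
    """
    if symbol.startswith("local "):
        return None
    # At most 4 splits, so spaces inside the descriptors are preserved.
    parts = symbol.split(" ", 4)
    if len(parts) != 5:
        return None
    scheme, manager, name, version, descriptors = parts
    return SplitSymbol(" ".join((scheme, manager, name, version)), descriptors)


example = "scip-go gomod some/package/path v0.0.4 some/package/path/Struct#"
split = split_symbol(example)
print(split.preamble)
print(split.descriptors)
```

Storing only `descriptors` (plus the version, if needed) drops the preamble from every occurrence, at the cost of no longer being able to tell apart symbols from different packages or versions that share a descriptor path.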
Another option was to define symbol/moniker mappings for the index, which would map each moniker to a unique ID so that it can be reused, similar to how LSIF handles it. This could be an optional feature in the index definition, at either the document or the index level. It would likely also give a good indication of all the symbols the index references without having to read through the documents and occurrences.
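As a rough illustration of the mapping idea, the sketch below interns each symbol string once and refers to it by integer ID afterwards. This is not how SCIP works today; `SymbolTable` and its methods are hypothetical names used only for this example.

```python
from typing import Dict, List


class SymbolTable:
    """Maps each distinct symbol string to a small integer ID."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}  # symbol string -> ID
        self.symbols: List[str] = []   # ID -> symbol string

    def intern(self, symbol: str) -> int:
        """Return the ID for `symbol`, assigning a new one on first use."""
        existing = self.ids.get(symbol)
        if existing is not None:
            return existing
        new_id = len(self.symbols)
        self.ids[symbol] = new_id
        self.symbols.append(symbol)
        return new_id

    def lookup(self, symbol_id: int) -> str:
        """Return the symbol string for a previously assigned ID."""
        return self.symbols[symbol_id]


table = SymbolTable()
occurrences = [
    "scip-go gomod some/package/path v0.0.4 some/package/path/Struct#",
    "scip-go gomod some/package/path v0.0.4 some/package/path/Other#",
    "scip-go gomod some/package/path v0.0.4 some/package/path/Struct#",
]
# Each occurrence now stores an integer instead of the full string.
encoded = [table.intern(s) for s in occurrences]
```

Here `table.symbols` doubles as the list of every symbol the index references, which is the "good indication" mentioned above: a reader could inspect it without walking the documents and occurrences.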