sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Ranking: introduce approximate file ranks #57950

Open jtibshirani opened 1 year ago

jtibshirani commented 1 year ago

In 5.0, we introduced a "file rank" signal inspired by PageRank, based on the global number of references to symbols in the file. The computation requires precise code intel, which isn't widely adopted at customer sites.

Near term: We should explore an approximate "file rank" based on cheaper signals that are always available. It would capture "file importance" in a single number, using signals like

This is very similar to a signal we already have in Zoekt called "doc order", so the main work is in choosing the most relevant components, normalizing them properly, and giving them a much bigger weight in the final score.
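To make the "choose components, normalize, weight" idea concrete, here is a minimal sketch in Go. Everything in it is an assumption for illustration: the signals (test path, vendored path, directory depth, file size), the path heuristics, the weights, and the names `FileSignals` and `ApproxFileRank` are hypothetical and are not taken from Zoekt or from this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// FileSignals holds cheap signals that are available without precise code intel.
// The exact set of signals is a hypothetical choice for this sketch.
type FileSignals struct {
	Path      string
	SizeBytes int
}

// isTestPath is a rough, hypothetical heuristic for test files.
func isTestPath(path string) bool {
	p := strings.ToLower(path)
	return strings.Contains(p, "_test.") || strings.HasPrefix(p, "test/") || strings.Contains(p, "/test/")
}

// isVendoredPath is a rough, hypothetical heuristic for third-party code.
func isVendoredPath(path string) bool {
	p := strings.ToLower(path)
	return strings.HasPrefix(p, "vendor/") || strings.Contains(p, "/vendor/") || strings.Contains(p, "node_modules/")
}

// squash maps a non-negative value into [0, 1) so that unbounded signals
// (depth, size) can be combined with boolean ones on the same scale.
func squash(x, scale float64) float64 {
	return x / (x + scale)
}

// ApproxFileRank combines the signals into a single number in (0, 1].
// The penalty weights sum to 1.0, so the score never drops to zero or below.
func ApproxFileRank(s FileSignals) float64 {
	score := 1.0
	if isTestPath(s.Path) {
		score -= 0.4
	}
	if isVendoredPath(s.Path) {
		score -= 0.4
	}
	depth := float64(strings.Count(s.Path, "/"))
	score -= 0.1 * squash(depth, 4)
	score -= 0.1 * squash(float64(s.SizeBytes), 64*1024)
	return score
}

func main() {
	files := []FileSignals{
		{Path: "cmd/server/main.go", SizeBytes: 2048},
		{Path: "cmd/server/main_test.go", SizeBytes: 2048},
		{Path: "vendor/github.com/x/y/z.go", SizeBytes: 2048},
	}
	for _, f := range files {
		fmt.Printf("%-32s %.3f\n", f.Path, ApproxFileRank(f))
	}
}
```

Because the result is an absolute value per file rather than a position in a sorted list, it could be given a larger weight in the final score without depending on what else happens to be indexed alongside the file.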

Longer term: The graph team is working to improve heuristic code navigation. Perhaps we could build on this work to compute a PageRank-like metric using a heuristic code graph.
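As a rough illustration of what a PageRank-like metric over a heuristic graph could look like, the sketch below runs plain power iteration in Go. The graph shape is an assumption: an edge from one file to another stands for "this file appears to reference a symbol defined in that file", however the heuristic decides that. The function name, damping factor, and iteration count are illustrative, not part of any existing Sourcegraph API.

```go
package main

import (
	"fmt"
	"sort"
)

// PageRank runs power iteration over a directed file graph.
// edges[from] lists the files that `from` appears to reference.
// Files with no outgoing edges spread their rank evenly over all files.
func PageRank(edges map[string][]string, damping float64, iters int) map[string]float64 {
	// Collect every file that appears as a source or a target.
	nodes := map[string]struct{}{}
	for from, tos := range edges {
		nodes[from] = struct{}{}
		for _, to := range tos {
			nodes[to] = struct{}{}
		}
	}
	rank := make(map[string]float64, len(nodes))
	if len(nodes) == 0 {
		return rank
	}
	n := float64(len(nodes))
	for f := range nodes {
		rank[f] = 1 / n
	}

	for it := 0; it < iters; it++ {
		// Rank currently held by files without outgoing references.
		dangling := 0.0
		for f := range nodes {
			if len(edges[f]) == 0 {
				dangling += rank[f]
			}
		}
		// Every file gets the teleport share plus its part of the dangling mass.
		base := (1-damping)/n + damping*dangling/n
		next := make(map[string]float64, len(nodes))
		for f := range nodes {
			next[f] = base
		}
		// Each file splits its rank evenly across the files it references.
		for from, tos := range edges {
			if len(tos) == 0 {
				continue
			}
			share := damping * rank[from] / float64(len(tos))
			for _, to := range tos {
				next[to] += share
			}
		}
		rank = next
	}
	return rank
}

func main() {
	// Hypothetical heuristic reference graph.
	edges := map[string][]string{
		"a.go": {"lib.go"},
		"b.go": {"lib.go"},
		"c.go": {"lib.go", "a.go"},
	}
	rank := PageRank(edges, 0.85, 50)

	files := make([]string, 0, len(rank))
	for f := range rank {
		files = append(files, f)
	}
	sort.Slice(files, func(i, j int) bool { return rank[files[i]] > rank[files[j]] })
	for _, f := range files {
		fmt.Printf("%-8s %.4f\n", f, rank[f])
	}
}
```

Since heuristic edges can be wrong, a real implementation would likely need to weight or prune them; this sketch treats every edge as equally trustworthy.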

/cc @sourcegraph/search-platform

keegancsmith commented 1 year ago

Do you think we should replace the precise file ranks with these scores? Should we update the file rank API in Sourcegraph to just expose the doc-order-like scores?

> This is very similar to a signal we already have in Zoekt called "doc order", so the main work is in choosing the most relevant components and giving them a much bigger weight in the final score.

Agreed. Doc order contains a lot of great "tie breakers", but it applies poorly across shards. We should probably translate some of that doc-order logic into a more direct impact on file scores.
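A small sketch of why a positional signal compares poorly across shards, assuming doc order behaves like a file's position within its own shard. The `Doc` type, the `Importance` field, and the `1 - i/n` scoring are hypothetical stand-ins, not Zoekt's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Doc is a hypothetical document with some absolute importance value.
type Doc struct {
	Name       string
	Importance float64
}

// PositionalScores sorts one shard by importance and scores each document
// purely by its position within that shard: 1.0 for the first, then decreasing.
func PositionalScores(shard []Doc) map[string]float64 {
	sorted := append([]Doc(nil), shard...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Importance > sorted[j].Importance
	})
	out := make(map[string]float64, len(sorted))
	for i, d := range sorted {
		out[d.Name] = 1 - float64(i)/float64(len(sorted))
	}
	return out
}

func main() {
	shardA := []Doc{{"core.go", 0.9}, {"util.go", 0.8}}
	shardB := []Doc{{"gen.go", 0.2}, {"mock.go", 0.1}}

	a := PositionalScores(shardA)
	b := PositionalScores(shardB)

	// gen.go leads its shard, so its positional score ties core.go and
	// beats util.go, even though its absolute importance is much lower.
	fmt.Println("core.go:", a["core.go"], "util.go:", a["util.go"])
	fmt.Println("gen.go: ", b["gen.go"], "mock.go:", b["mock.go"])
}
```

A per-file score that does not depend on the rest of the shard avoids this, which is one reading of "more direct impact on file scores".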

jtibshirani commented 11 months ago

As part of this work, we should also look into cleaning up the existing code for precise file ranks, as a lot of it could be simplified or removed.