Open jtibshirani opened 1 year ago
Do you think we should replace the precise file ranks with these scores? Should we update the file rank API in Sourcegraph to just expose the doc-order-like scores?
This is very similar to a signal we already have in Zoekt called "doc order", so the main work is in choosing the most relevant components and giving them a much bigger weight in the final score.
Agreed. Doc order contains a lot of great "tie breakers", but it applies poorly across shards. We should probably translate some of those doc-order signals into a more direct impact on file scores.
As part of this work, we should also look into cleaning up the existing code for precise file ranks, as a lot of it could be simplified or removed.
In 5.0, we introduced a "file rank" signal inspired by PageRank, based on the global number of references to symbols in the file. The computation requires precise code intel, which doesn't have wide adoption at customer sites.
Near term: We should explore an approximate 'file rank' based on cheaper signals that are always available. It would capture the 'file importance' in a single number, using signals like
This is very similar to a signal we already have in Zoekt called "doc order", so the main work is in choosing the most relevant components, normalizing them properly, and giving them a much bigger weight in the final score.
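To make the idea concrete, here is a minimal sketch of how a few cheap signals might be normalized and combined into one number. The signal names (`IsTest`, `IsGenerated`, `PathDepth`, `SizeBytes`), the weights, and the `squash` helper are all hypothetical placeholders chosen for illustration; they are not Zoekt's actual doc-order components or its API. Because every component is normalized per file rather than relative to other files in the same shard, a score like this should be comparable across shards.

```go
package main

// FileSignals holds cheap per-file signals.
// Hypothetical fields for illustration only; not Zoekt's real doc-order inputs.
type FileSignals struct {
	IsTest      bool
	IsGenerated bool
	PathDepth   int // number of directories above the file
	SizeBytes   int
}

// squash maps a non-negative value into [0, 1), reaching 0.5 at x == half.
// It keeps unbounded signals (depth, size) on the same scale as the booleans.
func squash(x, half float64) float64 {
	if x <= 0 {
		return 0
	}
	return x / (x + half)
}

// ApproxFileRank combines the signals into a single score in [0, 1].
// The weights are assumptions and would need tuning against real queries.
func ApproxFileRank(s FileSignals) float64 {
	const (
		wNotTest      = 0.4
		wNotGenerated = 0.3
		wShallow      = 0.2
		wSmall        = 0.1
	)
	score := 0.0
	if !s.IsTest {
		score += wNotTest
	}
	if !s.IsGenerated {
		score += wNotGenerated
	}
	// Shallower and smaller files score higher; both terms lie in (0, 1].
	score += wShallow * (1 - squash(float64(s.PathDepth), 4))
	score += wSmall * (1 - squash(float64(s.SizeBytes), 64*1024))
	return score
}
```

The final ranking would then multiply this score by a weight large enough to matter beyond tie-breaking, which is the part that needs experimentation.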
Longer term: The graph team is working to improve heuristic code navigation. Perhaps we could build on this work to compute a PageRank-like metric using a heuristic code graph.
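For reference, a PageRank-like metric over such a graph could look roughly like the sketch below. It assumes the heuristic code graph has already been reduced to a file-level adjacency list where `edges[i]` holds the indices of files that file `i` references; that representation, the function name, and the parameters are assumptions for illustration, not an existing Sourcegraph or Zoekt API.

```go
package main

// PageRank runs plain power iteration over a file-level reference graph.
// edges[i] lists the files that file i references; every index must be
// in [0, len(edges)). It returns one score per file, and the scores sum to ~1.
func PageRank(edges [][]int, damping float64, iters int) []float64 {
	n := len(edges)
	if n == 0 {
		return nil
	}
	rank := make([]float64, n)
	for i := range rank {
		rank[i] = 1 / float64(n)
	}
	for it := 0; it < iters; it++ {
		next := make([]float64, n)
		dangling := 0.0
		for src, outs := range edges {
			if len(outs) == 0 {
				// Files with no outgoing references spread their rank evenly.
				dangling += rank[src]
				continue
			}
			share := rank[src] / float64(len(outs))
			for _, dst := range outs {
				next[dst] += share
			}
		}
		base := (1-damping)/float64(n) + damping*dangling/float64(n)
		for i := range next {
			next[i] = base + damping*next[i]
		}
		rank = next
	}
	return rank
}
```

With a heuristic graph, some edges will be wrong, so it would be worth checking how sensitive the resulting ordering is to spurious references before relying on it.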
/cc @sourcegraph/search-platform