google / zoekt

Fast trigram based code search

Improve base document ranking within shards #5

Open hanwen opened 7 years ago

hanwen commented 7 years ago

Files with shorter names (paths) are closer to the repository root and are usually more important. More recently modified files are also more important.

This has two aspects: ordering documents within a shard, and comparing documents across shards.

Suggestion: add a Rank (uint32) here:

https://github.com/google/zoekt/blob/master/indexbuilder.go#L124

When building the shard, reorder the documents by rank. Since the document order already feeds into the search results (https://github.com/google/zoekt/blob/master/eval.go#L601), that alone should help. To compare results between shards, however, we should also store the rank in the index and use it instead of nextDoc/len(docs).
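A minimal sketch of the two aspects, assuming a hypothetical `Document` type with a `Rank uint32` field where a lower value means a more important document. It sorts documents by rank before they would be written into a shard, then contrasts a position-based score (in the spirit of nextDoc/len(docs), which depends on shard size) with a score derived only from the stored rank, which means the same thing in every shard. `Document`, `sortByRank`, `positionScore` and `rankScore` are illustrative names, not zoekt's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// Document is a hypothetical stand-in for a file being indexed.
// Assumption for this sketch: a lower Rank means a more important document.
type Document struct {
	Name string
	Rank uint32
}

// sortByRank reorders documents so the most important come first, which is
// the order they would be written into the shard. SliceStable keeps the
// original relative order of documents with equal rank.
func sortByRank(docs []Document) {
	sort.SliceStable(docs, func(i, j int) bool {
		return docs[i].Rank < docs[j].Rank
	})
}

// positionScore scores a document by its position in its own shard, in the
// spirit of nextDoc/len(docs). The value depends on the shard's size, so
// scores from different shards cannot be compared meaningfully.
func positionScore(docIdx, numDocs int) float64 {
	return 1 - float64(docIdx)/float64(numDocs)
}

// rankScore scores a document from its stored rank alone, so the value
// means the same thing in every shard. A lower rank gives a higher score.
func rankScore(rank uint32) float64 {
	return 1 - float64(rank)/float64(^uint32(0))
}

func main() {
	shardA := []Document{{"a/b/c/deep.go", 300}, {"main.go", 10}}
	shardB := []Document{
		{"x/util.go", 100},
		{"x/y/z/w/long.go", 900},
		{"README", 5},
		{"x/y/mid.go", 200},
	}
	sortByRank(shardA)
	sortByRank(shardB)

	// deep.go (rank 300) lands at index 1 of 2; mid.go (rank 200) at index 2 of 4.
	fmt.Println(positionScore(1, len(shardA)), positionScore(2, len(shardB))) // 0.5 0.5
	fmt.Println(rankScore(shardB[2].Rank) > rankScore(shardA[1].Rank))        // true
}
```

The position-based scores tie even though mid.go has the better rank; only the stored rank tells the two documents apart once results from both shards are merged.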

hanwen commented 7 years ago

Once the infrastructure is in place, try it out with a ranking based on file name length.
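A possible first ranking function, sketched under the assumption that a lower rank is better and that "file name length" means the byte length of the full path. `nameLengthRank` and `orderByNameLength` are hypothetical helpers; ties are broken alphabetically so the shard order stays deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// nameLengthRank is a hypothetical ranking function: shorter file names
// (byte length of the full path) get a lower, i.e. better, rank.
// Real file names are far shorter than 4 GiB, so the conversion is safe.
func nameLengthRank(name string) uint32 {
	return uint32(len(name))
}

// orderByNameLength sorts names by rank, breaking ties alphabetically so the
// resulting order is deterministic.
func orderByNameLength(names []string) {
	sort.Slice(names, func(i, j int) bool {
		ri, rj := nameLengthRank(names[i]), nameLengthRank(names[j])
		if ri != rj {
			return ri < rj
		}
		return names[i] < names[j]
	})
}

func main() {
	names := []string{"src/internal/query/parse.go", "main.go", "cmd/server/main.go", "go.mod"}
	orderByNameLength(names)
	fmt.Println(names) // [go.mod main.go cmd/server/main.go src/internal/query/parse.go]
}
```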

Advanced feature: tweak the git indexer to find the commit that last modified each file, and use that commit's timestamp as the rank.

hanwen commented 7 years ago

Closed the wrong bug by mistake.

hanwen commented 6 years ago

The in-shard ordering is done; ranking across shards is not there yet.