google / zoekt

Fast trigram based code search

Make line reference format in url hash formattable #121

Closed robinp-tw closed 3 years ago

robinp-tw commented 3 years ago

Zoekt currently links using #l99, but github.com only accepts #L99, while the example LibreOffice Gerrit repo (https://git.libreoffice.org/online/+/848145503bf7b98ce4a4aa0a858a0d71dd0dbb26/Makefile.am#93) seems to accept only #99.

hanwen commented 3 years ago

I don't understand the bug report. Per

https://github.com/google/zoekt/blob/a75291ac202e088f8edf20ab57f724b49f2f4faf/gitindex/index.go#L122

the line fragment for github already uses capital L, and the Gitiles format is already as you describe,

https://github.com/google/zoekt/blob/a75291ac202e088f8edf20ab57f724b49f2f4faf/gitindex/index.go#L117
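
For illustration, here is a minimal sketch of how per-host line fragment templates could expand into the formats mentioned above. The template strings and the helper name renderFragment are assumptions made for this example, not a copy of the code in gitindex/index.go; only the resulting shapes (#L99 for GitHub, #99 for Gitiles) come from this thread.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderFragment is a hypothetical helper: it expands a line fragment
// template for a single line number using the standard text/template package.
func renderFragment(tmpl string, line int) (string, error) {
	t, err := template.New("fragment").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, map[string]int{"LineNumber": line}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Assumed template strings, chosen to match the formats described above.
	hosts := []struct{ name, tmpl string }{
		{"github", "#L{{.LineNumber}}"},
		{"gitiles", "#{{.LineNumber}}"},
	}
	for _, h := range hosts {
		frag, err := renderFragment(h.tmpl, 99)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(h.name, frag)
	}
}
```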

robinp-tw commented 3 years ago

Thanks Hanwen for the pointer - did some debug printing, seems like https://github.com/google/zoekt/blob/master/web/snippets.go#L116 always calls getTemplate with f.Repository, but the fragment map gets populated with what seems to be f.SubRepositoryName?

In my case, f.Repository is github.com/the-org, while f.SubRepositoryName is github.com/the-org/a-repo, and the fragment map indeed has keys from the latter.

If that looks right, then it is a bug (should getTemplate be called with the subrepo name, if one exists?). If it doesn't look right, maybe I called zoekt-repo-index with the wrong parameters?
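
The suspected mismatch can be sketched as follows. This is only an illustration of the hypothesis above, not Zoekt's actual code: the fileMatch type, templateKey, and lookupFragment are hypothetical names, and the map contents are made up from the values in this comment.

```go
package main

import "fmt"

// fileMatch is a hypothetical stand-in that carries only the two fields
// discussed above.
type fileMatch struct {
	Repository        string
	SubRepositoryName string
}

// templateKey prefers the sub-repository name when one is set and falls
// back to the repository name otherwise.
func templateKey(f fileMatch) string {
	if f.SubRepositoryName != "" {
		return f.SubRepositoryName
	}
	return f.Repository
}

// lookupFragment looks a template up by the key chosen in templateKey.
func lookupFragment(fragments map[string]string, f fileMatch) (string, bool) {
	tmpl, ok := fragments[templateKey(f)]
	return tmpl, ok
}

func main() {
	// Assumed map contents: keyed by the sub-repository name, as observed.
	fragments := map[string]string{
		"github.com/the-org/a-repo": "#L{{.LineNumber}}",
	}
	f := fileMatch{
		Repository:        "github.com/the-org",
		SubRepositoryName: "github.com/the-org/a-repo",
	}

	// A lookup by f.Repository misses because the key is absent.
	_, foundByRepo := fragments[f.Repository]
	// A lookup that prefers the sub-repository name finds the template.
	tmpl, foundBySub := lookupFragment(fragments, f)

	fmt.Println("by repository:", foundByRepo)
	fmt.Println("by sub-repository:", foundBySub, tmpl)
}
```

Whether such a fallback is the right fix depends on where the metadata comes from, which is the open question here.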

hanwen commented 3 years ago

github.com/the-org is normally never a repository, right? The Repository vs SubRepositoryName distinction is for submodules. Are you indexing submodules?

You can inspect the data that controls this in the .git/config file (it has the template settings, IIRC). You can also look at the end of the shard file (it should have a JSON with all metadata).

robinp-tw commented 3 years ago

Yeah, that's odd. Small repro:

$ ~/go/bin/zoekt-mirror-github -org google -name zoekt -dest repo

$ ~/go/bin/zoekt-repo-index -repo_cache repo -index idx -base_url https://github.com/google --rev_prefix=  some.xml
2020/10/19 20:49:19 finished idx/github.com%2Fgoogle_v15.00000.zoekt: 1779261 index bytes (overhead 3.5)

$ ~/go/bin/zoekt-webserver -index idx
2020/10/19 20:49:42 loading 1 shards
2020/10/19 20:49:42 serving HTTP on :6070

$ curl 'http://localhost:6070/search?q=search&num=0' | grep -oi 'go#l' | sort -u
go#l

What looks odd is the f.Repository and f.SubRepositoryName of each file f in result.Files (here github.com/google and github.com/google/zoekt, respectively). Where is that metadata coming from? Am I invoking the indexer with the right arguments? Thanks for bearing with me.

Edit: some.xml is:

<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <remote name="google" fetch="https://github.com/google/" />
  <default revision="HEAD" remote="google" sync-j="4" />
  <project remote="google" name="zoekt" />
</manifest>

hanwen commented 3 years ago

Are you sure you want to use the repo indexer? Normally, you'd simply index the git repository directly, using zoekt-git-index.

robinp-tw commented 3 years ago

You are right! In retrospect, I don't know why I was using the repo-indexer. Thank you!