sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.12k stars 1.29k forks source link

Git revision syntax and support is obscure and awkward #16628

Open rvantonder opened 3 years ago

rvantonder commented 3 years ago

While working on hover support for revision syntax, I was going over our documentation and git ref glob syntax (which is what we ultimately pass to the backend). Our documentation gives examples for searching over all branches using @*refs/heads/*. I always took this at face-value and assumed it is some git-specific syntax, and thought it would be nice if we developed our own revision syntax that simplifies life.

Turns out we've already developed our own revision syntax on top of git syntax, and do not make it explicit, anywhere, where the boundaries lie. It turns out that prepending * or *! to the beginning of a revision string demarcates a glob pattern that follows (the thing we will pass to --glob in the backend). In general this is a poor choice because * already has meaning in git syntax, so now in our revision language, the semantics of * is positionally dependent, and more significantly, stops us from adopting a more sensible convention (e.g., rev:* to search across all revisions) if we attempted to develop our own, more intuitive revision syntax.

The other piece of Sourcegraph revision syntax is the : separator, which is a bit more intuitive, but still problematic.

I think the current syntax is low usage (if users discover our docs, they probably blindly just accept the patterns we mention there and copy-paste it as I did before), but the functionality is high impact--it's really useful to get this right, and many times users want to know how to search or filter revisions. My proposal is:

With this proposal, all of the previous meaning is expressible and can be validated with a combination of rev, -rev, and or, simplifies passing values to the backend (no special postprocessing logic), and gets rid of special syntax that we need to document or handle in, e.g., query hovers.

Not prioritizing this right now, but CC @stefanhengl and @keegancsmith for comments.

/cc @sourcegraph/search-platform

github-actions[bot] commented 3 years ago

Heads up @lguychard - the "team/search" label was applied to this issue.

stefanhengl commented 3 years ago

I agree that the current syntax is confusing. If we want to keep the current expressiveness, however, we might not get away with just glob. Our syntax wraps 3 command line options (revision range, --glob, --exclude). If you specify revision ranges you "negate" (not really a negate, more like set difference on trees) with ^ and not with !*.

How would we separate !* from ^ with just using not rev?

rvantonder commented 3 years ago

If you specify revision ranges you "negate" (not really a negate, more like set difference on trees) with ^

What is the revision range syntax/option (is it Git, or something we do)? If it's something we do, care to link to the code so I can see what's going on here? From our docs, it sounds like ^ is only supported in commit search:

For commit and diff searches it is possible to exclude a set of commits by prepending a caret ^

Do you know if this the case?

stefanhengl commented 3 years ago

is it Git, or something we do

It is git (see https://git-scm.com/docs/gitrevisions#_revision_range_summary)

For commit and diff searches it is possible to exclude a set of commits by prepending a caret ^

Yes, this works e.g. repo:^github.com/sourcegraph/sourcegraph$@main:^3.21 type:commit shows all commits in main since 3.21.

Maybe we could introduce two keywords, rev and range? rev would do what it does right now but just support globbing, while range pulls out the syntax for diff/commit? The query above would like like this:

repo:^github\.com/sourcegraph/sourcegraph$ range:3.21...main type:commit
keegancsmith commented 3 years ago

I like the proposal of rev: being glob based, since that matches how our other keywords work as well as being inline with what I expect the common use case of commit search/etc. Revision ranges also seem like a common use case (given they are the default input for git log), but I would suspect for search use case it is less commonly used. Is there a way to combine them? eg detect special characters in the string causing a range to be interpreted instead? Or is that too tricky?