Closed sqs closed 5 years ago
For 3.0 I am adjusting our search syntax a bit, and I wonder if one of the things we should do is make regexes something you opt into. I can imagine "detecting" regexes being confusing, since it is hard to know whether we used a regex or not. So rather do something like re:myregex or /myregex/ to opt into regexes (so you can do repo:/foo/ or file:/foo/).
The 3.0(-preview) release is the best time to do this because it is a breaking change.
Suppose a and b are any terms in a query.
message: (for searching commit messages), and any other future keywords added that are searching over arbitrary prose or code. It does not include keywords that search over filenames, repository names, commit authors, etc.
a b becomes a query for matches of the regexp regexp.QuoteMeta("a") + ".*" + regexp.QuoteMeta("b").
regexp:yes, which makes the text search tokens be interpreted as regexps (so a.*b, and if a or b instead had regexp special characters, those would NOT be escaped).
regexp.Compile succeed on it, and does it contain any of the chars *+?()[]{}|\w\b".
/ to opt into regexps in repo: and file: is problematic because / already has a meaning there. So those will continue to always be interpreted as regexps.
Future work (out of scope for the initial change):
regexp:yes.
@sqs the new query parser is at pkg/search/query which will be the one used in 3.0(-preview)
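To make the proposal above concrete, here is a minimal Go sketch of the two interpretations of a query a b, plus one possible reading of the "looks like a regexp" check. The function names (literalPattern, regexpPattern, looksLikeRegexp) are hypothetical and are not taken from pkg/search/query; the heuristic in particular is an assumption based on the partial description in the proposal.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// literalPattern builds the default (literal) interpretation: every term is
// escaped with regexp.QuoteMeta and the terms are joined with ".*".
func literalPattern(terms []string) string {
	quoted := make([]string, len(terms))
	for i, t := range terms {
		quoted[i] = regexp.QuoteMeta(t)
	}
	return strings.Join(quoted, ".*")
}

// regexpPattern is the regexp:yes interpretation: terms are joined with ".*"
// and special characters are NOT escaped.
func regexpPattern(terms []string) string {
	return strings.Join(terms, ".*")
}

// looksLikeRegexp is a hypothetical heuristic (an assumption, not the actual
// implementation): the term must compile as a regexp and contain at least one
// of the characters or escapes listed in the proposal.
func looksLikeRegexp(term string) bool {
	if _, err := regexp.Compile(term); err != nil {
		return false
	}
	return strings.ContainsAny(term, `*+?()[]{}|`) ||
		strings.Contains(term, `\w`) ||
		strings.Contains(term, `\b`)
}

func main() {
	terms := []string{"foo(", "b.r"}
	fmt.Println(literalPattern(terms)) // foo\(.*b\.r
	fmt.Println(regexpPattern(terms))  // foo(.*b.r
	fmt.Println(looksLikeRegexp("fo+"), looksLikeRegexp("foo("), looksLikeRegexp("foo")) // true false false
}
```

Note that the unescaped pattern for foo( does not compile as a regexp, which is the kind of query users hit today when they expect literal matching.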
@keegancsmith :+1: What do you think of the proposal above?
I like the idea of text search tokens, especially if we consistently apply it.
With the new query syntax I think rather than having a.*b we leave it to the query to decide that. I.e., it means content:a AND content:b, which will return all matches for a or b, but only in documents which have both a and b. If someone wants the more specific a.*b they can specify that.
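As a rough illustration of the content:a AND content:b semantics described here, the sketch below works over in-memory strings rather than a real index. The matchAll function and the document map are hypothetical and only model the behavior: a document qualifies when it contains every term, and the reported matches are the lines containing any term.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAll reports whether body contains every term as a substring.
func containsAll(body string, terms []string) bool {
	for _, t := range terms {
		if !strings.Contains(body, t) {
			return false
		}
	}
	return true
}

// containsAny reports whether line contains at least one term.
func containsAny(line string, terms []string) bool {
	for _, t := range terms {
		if strings.Contains(line, t) {
			return true
		}
	}
	return false
}

// matchAll returns, for each document that contains all terms, the lines
// that match at least one of the terms.
func matchAll(docs map[string]string, terms []string) map[string][]string {
	results := make(map[string][]string)
	for name, body := range docs {
		if !containsAll(body, terms) {
			continue // skip documents missing any term
		}
		for _, line := range strings.Split(body, "\n") {
			if containsAny(line, terms) {
				results[name] = append(results[name], line)
			}
		}
	}
	return results
}

func main() {
	docs := map[string]string{
		"a.txt": "alpha\nbeta\ngamma",
		"b.txt": "alpha only",
	}
	// Only a.txt contains both terms, so only its matching lines are reported.
	fmt.Println(matchAll(docs, []string{"alpha", "beta"}))
}
```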
The "looks like a regexp" check is a bit harder to define with hierarchical queries. What if some atoms look like regexes but others don't? Right now I assume it will work like case, which applies to everything in the subtree, so we could just have a top-level regexp:yes which means apply it to everything.
Maybe we could do what the gcloud tool does? The : operator can mean substring, but then ~ can mean regex. But that still doesn't solve the case where the user just writes the query without a message: prefix, etc. See the Operator Terms section in https://cloud.google.com/sdk/gcloud/reference/topic/filters
What is the decision here for 3.0?
Deferring this because #2125 will solve 80% of the problem with 20% of the effort.
We will not do this because it would mean if Alice shared a URL like https://sourcegraph.com/search?q=asdf with Bob, they would see very different results (because the query interpretation would depend on their user settings). It is OK for us to personalize results in the future, but I think the query interpretation should not be different among users.
Sourcegraph search query terms are interpreted as regexps. You need to quote literal strings. Many users have requested that Sourcegraph support literal strings by default, or at least make it easier to use literal strings. This is because they often search for things with (, ., etc., and they want literal matching.
Related: #465