sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Allow users to make their queries default to literal string #633

Closed sqs closed 5 years ago

sqs commented 6 years ago

Sourcegraph search query terms are interpreted as regexps, so literal strings must be quoted. Many users have requested that Sourcegraph support literal strings by default, or at least make it easier to use them, because they often search for text containing characters such as ( and . and want literal matching.

Related: #465
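To illustrate the problem described above, here is a minimal sketch using Python's `re` module as a stand-in for the search backend. The `search` and `is_valid_regex` helpers and the sample lines are hypothetical and are not Sourcegraph's implementation; they only show how the same text behaves as a regexp versus as a literal string.

```python
import re

# Hypothetical sample lines standing in for indexed file content.
lines = [
    "fmt.Println(err)",
    "fmtXPrintln = 1",
    "foo(bar)",
]


def search(pattern_text, lines, literal):
    """Return the lines matching pattern_text (hypothetical helper).

    When literal is True, regex metacharacters are escaped first,
    so the text is matched exactly as typed.
    """
    pattern = re.escape(pattern_text) if literal else pattern_text
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def is_valid_regex(text):
    """Report whether text compiles as a regular expression."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False


# As a regexp, "." matches any character, so an unintended line also matches.
regex_hits = search("fmt.Println", lines, literal=False)

# As a literal, only the exact text matches.
literal_hits = search("fmt.Println", lines, literal=True)

# An unbalanced "(" is not even a valid regexp, but is a fine literal.
paren_is_regex = is_valid_regex("foo(")
paren_hits = search("foo(", lines, literal=True)
```

In this sketch the regexp interpretation returns an extra line for `fmt.Println`, and `foo(` cannot be used at all without escaping, which are the two kinds of friction users report.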

keegancsmith commented 5 years ago

For 3.0 I am adjusting our search syntax a bit. I wonder if one of the things we should do is make regexes something you opt into. "Detecting" regexes could be confusing, because it would be hard to know whether we treated a term as a regex or not. So rather do something like re:myregex or /myregex/ to opt into regexes (so you can do repo:/foo/ or file:/foo/).
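A rough sketch of what explicit opt-in could look like, assuming the `re:` prefix and `/…/` delimiters suggested above. The `compile_term` function is hypothetical and is not the actual query parser; it ignores field prefixes such as `repo:` and `file:`.

```python
import re


def compile_term(term):
    """Compile one search term (hypothetical helper).

    The term is treated as a regexp only when the user opts in with a
    "re:" prefix or /slashes/; otherwise it is matched literally.
    """
    if term.startswith("re:"):
        source = term[len("re:"):]
    elif len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        source = term[1:-1]
    else:
        # No opt-in marker: escape metacharacters for literal matching.
        source = re.escape(term)
    return re.compile(source)


# A plain term is literal: "." does not match an arbitrary character.
plain_matches_axb = compile_term("a.b").search("axb") is not None

# Both opt-in forms treat "." as a regexp wildcard.
slash_matches_axb = compile_term("/a.b/").search("axb") is not None
prefix_matches_axb = compile_term("re:a.b").search("axb") is not None

# Characters such as "(" need no quoting in the literal default.
paren_matches = compile_term("foo(").search("foo(bar)") is not None
```

Because the marker is part of the query text, there is no guessing about whether a term "looks like" a regexp.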

sqs commented 5 years ago

The 3.0(-preview) release is the best time to do this because it is a breaking change.

Proposal

Suppose a and b are any terms in a query.

Future work (out of scope for the initial change):

keegancsmith commented 5 years ago

@sqs the new query parser is at pkg/search/query which will be the one used in 3.0(-preview)

sqs commented 5 years ago

@keegancsmith :+1: What do you think of the proposal above?

keegancsmith commented 5 years ago

I like the idea of text search tokens, especially if we consistently apply it.

With the new query syntax, I think that rather than having a.*b we leave it to the query to decide that. I.e., it means content:a AND content:b, which will return all matches for a or b, but only in documents which have both a and b. If someone wants the more specific a.*b they can specify that.
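The difference between the two interpretations can be sketched as follows. The `and_search` and `ordered_search` helpers and the sample documents are hypothetical; they only model the semantics described above, with literal terms.

```python
import re

# Hypothetical documents keyed by file name.
docs = {
    "one.txt": "foo calls bar",
    "two.txt": "bar calls foo",
    "three.txt": "foo only",
}


def and_search(docs, a, b):
    """Model content:a AND content:b (hypothetical helper).

    Only documents containing both terms are kept; for each of them,
    every match of either term is reported as (offset, text).
    """
    either = re.compile(re.escape(a) + "|" + re.escape(b))
    results = {}
    for name, text in docs.items():
        if a in text and b in text:
            results[name] = [(m.start(), m.group()) for m in either.finditer(text)]
    return results


def ordered_search(docs, a, b):
    """Model the more specific a.*b: a followed by b on the same line."""
    pattern = re.compile(re.escape(a) + ".*" + re.escape(b))
    return [name for name, text in docs.items() if pattern.search(text)]


and_results = and_search(docs, "foo", "bar")
ordered_results = ordered_search(docs, "foo", "bar")
```

In this sketch the AND form returns both `one.txt` and `two.txt`, while the `a.*b` form returns only `one.txt`, where `foo` precedes `bar`.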

The "looks like a regexp" check is a bit harder to define with hierarchical queries. What if some atoms look like regex but others don't? Right now I assume it will work like case, which applies to everything in the subtree, so we could just have a top-level regexp:yes which means apply it to everything.

Maybe we could do what the gcloud tool does? : can mean substring, but then ~ can mean regex. But that still doesn't solve the case where the user just writes the query without a message: prefix/etc. See the Operator Terms section in https://cloud.google.com/sdk/gcloud/reference/topic/filters

nicksnyder commented 5 years ago

What is the decision here for 3.0?

sqs commented 5 years ago

Deferring this because #2125 will solve 80% of the problem with 20% of the effort.

sqs commented 5 years ago

We will not do this because it would mean if Alice shared a URL like https://sourcegraph.com/search?q=asdf with Bob, they would see very different results (because the query interpretation would depend on their user settings). It is OK for us to personalize results in the future, but I think the query interpretation should not be different among users.
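The concern can be made concrete with a small sketch. The `results` function, the `literal_by_default` setting, and the sample corpus are hypothetical; the query `a.b` is chosen because it contains a regexp metacharacter (a query such as `asdf` would match the same text under either interpretation).

```python
import re

# Hypothetical corpus of lines to search.
corpus = ["a.b", "axb"]


def results(query, literal_by_default):
    """Run query under a per-user interpretation setting (hypothetical)."""
    source = re.escape(query) if literal_by_default else query
    regex = re.compile(source)
    return [line for line in corpus if regex.search(line)]


# Same shared query, different user settings.
alice_results = results("a.b", literal_by_default=False)  # regexp interpretation
bob_results = results("a.b", literal_by_default=True)     # literal interpretation

same_for_both = alice_results == bob_results
```

Here the shared query yields two lines for Alice and one for Bob, which is the kind of divergence a per-user setting would introduce.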