sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Using a rules engine to display "Did you mean" search suggestions #24458

Closed rrhyne closed 3 years ago

rrhyne commented 3 years ago

Many initial queries are semantic, for example "python timer". These queries return few or no results and often leave users dissatisfied. They could benefit from a rules engine that suggests alternative queries to run, displayed in a "Did you mean" panel above the search results.
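One way such an engine could be structured, sketched here as an assumption rather than a settled design: each rule is a pure function from the query string to zero or more suggestions, and the engine runs every rule and removes duplicates. The Rule, DidYouMeanSuggestion and runRules names are hypothetical, not Sourcegraph APIs.

```typescript
// Hypothetical shape of one "Did you mean" suggestion.
interface DidYouMeanSuggestion {
  rule: string; // name of the rule that produced the suggestion
  query: string; // the suggested replacement query
}

// A rule inspects only the query string and returns zero or more suggestions.
type Rule = (query: string) => DidYouMeanSuggestion[];

// Runs every rule in order, dropping suggestions that repeat the original
// query or a suggestion already produced by an earlier rule.
function runRules(query: string, rules: Rule[]): DidYouMeanSuggestion[] {
  const seen = new Set<string>([query.trim()]);
  const suggestions: DidYouMeanSuggestion[] = [];
  for (const rule of rules) {
    for (const suggestion of rule(query)) {
      if (!seen.has(suggestion.query)) {
        seen.add(suggestion.query);
        suggestions.push(suggestion);
      }
    }
  }
  return suggestions;
}
```

Keeping rules independent means a new rule can be added for each failure pattern without touching the others, and the rule name can be reused as an analytics property.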

Example queries:

"python timer"

This is the clearest rule and the most obvious way to provide value: two tokens, no filters, and one token is in our list of supported languages. The suggestions become lang:python timer and repo:python timer.
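The language rule can be sketched as a pure function over the query string. This is a minimal sketch, not Sourcegraph's implementation: SUPPORTED_LANGUAGES is a hypothetical subset of the real list, the filter check is a rough regex, and whitespace splitting stands in for the real query parser.

```typescript
interface DidYouMeanSuggestion {
  rule: string;
  query: string;
}

// Hypothetical subset of the supported-language list.
const SUPPORTED_LANGUAGES = new Set(["python", "javascript", "typescript", "go", "java", "php", "ruby", "rust"]);

// Rough check: a token such as "lang:go" or "-repo:foo" counts as a filter.
const isFilter = (token: string): boolean => /^-?[A-Za-z]+:/.test(token);

// Two tokens, no filters, one token is a supported language:
// suggest both a lang: and a repo: version of the query.
function languageRule(query: string): DidYouMeanSuggestion[] {
  const tokens = query.trim().split(/\s+/).filter(t => t.length > 0);
  if (tokens.length !== 2 || tokens.some(isFilter)) {
    return [];
  }
  const langIndex = tokens.findIndex(t => SUPPORTED_LANGUAGES.has(t.toLowerCase()));
  if (langIndex === -1) {
    return [];
  }
  const lang = tokens[langIndex];
  const rest = tokens.filter((_, i) => i !== langIndex).join(" ");
  return [
    { rule: "language-filter", query: `lang:${lang} ${rest}` },
    { rule: "repo-filter", query: `repo:${lang} ${rest}` },
  ];
}
```

Because the language token is moved to the front, "timer python" produces the same suggestions as "python timer".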

"facebook react usestate"

Three tokens, no filters. The suggestion becomes repo:facebook react usestate patternType:regex.

We should research common user queries that fail, to determine which rules would benefit users.

Example of how this might appear in the UI:

[Image: example of the "Did you mean" panel in the UI]

More information in RFC 409 WIP: Tracking Semantic Searches

Figma file

Subtasks

(added by @fkling)

benvenker commented 3 years ago

@rrhyne, I like this idea a lot. I wonder how expensive it would be to show these suggestions while the user is typing. Some variation of this idea: [image]

Or

[image]

We could either take the user directly to the search result set if they clicked on an item, or autocomplete the query syntax and let them continue writing, or provide an affordance for both.

cc @fkling, would love your thoughts as well.

rrhyne commented 3 years ago

"javascript timer function" could be a line of text I want to search for (a correct search), or I could be new and not understand how to use search. For a correct search, I'd hate to add a lot of UI popping up over results and other items.

This is why I like the "Did you mean" approach.

benvenker commented 3 years ago

Yeah, that makes sense to me. I could be biasing too hard toward "users are likely to make an invalid first search" and wanting to avoid that at all costs. I truthfully don't know yet if that's the case.

I was envisioning the above being more interactive, like "hey here are some things we think you mean," they click it, and then we populate it with the correct query syntax to sort of live-teach them how to use the query language.

An alternate position is that that's the whole problem and they shouldn't have to learn it at all(!), to which I'm sympathetic, of course.

I think I would feel better about the "Did you mean" approach if we could be really good at providing suggestions for what we thought you meant. For now, I'm very much in the dark about what is possible or practical there, and about how to make progress on this relatively quickly.

For example, I'm not sure I'd be thrilled if I entered "javascript timer function" and what came back was "did you mean javascript.timer.function?". Your example, lang:python timer, is certainly more user-friendly than that. I just don't have a sense of the tradeoffs or complexity of interpreting the intent of user input relative to valid query syntax.

What would you think is a good next step here? It sounds like I need a better understanding of some capabilities in general, and that we need more minds on this to establish a baseline of what can be done. I'm happy to go any direction from here as long as it's roughly forward!

rrhyne commented 3 years ago

I'd like to run with the UI as I proposed. The next step would be to determine which search problems we can solve with this.

Rewriting {language} token token to lang:{language} token token is a good rule to start with. I've observed new users not understanding this in their initial searches.

The next rule might be determining if any of the tokens is a popular org (facebook) or repository (react) with an intermediary query and suggesting a repo search if it is:

token1 token2 to repo:token2 token1
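A sketch of the repo rule, assuming a caller-supplied isKnownRepoOrOrg predicate stands in for the intermediary query (hypothetical; a real lookup would likely be asynchronous and backed by the search backend). Each token the predicate recognizes yields one repo: suggestion with that token moved to the front.

```typescript
interface DidYouMeanSuggestion {
  rule: string;
  query: string;
}

// Stand-in for the intermediary query: does this token name a popular org or repo?
type RepoLookup = (token: string) => boolean;

function repoRule(query: string, isKnownRepoOrOrg: RepoLookup): DidYouMeanSuggestion[] {
  const tokens = query.trim().split(/\s+/).filter(t => t.length > 0);
  // Only plain multi-token queries qualify; skip anything with a filter.
  if (tokens.length < 2 || tokens.some(t => /^-?[A-Za-z]+:/.test(t))) {
    return [];
  }
  const suggestions: DidYouMeanSuggestion[] = [];
  tokens.forEach((token, i) => {
    if (isKnownRepoOrOrg(token)) {
      // token1 token2 -> repo:token2 token1
      const rest = tokens.filter((_, j) => j !== i).join(" ");
      suggestions.push({ rule: "repo-filter", query: `repo:${token} ${rest}` });
    }
  });
  return suggestions;
}
```

For example, if facebook and react are both known, "facebook react usestate" yields repo:facebook react usestate and repo:react facebook usestate.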

From there you would want to understand how users fail in their queries for common use cases and suggest more rules.

fkling commented 3 years ago

Just from a technical perspective, I think the Monaco editor limits how we can display completion results and suggestions. But I want to spend some time eventually to see what we can do there.

I think (and maybe I'm going to put this into an RFC) there are multiple "types" of query analysis we can do, and their results will be used differently in the UI.

E.g. I absolutely think that we should provide more "adaptive" autocompletion, but maybe only for cases where we know that the provided query is invalid as is.

benvenker commented 3 years ago

@fkling thanks. I'll work to get some exploration time into our iterations as soon as possible.

@rrhyne that works. You and I have a sync scheduled on Friday, would that be a good time to talk more? Should we pick a new time and invite more team members?

benvenker commented 3 years ago

@fkling I think we're ready for you to start digging into this if you're ready and have bandwidth. Feel free to move into Next.

fkling commented 3 years ago

For my understanding, the "did you mean" feature should work exclusively on the query itself, regardless of whether results are returned or not? It would simplify things if it's "query only".

rrhyne commented 3 years ago

Another example of a search error we could suggest a fix for is incorrect quoting:

lang:php "class Vector "

In this case we would simply remove both sets of quotes and search for lang:php class Vector instead. This is a very common mistake users make because they are used to escaping in regex, grep and other tools.
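A sketch of the quote-stripping rule, assuming the pattern is a single double-quoted string that follows any filters. The function name is hypothetical, and a rough regex stands in for the real query parser.

```typescript
interface DidYouMeanSuggestion {
  rule: string;
  query: string;
}

// Matches: optional leading filters (e.g. "lang:php "), then one quoted pattern.
const QUOTED_PATTERN = /^((?:-?[A-Za-z]+:\S+\s+)*)"(.*)"\s*$/;

function stripQuotesRule(query: string): DidYouMeanSuggestion[] {
  const match = QUOTED_PATTERN.exec(query.trim());
  if (match === null) {
    return [];
  }
  const [, filters, pattern] = match;
  // Trim so the suggestion does not end with stray whitespace from inside the quotes.
  const unquoted = pattern.trim();
  if (unquoted.length === 0) {
    return [];
  }
  return [{ rule: "strip-quotes", query: `${filters}${unquoted}` }];
}
```

Stripping the quotes also discards any intentional whitespace inside them (such as the trailing space in "class Vector "), so this is safer offered as a suggestion than applied automatically.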

Slack thread for more information: https://sourcegraph.slack.com/archives/C0W2E592M/p1632251544196600

mikeschinkel commented 3 years ago

@rrhyne:

"In this case we would simply remove both sets of quotes and search for lang:php class Vector instead."

Except that in my case it would not give me what I was looking for; I was the user whose query prompted this comment.

In my case I explicitly wanted to look for a space after Vector, because I was trying to omit results such as VectorException, VectorIterator, VectorType_O, etc. I later figured out I could use regex to do it, but I was still unclear how to say "find exactly '^\s*class Vector {?'", because it's not clear how I can specify "search this, exactly."

As a Google user, I am an archetype for Jakob's Law of Internet User Experience, and I find your search engine's choice to look literally for the quotes I typed completely unintuitive; it violates the principle of least surprise.

jmtcw

rrhyne commented 3 years ago

@mikeschinkel, thanks for the clarification. At times we have users coming from grep or similar tools who are conditioned to use quotes for escaping. I mistakenly thought this was the case.

I realize this is even less intuitive, but the content: filter provides the ability to escape this kind of query in literal mode. Ex: lang:PHP content:"class Vector". Does that do what you need it to?
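A sketch of a rule that suggests the content: form instead of dropping the quotes, assuming content:"..." matches its argument literally as described above and that the pattern contains no double quotes. The function name is hypothetical, and a regex stands in for the query parser.

```typescript
interface DidYouMeanSuggestion {
  rule: string;
  query: string;
}

// Optional leading filters, then one quoted pattern with no inner quotes.
const QUOTED_PATTERN = /^((?:-?[A-Za-z]+:\S+\s+)*)"([^"]+)"\s*$/;

// Rewrites `filters "pattern"` to `filters content:"pattern"`, keeping the
// pattern byte-for-byte, including leading or trailing spaces.
function contentFilterRule(query: string): DidYouMeanSuggestion[] {
  const match = QUOTED_PATTERN.exec(query.trim());
  if (match === null) {
    return [];
  }
  const [, filters, pattern] = match;
  return [{ rule: "content-filter", query: `${filters}content:"${pattern}"` }];
}
```

Unlike stripping the quotes, this preserves the trailing space that separates "class Vector " from "class VectorException".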

There is also the ability to search for symbols, though I think that it would not work well for this query.

We certainly could have followed Google's approach, but we've found that the majority of queries against code do not need escaping. We have optimized for those use cases and feel it's the correct call for usability in the long run. We do recognize it has the side effect of increasing the learning curve in places.

Again, thanks for the feedback! Please keep it coming.

mikeschinkel commented 3 years ago

"I realize this is even less intuitive, but the content: filter provides the ability to escape this kind of query in literal mode. Ex: lang:PHP content:"class Vector". Does that do what you need it to?"

Yes, it does. But you are also right that it is not intuitive, or rather, not easily discoverable.

"We certainly could have followed Google's approach, but we've found that the majority of queries against code do not need escaping. We have optimized for those use cases and feel it's the correct call for usability in the long run. "

I am not clear on what you mean by "escaping."

BTW, in my case I just wanted to specify exactly what I was looking for, and everywhere else I do that by putting double quotes around things I expect to match literally, without a content: prefix, of course.

"We do recognize it has the side effect of increasing the learning curve in places."

I appreciate that, but I urge you not to discount the value of initial success experiences vs. leaving first timers frustrated.

IMO, including for me, an initial success experience is as important as power-user features. If you don't ensure the former, you never get enough of the latter.

"Again, thanks for the feedback! Please keep it coming."

As you can see, I am accommodating that request! :-)

rrhyne commented 3 years ago

An additional example of a problem we can solve with this framework: no results for a search context other than global.

This occurs fairly frequently among users of contexts.
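A sketch of this rule, assuming the panel has access to the result count (which makes it a post-search rule rather than a query-only one) and that the context is set with a context:<name> filter. The function name is hypothetical.

```typescript
interface DidYouMeanSuggestion {
  rule: string;
  query: string;
}

// First context:<name> filter in the query, preceded by start or whitespace.
const CONTEXT_FILTER = /(^|\s)context:(\S+)/;

// When a non-global context returns nothing, suggest the same query in context:global.
function globalContextRule(query: string, resultCount: number): DidYouMeanSuggestion[] {
  if (resultCount > 0) {
    return [];
  }
  const match = CONTEXT_FILTER.exec(query);
  if (match === null || match[2] === "global") {
    return [];
  }
  return [{ rule: "global-context", query: query.replace(CONTEXT_FILTER, "$1context:global") }];
}
```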

fkling commented 3 years ago

I assume we want to collect metrics for "suggestion clicked"? Any specific metric name to use?

benvenker commented 3 years ago

I actually have a ticket for tracking a typeahead click/tab, but it makes sense to do this here.

Maybe “typeahead accepted” or “typeahead suggestion accepted”.

The latter is long but more descriptive, which I think I prefer for ease of finding when creating charts in our data tools.


-- Ben Venker Product Manager at Sourcegraph https://about.sourcegraph.com | @sourcegraph https://twitter.com/sourcegraph https://sourcegraph.com

rrhyne commented 3 years ago

@benvenker I don't think “typeahead” fits this feature. I think of typeahead as suggestions displayed when you enter text. These particular suggestions are displayed after the search.

We should be tracking both the display and the interaction so we can see how often these items are available and used. This allows us to create a funnel and will help us better determine the success or failure of the feature. At the end of that funnel might be "File viewed", which could indicate a successful search.

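I suggest these two events:

  • DidYouMeanSuggestionDisplayed - fired when you display one of these panels
  • DidYouMeanSuggestionClicked - fired on click of one of the searches
    • property: {type: '{name of rule}'}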
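A minimal sketch of how DidYouMeanSuggestionDisplayed and DidYouMeanSuggestionClicked could be wired up, assuming a generic logger with a log(eventName, properties) method. The EventLogger interface and helper names are hypothetical, not Sourcegraph's actual telemetry API.

```typescript
// Hypothetical telemetry interface.
interface EventLogger {
  log(eventName: string, properties?: Record<string, unknown>): void;
}

interface DidYouMeanSuggestion {
  rule: string;
  query: string;
}

// Fire DidYouMeanSuggestionDisplayed only when a panel is actually shown.
function logPanelDisplayed(logger: EventLogger, suggestions: DidYouMeanSuggestion[]): void {
  if (suggestions.length > 0) {
    logger.log("DidYouMeanSuggestionDisplayed");
  }
}

// Fire DidYouMeanSuggestionClicked with the rule name as the `type` property.
function logSuggestionClicked(logger: EventLogger, suggestion: DidYouMeanSuggestion): void {
  logger.log("DidYouMeanSuggestionClicked", { type: suggestion.rule });
}
```

Logging the rule name on click lets the funnel be broken down per rule, so weak rules can be identified and removed.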

benvenker commented 3 years ago

Oh I see. I misunderstood what Felix was talking about. I agree.

On Fri, Oct 8, 2021 at 13:03 Rob Rhyne @.***> wrote:

@benvenker https://github.com/benvenker I don't think “typeahead” fits this feature. I think of typeahead as suggestions displayed when you enter text. These particular suggestions are displayed after the search.

We should be tracking both the display and the interaction so we can see how often these items are available and used. This allows us to create a funnel and will help us better determine the success or failure of the feature. At the end of that funnel might be "File viewed' which could indicate a successful search.

I suggest these two events:

  • DidYouMeanSuggestionDisplayed - fired when you display one of these panels
  • DidYouMeanSuggestionClicked - fired on click of one of the searches
    • property: {type: '{name of rule}'}
