sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.11k stars 1.27k forks source link

insights: respect the escape sequences provided in the explicit `content` argument of a query #33160

Open northyg opened 2 years ago

northyg commented 2 years ago

Steps to reproduce:

  1. Create a new code insight
  2. Give it a query like content: "Color\\(" patterntype:regexp)
  3. The preview displays correctly repo:^(github\.com/sourcegraph/sourcegraph)$ content: "Color\\(")
  4. But on save the Code Insight query becomes content: "Color\(" patterntype:regexp)

Expected behavior:

  1. Code Insight query should not remove the extra regex \ upon saving.

Actual behavior:

  1. The extra \ is removed.
  2. Code Insight doesn't work properly
  3. Work around is to add extra \s to the query.

If would like immediate help on this, please email support@sourcegraph.com (you can still create the issue, but there are not SLAs on issues like there are for support requests).

sourcegraph-bot-2 commented 2 years ago

Heads up @joelkw @felixfbecker @vovakulikov @unclejustin - the "team/code-insights" label was applied to this issue.

coury-clark commented 2 years ago

I believe this is what is responsible: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/insights/pages/insights/creation/search-insight/utils/insight-sanitizer.ts?L11-15

@vovakulikov @unclejustin

There is a subtle difference in our search syntax if using content implicitly or explicitly. Explicitly we have to double quote our regular expressions, as the customer identified: content:"Color\\(" patternType:regexp Using an implicitly content field, we do not. I assume that the query parser is doing this escaping for us in this case. Color\( patternType:regexp

We should be careful about this generically removing these double quotes, we need to consider the semantics of where the content is located.

vovakulikov commented 2 years ago

@coury-clark @northyg I think before when we had just a simple and native HTML input for the search query field and we had this problem that you have to put escape before special symbols just because it makes sense from reg exp perspective and people get used to this, and as @coury-clark said above there is a difference between escaped and non-escaped sequences in a query from a search results point of view.

Bottom line I think we could just remove this FE sanitize check since Monaco should do this work internally. @unclejustin FYI.

unclejustin commented 2 years ago

Makes sense to me, let's just make sure to manually test the before and after before committing this. Bonus points if either of you can think of a simple automated test to confirm we don't break something else.