sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Inconsistent Results / Errors for Search Query Using repo: #27520

Closed alexAtSourcegraph closed 8 months ago

alexAtSourcegraph commented 2 years ago

Steps to reproduce:

1a. Execute the following search using the src-cli connected to Sourcegraph Cloud: src search -stream 'file:^LICENSE$ content:"MIT License" repo:contains.commit.after(November 1 2020) count:all'

1b. Execute the same search on sourcegraph.com using the Web UI

Expected behavior:

1a. A successful search with consistent results coming from the stdout

1b. A successful search with the maximum 1500 results displayed in the UI and the total actual result count displayed as well

Actual behavior:

1a. Inconsistent result counts, measured with the wc -l command. (The line count does not equal the number of results, but it should be consistent for the same search query, no?) Four attempts produced the following counts: 695, 4425, 4827, 36300 (commas omitted from the numbers for clarity). The final attempt (36,300) errored out with the following message: error during decoding: stream error: stream ID 1; INTERNAL_ERROR.

1b. The search returned the following error: Repo search failed: git command [git rev-parse HEAD] failed (stderr: ""): Post "http://gitserver-8.gitserver:3178/exec": dial tcp: lookup gitserver-8.gitserver on 10.165.0.10:53: no such host (and 20 more)
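
The repeated measurement in 1a can be scripted so that each attempt is counted the same way. This is a minimal sketch, not part of the original report: the helper names count_output_lines and measure are hypothetical, and it counts stdout lines (as wc -l roughly does) instead of actual results.

```python
import subprocess


def count_output_lines(cmd):
    """Run cmd and return (stdout_line_count, exit_code).

    Note: this counts lines, not search results, just like the wc -l
    measurement in the report. A final line without a trailing newline
    is still counted here, which wc -l would not do.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return len(proc.stdout.splitlines()), proc.returncode


def measure(cmd, attempts=4):
    """Repeat the same command and collect one (line_count, exit_code) per attempt."""
    return [count_output_lines(cmd) for _ in range(attempts)]
```

Assuming src is on the PATH and configured for the target instance, measure(["src", "search", "-stream", query]) with the query from step 1a would return four pairs; differing line counts or a non-zero exit code on any attempt would show the inconsistency described above.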

cdolfi commented 2 years ago

following

cdolfi commented 2 years ago

Hi! Checking whether there are any updates on this issue. Thanks!

github-actions[bot] commented 2 years ago

Heads up @jjeffwarner - the "team/search-core" label was applied to this issue.

keegancsmith commented 2 years ago

The problem here is the repo:contains.commit.after(November 1 2020) filter. We don't have an efficient way to implement this (yet), and fixing this performance problem at the scale of sourcegraph.com isn't in our near-term plans. If you remove that filter, you should start getting results back ordered by repository rank (star count). Note: We will then only search repos in our "global index", which are repos with 4 or more stars. cc @tsenart

justdueck commented 2 years ago

Hey @keegancsmith, just wanted to resurface this. Do you happen to have a ballpark estimate of the scope of work required to improve the performance of repo:contains.commit.after(), and where that would fall on the roadmap?

keegancsmith commented 2 years ago

We are starting to think about a "what comes next" / longer-term roadmap. Commit search is near the top of the list of high-impact work. However, it is non-trivial, and when we do work on it we will likely want to solve a whole slew of complaints (and take a bit of time to do so).

Is there a way to gauge the importance of just improving contains.commit.after on cloud? For example, there are shorter-term things we can do:

  1. Filter down repos using zoekt first then do the slower checks against gitserver.
  2. Adjust the behaviour of contains.commit.after to only check the timestamp of latest default branch commit.
  3. Ensure that when doing repo:contains... on cloud we use the same heuristics to only search the global index.
  4. Ensure that when doing repo:contains... we only check for repos cloned on gitserver.

Some of the above ideas also make a more targeted index for this sort of query possible. cc @tsenart @rvantonder for thoughts.
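
To make the shape of these ideas concrete, here is a rough sketch of how options 1, 2 and 4 could combine: run the cheap filters first, then do the slow per-repo check only on the survivors. It is an illustration under stated assumptions, not Sourcegraph's implementation. The Repo type, repos_with_commit_after, and the two callables in_index (standing in for a zoekt lookup) and latest_commit_time (standing in for a gitserver call) are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Repo:
    name: str
    stars: int
    cloned: bool  # hypothetical flag: is the repo already cloned?


def repos_with_commit_after(
    repos: Iterable[Repo],
    cutoff: datetime,
    in_index: Callable[[Repo], bool],
    latest_commit_time: Callable[[Repo], datetime],
) -> List[Repo]:
    """Return repos whose latest default-branch commit is newer than cutoff.

    in_index is a stand-in for the fast index filter (option 1) and
    latest_commit_time for the slow per-repo lookup (option 2); both are
    assumptions made for this sketch.
    """
    # Cheap filters first: skip repos that are not cloned (option 4)
    # or that the index already rules out (option 1).
    candidates = [r for r in repos if r.cloned and in_index(r)]
    # Expensive check only for the survivors, and only against the
    # timestamp of the latest commit (option 2).
    matched = [r for r in candidates if latest_commit_time(r) > cutoff]
    # Order by star count, mirroring the repository rank mentioned earlier.
    return sorted(matched, key=lambda r: r.stars, reverse=True)
```

The point of the ordering is that the slow lookup runs once per surviving candidate instead of once per repository, so the cost scales with how selective the cheap filters are.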

tsenart commented 2 years ago

Yep, agree with @keegancsmith that there's likely a quick win here. I've proposed https://github.com/sourcegraph/sourcegraph/issues/28476 in the past, which would be one solution to this. @justdueck: Could you compile a list of customers using this feature and share it internally? That would help with prioritisation. cc @eugeniaft

justdueck commented 2 years ago

@tsenart I'm not aware of any customers I work with directly who are impacted by this, but I'll raise it in #ce-internal and see if anyone else has customers using this.

I spoke with @cdolfi last week, and improving this feature would be a big win for them and the work they're doing at Red Hat's Open Source Program office.

stefanhengl commented 8 months ago

This issue has been inactive for a long time. To reopen the ticket, please let us know how to reproduce the issue on latest main. For feature requests, please let us know what is still missing.