Closed isker closed 1 year ago
Here's what a sourcegraph.com search result looks like (for a single file), as sent to the browser. This is a pretty ideal format that we have no access to as direct clients of zoekt.
{
"type": "content",
"path": "ui/src/main/java/org/adoptopenjdk/jitwatch/util/UserInterfaceUtil.java",
"pathMatches": [
{
"start": {
"offset": 48,
"line": 0,
"column": 48
},
"end": {
"offset": 65,
"line": 0,
"column": 65
}
}
],
"repositoryID": 52556,
"repository": "github.com/AdoptOpenJDK/jitwatch",
"repoStars": 2896,
"repoLastFetched": "2023-07-09T20:16:44.782207Z",
"branches": [
""
],
"commit": "53abd156464924de1ac68cb1e09afe60b0c474dc",
"hunks": null,
"chunkMatches": [
{
"content": "public final class UserInterfaceUtil",
"contentStart": {
"offset": 1109,
"line": 35,
"column": 0
},
"ranges": [
{
"start": {
"offset": 1128,
"line": 35,
"column": 19
},
"end": {
"offset": 1145,
"line": 35,
"column": 36
}
}
]
},
{
"content": "\tprivate UserInterfaceUtil()",
"contentStart": {
"offset": 1819,
"line": 58,
"column": 0
},
"ranges": [
{
"start": {
"offset": 1828,
"line": 58,
"column": 9
},
"end": {
"offset": 1845,
"line": 58,
"column": 26
}
}
]
},
{
"content": "\tprivate static final Logger logger = LoggerFactory.getLogger(UserInterfaceUtil.class);",
"contentStart": {
"offset": 1148,
"line": 37,
"column": 0
},
"ranges": [
{
"start": {
"offset": 1210,
"line": 37,
"column": 62
},
"end": {
"offset": 1227,
"line": 37,
"column": 79
}
}
]
},
{
"content": "\t\tUserInterfaceUtil.RESOURCE_FACTORY.setResources(ResourceBundle.getBundle(UserInterfaceUtil.RESOURCE_NAME, locale));",
"contentStart": {
"offset": 2285,
"line": 74,
"column": 0
},
"ranges": [
{
"start": {
"offset": 2287,
"line": 74,
"column": 2
},
"end": {
"offset": 2304,
"line": 74,
"column": 19
}
},
{
"start": {
"offset": 2360,
"line": 74,
"column": 75
},
"end": {
"offset": 2377,
"line": 74,
"column": 92
}
}
]
},
{
"content": "\t\tInputStream inputStream = UserInterfaceUtil.class.getResourceAsStream(path);",
"contentStart": {
"offset": 3857,
"line": 132,
"column": 0
},
"ranges": [
{
"start": {
"offset": 3885,
"line": 132,
"column": 28
},
"end": {
"offset": 3902,
"line": 132,
"column": 45
}
}
]
},
{
"content": "\t\tImage image = UserInterfaceUtil.IMAGE_CAMERA;",
"contentStart": {
"offset": 4345,
"line": 154,
"column": 0
},
"ranges": [
{
"start": {
"offset": 4361,
"line": 154,
"column": 16
},
"end": {
"offset": 4378,
"line": 154,
"column": 33
}
}
]
},
{
"content": "\t\tString styleSheet = UserInterfaceUtil.class.getResource(\"/style.css\").toExternalForm();",
"contentStart": {
"offset": 5848,
"line": 204,
"column": 0
},
"ranges": [
{
"start": {
"offset": 5870,
"line": 204,
"column": 22
},
"end": {
"offset": 5887,
"line": 204,
"column": 39
}
}
]
}
]
},
I will try to see about getting enhancements upstreamed to zoekt, at least for problem 1.
Judging this to be fixed by https://github.com/isker/neogrok/commit/9f169fbdf63b23b22cc811898eb0cc4f7a5d8297. With gzip compression, response sizes are quite acceptable on the demo app.
I still might lower the default MaxMatchDisplayCount from 500, but it's probably not necessary.
When a query produces a large number of matches, perf is quite poor, both in the DOM and in handling the enormous HTTP response from zoekt.
I have tested out a few different mitigation strategies locally, but they were all either ineffective (e.g. lazily parsing the base64 content from zoekt into renderable data, such that we don't parse data that never reaches the DOM) or were only effective some times (e.g. debouncing live search more aggressively with shorter queries - we can only apply partially effective heuristics, like query length, to parameterize any kind of debouncing, and the subjective snappiness of the site is harmed in the process).
Ultimately the problem is that zoekt imposes this poor performance upon neogrok, and neogrok can't make up for it sufficiently in all cases.
There are two big problems:
SearchResult
Content
is pointlessly wasteful. Zoekt returns binary content: when rendering JSON, golang serializes[]byte
s into base64 strings, and zoekt's match indices are also byte indices. (Bytes are an interesting choice, as zoekt categorically only indexes UTF-8 content; but JS strings aren't UTF-8 anyway, so we wouldn't necessarily be saved from this computation if the content and indices were provided in that encoding.) Converting this binary data into renderable text in browsers is quite expensive. Doing it on the nodejs backend is counterintuitively even slower, as the conversion pegs the CPU for a time and wrecks the server's ability to concurrently serve requests.Sourcegraph mitigates these problems by putting their own golang server between the browser and zoekt. This server both puts hard limits on match counts (500 by default), and serves "rendered" chunks (where data is provided in much more immediately usable line-by-line strings) to the browser. This is (empirically) a fine approach, when the client of zoekt is itself written in go, but less performant clients are left wanting.
I will try to see about getting enhancements upstreamed to zoekt, at least for problem 1. I suspect that upstreaming a fix to problem 1 (by adding a
MaxMatchDisplayCount
or something like that) will be more tractable in all regards than upstreaming a fix to problem 2, and will probably eliminate problem 2 as a real factor for us anyway.