sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Syntax highlighting for very large files incredibly slow #40686

Open limitedmage opened 2 years ago

limitedmage commented 2 years ago

Issue reported by https://github.com/sourcegraph/accounts/issues/8205

Network trace available at https://drive.google.com/file/d/1Mc899L8ckNAIQVac791xYKgD9LJAra_T/view?usp=sharing

Steps to reproduce:

  1. In site config, set "search.largeFiles": ["**/**"] so very large files are all indexed
  2. Run a search that returns many very large files, but only one chunk per result.
  3. Wait for results to load

Expected behavior:

Results load

Actual behavior:

Results load as blank boxes that take a very long time to populate, if they populate at all. The network tab shows the syntax highlighting requests pending for a very long time; they may eventually complete or time out.

limitedmage commented 2 years ago

From the customer: turning on enableFastResultLoading "does speed things up a bit, by an order of magnitude it looks like, but that takes it from 2 minute loads per graphql query to ~1.2", which is better but still unacceptable IMO. Caching the syntax-highlighted output might improve this, since for this customer the slow requests mostly hit the same kind of file (large .gitmodules files of 50,000+ lines, one in each repo).
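To make the caching idea concrete, here is a minimal sketch of a cache keyed by file type and content hash. Everything in it is hypothetical: `HighlightCache` and the `highlight` callback are illustrative names, not part of Sourcegraph or syntect, and a real deployment would need size limits and eviction. A cache like this only helps when the same content is highlighted repeatedly (for example across repeated searches), not when each repo's file differs.

```python
import hashlib
from typing import Callable, Dict


class HighlightCache:
    """In-memory cache of highlighted output (hypothetical sketch).

    `highlight` stands in for whatever does the expensive work; it takes
    (filetype, content) and returns the highlighted text.
    """

    def __init__(self, highlight: Callable[[str, str], str]) -> None:
        self._highlight = highlight
        self._entries: Dict[str, str] = {}

    @staticmethod
    def _key(filetype: str, content: str) -> str:
        # Hash file type and content together so identical content
        # highlighted as a different language gets its own entry.
        digest = hashlib.sha256()
        digest.update(filetype.encode("utf-8"))
        digest.update(b"\0")
        digest.update(content.encode("utf-8"))
        return digest.hexdigest()

    def get(self, filetype: str, content: str) -> str:
        key = self._key(filetype, content)
        if key not in self._entries:
            # Cache miss: pay the highlighting cost once and remember it.
            self._entries[key] = self._highlight(filetype, content)
        return self._entries[key]
```

Hashing the content rather than keying on repo and path means a changed file never serves stale output, at the cost of hashing the whole file on every lookup.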

olafurpg commented 2 years ago

Since search results only render a small context from the file, would it make sense to tweak syntax-highlighter to optionally accept a list of lines to highlight +/- context? Syntect highlights files one line at a time so it should be fairly easy to tweak the core loop to skip lines that don't need highlighting.
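A rough sketch of that idea, written in Python for brevity rather than in the highlighter's own language: turn the matched line numbers into merged ranges with context, then only run the per-line highlighter inside those ranges. The names `ranges_with_context`, `highlight_selected`, and the `highlight_line` callback are assumptions made for illustration, not an existing API.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def ranges_with_context(
    match_lines: Iterable[int], context: int, total_lines: int
) -> List[Tuple[int, int]]:
    """Expand 0-based match lines by +/- context and merge the results
    into sorted, non-overlapping inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for line in sorted(set(match_lines)):
        if not 0 <= line < total_lines:
            continue  # ignore matches that fall outside the file
        start = max(0, line - context)
        end = min(total_lines - 1, line + context)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


def highlight_selected(
    lines: List[str],
    ranges: List[Tuple[int, int]],
    highlight_line: Callable[[str], str],
) -> List[Optional[str]]:
    """Highlight only the lines inside `ranges`; skipped lines are None."""
    out: List[Optional[str]] = [None] * len(lines)
    for start, end in ranges:
        for i in range(start, end + 1):
            out[i] = highlight_line(lines[i])
    return out
```

One caveat with this sketch: if the highlighter carries parse state from line to line, as grammar-based highlighters typically do, the skipped lines may still need to be parsed (just not styled) so that multi-line constructs inside the selected ranges come out right.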