sourcegraph / cody

Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
https://cody.dev
Apache License 2.0

bug: Extension always says downloading symf and bfg; never updates the search index #5000

Open ned-pcs opened 1 month ago

ned-pcs commented 1 month ago

Version

v1.26.7

Describe the bug

Every time I start up VS Code, I see notifications that Cody is downloading bfg and symf, and that it is updating the search index. However, these notifications never go away.

[Image: Screenshot from 2024-07-24 11-40-43]

Expected behavior

I expect the downloads to complete and the search index to be updated.

Additional context

Here's the output from Cody:

█ telemetry-v2: recordEvent: cody.extension/savedLogin 
█ AuthProvider:init:lastEndpoint: Token recovered from secretStorage https://sourcegraph.com/
█ GraphQLTelemetryExporter: evaluated export mode: 5.2.5+
█ openctx: OpenCtx is enabled in Cody
█ featureflag: refreshed
█ SymfRunner: unsafeEnsureIndex file:///home/ned/PCS/Pyrenees/Chihuahua-Firmware
█ LocalEmbeddingsController: constructor
█ symf: downloading symf from https://github.com/sourcegraph/symf/releases/download/v0.0.12/symf-x86_64-linux-musl.zip
█ ContextFiltersProvider: setContextFilters 
█ ChatManager:constructor: init has local embeddings controller
█ ChatPanelsManager:constructor: init
█ LocalEmbeddingsController: start
█ CodyEngineService: constructor
█ LocalEmbeddingsController: start
█ LocalEmbeddingsController: get status
█ SimpleChatPanelProvider: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"indeterminate","embeddingsAPIProvider":"sourcegraph"}]}]
█ RepoNameResolver:getCodebaseFromWorkspaceUri: error 
█ SimpleChatPanelProvider: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"indeterminate","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ CodyEngine: downloading bfg from https://github.com/sourcegraph/bfg/releases/download/v5.4.6040/bfg-linux-x64.zip
█ ClientConfigSingleton: refreshing configuration
█ ClientConfigSingleton: refreshed {"codyEnabled":true,"chatEnabled":true,"autoCompleteEnabled":true,"customCommandsEnabled":true,"attributionEnabled":false,"smartContextWindowEnabled":true,"modelsAPIEnabled":false}
█ CodyCompletionProvider:initialized: fireworks/deepseek-coder-7b
█ UpstreamHealth: Ping took 1068ms (Gateway: 978ms) 
█ SimpleChatPanelProvider: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"indeterminate","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
ned-pcs commented 1 month ago

I have tried uninstalling and reinstalling the extension.

jdorfman commented 1 month ago

Thanks @ned-pcs, I will let the product team know now. How large is the repo (files and size)?

RXminuS commented 1 month ago

Hey @ned-pcs sorry you're having these issues. If I see it correctly you're running on Linux right?

Could you look in the folder ~/.config/Code/User/workspaceStorage/sourcegraph.cody-ai and check whether there are any .lock files or folders in the symf or cody-engine directories? This is where VSCode stores the items it's trying to download.

If those lock files are there, we definitely need to figure out the root cause, but for now you can safely delete them and restart VSCode to see if that works. If they aren't, we'll keep digging to figure out what's going on. Are you running a firewall by any chance?
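For anyone who wants to script that check, here is a minimal sketch. The function name `find_leftovers` and its parameters are hypothetical; the storage root is passed in rather than hard-coded, since the exact location depends on the platform and VS Code install.

```python
from pathlib import Path
from typing import Iterable, List


def find_leftovers(
    storage_root: Path,
    subdirs: Iterable[str] = ("symf", "cody-engine"),
    suffixes: Iterable[str] = (".lock",),
) -> List[Path]:
    """List files or folders under the given subdirectories whose name ends in one of `suffixes`."""
    wanted = tuple(suffixes)
    leftovers: List[Path] = []
    for sub in subdirs:
        folder = storage_root / sub
        if not folder.is_dir():
            continue  # nothing has been downloaded into this directory yet
        for entry in sorted(folder.iterdir()):
            if entry.suffix in wanted:
                leftovers.append(entry)
    return leftovers
```

Calling `find_leftovers(Path.home() / ".config/Code/User/globalStorage/sourcegraph.cody-ai")` would list the entries to inspect; the function only reports them and deletes nothing.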

ned-pcs commented 1 month ago

I'm on Ubuntu Linux 22.04 on an x86_64 system.

My ~/.config/Code/User/workspaceStorage directory does not include a sourcegraph.cody-ai directory. All of its directories have hex names. And none of those directories include a symf or cody-engine directory.

I did find those directories under ~/.config/Code/User/globalStorage/sourcegraph.cody-ai:

% ls -a $(find ~/.config/Code/ -name cody-engine)
.  ..  bfg.zip  cody-engine-5.4.6040-linux-x64.lock  cody-engine-5.4.6040-linux-x64.tmp
% ls -a $(find ~/.config/Code/ -name symf)       
.  ..  indexroot  symf-v0.0.10-x86_64-linux  symf-v0.0.12-x86_64-linux.lock  symf-v0.0.12-x86_64-linux.tmp

After deleting the .tmp and .lock files and re-starting VS Code, I still see the same issue.

RXminuS commented 1 month ago

Ok, that's great. You are correct that it's the globalStorage directory; I mixed it up when looking up the Linux directory online (I'm on macOS). Thanks for taking the time to get those details regardless, it helps a lot!

The process for downloading looks like this:

  1. Check if the correct version exists
  2. Have processes race to create a .lock file so that only one of them downloads at a time (we've ruled this out as the cause of the issue ✅ )
  3. Download release from GitHub to .tmp dir
  4. Unpack the binary in the zip to a file named the same as the lockfile but without the .lock
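
To make the four steps concrete, here is a rough Python sketch of the same flow. It is not the extension's code (that lives in the TypeScript files of the Cody repo); `ensure_binary` and `fetch_zip` are hypothetical names, the archive is assumed to contain a single binary, and the cleanup in `finally` is an illustrative choice rather than a claim about what the extension does.

```python
import os
import shutil
import zipfile
from pathlib import Path
from typing import Callable, Optional


def ensure_binary(
    directory: Path,
    name: str,
    fetch_zip: Callable[[Path], None],
) -> Optional[Path]:
    """Return the installed binary, or None if another process holds the lock."""
    target = directory / name
    lock = directory / f"{name}.lock"
    tmp_dir = directory / f"{name}.tmp"

    # Step 1: the correct version already exists, so there is nothing to do.
    if target.is_file():
        return target

    # Step 2: race for the lock. O_EXCL makes creation fail if the file exists.
    directory.mkdir(parents=True, exist_ok=True)
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None  # another process is downloading, or a stale lock was left behind
    os.close(fd)

    try:
        # Step 3: download the release archive into the .tmp directory.
        tmp_dir.mkdir(exist_ok=True)
        archive = tmp_dir / f"{name}.zip"
        fetch_zip(archive)

        # Step 4: unpack the binary to the lock file's name without ".lock".
        with zipfile.ZipFile(archive) as zf:
            member = zf.namelist()[0]  # assumption: one binary per archive
            extracted = Path(zf.extract(member, tmp_dir))
        extracted.chmod(0o755)
        os.replace(extracted, target)
        return target
    finally:
        # Remove the lock and .tmp dir whether or not the steps above succeeded.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        lock.unlink(missing_ok=True)
```

Note that a lock file left over from an interrupted run makes step 2 return early every time, which is why checking for stale .lock files was the first thing to rule out.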

So if we first check whether there's a zip file inside the .tmp dir, we can tell whether the problem is in the downloading or in the unpacking. Could you check this?

If there are zip files in there we can verify they're the same as if manually downloaded from https://github.com/sourcegraph/bfg/releases/download/v5.4.6040/bfg-linux-x64.zip and https://github.com/sourcegraph/symf/releases/download/v0.0.12/symf-x86_64-linux-musl.zip
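
One way to do that comparison is to check that the file in the .tmp dir opens as a complete zip and then compare its SHA-256 hash with a manually downloaded copy. The sketch below is only an illustration; the function names are hypothetical and both file paths are supplied by the caller.

```python
import hashlib
import zipfile
import zlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_complete(path: Path) -> bool:
    """True if the file opens as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error):
        # A truncated download usually lacks the zip's end-of-archive record.
        return False


def same_as_reference(candidate: Path, reference: Path) -> bool:
    """True if `candidate` is a complete zip with the same bytes as `reference`."""
    return archive_is_complete(candidate) and sha256_of(candidate) == sha256_of(reference)
```

If `archive_is_complete` returns False for the file in the .tmp dir, the download itself was cut short; if it returns True and the hashes match, the problem is more likely in the unpacking step.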

If they are, then we can verify whether installing them manually resolves things. You can do this by unzipping the archive and copying the contents to the directory where the lockfiles are. Then rename the binary to match the lockfile name (without .lock), remove the lockfile and tmp dir, and restart VSCode. If that works, then there's some problem in unpacking these archives.
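
The manual install can also be scripted. This is a sketch under two assumptions: the archive contains exactly one file (the binary), and the .tmp dir is named like the lockfile with .tmp in place of .lock, as in the directory listing earlier in the thread. `install_manually` is a hypothetical helper, not part of Cody.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_manually(archive: Path, lock_file: Path) -> Path:
    """Unpack `archive` next to `lock_file` and name the binary after the lock."""
    if lock_file.suffix != ".lock":
        raise ValueError(f"expected a .lock file, got {lock_file.name}")
    directory = lock_file.parent
    target = lock_file.with_suffix("")  # lockfile name without ".lock"
    tmp_dir = directory / f"{target.name}.tmp"

    # Unpack into a scratch folder on the same filesystem as the target.
    staging = Path(tempfile.mkdtemp(dir=directory))
    try:
        with zipfile.ZipFile(archive) as zf:
            names = [n for n in zf.namelist() if not n.endswith("/")]
            if len(names) != 1:
                raise ValueError(f"expected one file in the archive, found {names}")
            extracted = Path(zf.extract(names[0], staging))
        extracted.chmod(0o755)  # make the binary executable
        extracted.replace(target)
    finally:
        shutil.rmtree(staging, ignore_errors=True)

    # Remove the lockfile and tmp dir so the next start sees a clean install.
    shutil.rmtree(tmp_dir, ignore_errors=True)
    lock_file.unlink(missing_ok=True)
    return target
```

Close VSCode before running something like this so the extension is not writing to the same directory at the same time.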

In the meantime I'll make sure to add some better error handling / reporting on these downloads

RXminuS commented 1 month ago

I also just noticed in the download code https://sourcegraph.com/github.com/sourcegraph/cody/-/blob/vscode/src/local-context/download-symf.ts?L113-171 that I added a different message for when it's extracting vs. downloading.

Given that the image you sent shows "downloading" I'm guessing that for whatever reason https://sourcegraph.com/github.com/sourcegraph/cody/-/blob/vscode/src/local-context/utils.ts?L38-49 isn't getting resolved.

I'm looking into why we're not receiving a finish signal.

ned-pcs commented 1 month ago

OK, after clearing out the cody-engine and symf dirs and restarting VS Code, I see no trace of bfg in the cody-engine dir. However, there are partial cody-engine and symf zip files there:

536768 Jul 25 09:03 ./cody-engine/cody-engine-5.4.6040-linux-x64.tmp/cody-engine-5.4.6040-linux-x64.zip
278201 Jul 25 09:03 ./symf/symf-v0.0.12-x86_64-linux.tmp/symf-v0.0.12-x86_64-linux.zip

I don't have the download link for the cody-engine zip file; I could try unpacking that plus the bfg zip in the cody-engine dir, and the symf zip in the symf dir if that's what is needed.


After cleaning out those directories and re-starting in a much smaller repo, I see both executables installed:

/home/ned/.config/Code/User/globalStorage/sourcegraph.cody-ai:
total 8
drwxrwxr-x 2 ned ned 4096 Jul 26 09:24 cody-engine
drwxrwxr-x 3 ned ned 4096 Jul 26 09:31 symf

/home/ned/.config/Code/User/globalStorage/sourcegraph.cody-ai/cody-engine:
total 193152
-rwxr-xr-x 1 ned ned 197779648 Jul 26 09:24 cody-engine-5.4.6040-linux-x64

/home/ned/.config/Code/User/globalStorage/sourcegraph.cody-ai/symf:
total 31148
drwxrwxr-x 4 ned ned     4096 Jul 26 09:31 indexroot
-rwxr-xr-x 1 ned ned 31889320 Jul 26 09:31 symf-v0.0.12-x86_64-linux

However, looking at the logs, I wonder whether those zips are being downloaded and installed every time I start up the extension, despite the fact that they're already the latest version.

I will now start VS Code on my original repository.

ned-pcs commented 1 month ago

Quoting @jdorfman: "Thanks @ned-pcs I will let the product team know now. How large is the repo (files and size)"

This is a repo containing submodules; most of the files are not of interest, but overall there are:

53290 C files
4369 Python files
2173 Javascript files
as well as a number of miscellaneous binary blobs, documents, and other things.
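
For reference, counts like these can be gathered with a short script. The sketch below groups files by extension and sums their sizes; `repo_stats` is a hypothetical helper, and skipping `.git` entries is an assumption about what should count as part of the repo.

```python
import os
from collections import Counter
from pathlib import Path
from typing import Tuple


def repo_stats(root: Path) -> Tuple[Counter, int]:
    """Return (file count per lowercase extension, total size in bytes) under `root`."""
    counts: Counter = Counter()
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name == ".git":
                continue  # submodules keep a .git file instead of a directory
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # avoid counting link targets twice or following broken links
            counts[path.suffix.lower() or "(none)"] += 1
            total_bytes += path.stat().st_size
    return counts, total_bytes
```

For example, `repo_stats(Path("."))[0].most_common(10)` lists the ten most frequent extensions in the current directory.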

I don't see how to limit the scope of the search.

It appears that indexing fails for this repository.

Here's the output log for the larger repo:

█ telemetry-v2: recordEvent: cody.extension/savedLogin 
█ AuthProvider:init:lastEndpoint: Token recovered from secretStorage https://sourcegraph.com/
█ GraphQLTelemetryExporter: evaluated export mode: 5.2.5+
█ CodyLLMConfiguration: {"chatModel":"anthropic/claude-3-sonnet-20240229","chatModelMaxTokens":12000,"fastChatModel":"anthropic/claude-3-haiku-20240307","fastChatModelMaxTokens":12000,"completionModel":"fireworks/starcoder","completionModelMaxTokens":9000,"provider":"various","smartContextWindow":true}
█ ClientConfigSingleton: refreshing configuration
█ ClientConfigSingleton: refreshed {"codyEnabled":true,"chatEnabled":true,"autoCompleteEnabled":true,"customCommandsEnabled":true,"attributionEnabled":false,"smartContextWindowEnabled":true,"modelsAPIEnabled":false}
█ ModelsService: Setting primary models: ["anthropic/claude-3-5-sonnet-20240620","anthropic/claude-3-sonnet-20240229","anthropic/claude-3-opus-20240229","anthropic/claude-3-haiku-20240307","openai/gpt-4o","openai/gpt-4-turbo","openai/gpt-3.5-turbo","google/gemini-1.5-pro-latest","google/gemini-1.5-flash-latest","fireworks/accounts/fireworks/models/mixtral-8x7b-instruct","fireworks/accounts/fireworks/models/mixtral-8x22b-instruct"]
█ telemetry-v2: recordEvent: cody.auth/connected 
█ featureflag: refreshed
█ ContextFiltersProvider: setContextFilters 
█ SymfRunner: unsafeEnsureIndex file:///home/ned/PCS/Pyrenees/Chihuahua-Firmware
█ LocalEmbeddingsController: constructor
█ ChatsController:constructor: init
█ LocalEmbeddingsController: start
█ CodyEngineService: constructor
█ LocalEmbeddingsController: start
█ openctx: OpenCtx is enabled in Cody
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"indeterminate","embeddingsAPIProvider":"sourcegraph"}]}]
█ CodyEngine: using downloaded bfg path "/home/ned/.config/Code/User/globalStorage/sourcegraph.cody-ai/cody-engine/cody-engine-5.4.6040-linux-x64"
█ CodyEngineService: spawnAndBindService service started, initializing
█ LocalEmbeddingsController: spawnAndBindService service started, initializing
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"indeterminate","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ CodyEngine: spawnBfg:stderr 
█ symf: using downloaded symf "/home/ned/.config/Code/User/globalStorage/sourcegraph.cody-ai/symf/symf-v0.0.12-x86_64-linux"
█ LocalEmbeddingsController: spawnAndBindService initialized
█ CodyCompletionProvider:initialized: fireworks/starcoder-hybrid
█ telemetry-v2: recordEvent: cody.context.embeddings/loaded 
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"unconsented","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ LocalEmbeddingsController: index starting repository file:///home/ned/PCS/Pyrenees/Chihuahua-Firmware
█ telemetry-v2: recordEvent: cody.context.embeddings/loaded 
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"unconsented","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ LocalEmbeddingsController: index starting repository file:///home/ned/PCS/Pyrenees/Chihuahua-Firmware
█ LocalEmbeddingsController: index-health c2e61bd83c05478fb56bc3639d5ad062 {"code":-32099}
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"indexing","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ UpstreamHealth: Ping took 4065ms (Gateway: 4026ms) 
█ LocalEmbeddingsController: index-health 568a40cfb1734c308fcfe82e6763cbc1 {"code":-32099}
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"indexing","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ symf: creating index file:///home/ned/.config/Code/User/globalStorage/sourcegraph.cody-ai/symf/indexroot/home/ned/PCS/Pyrenees/Chihuahua-Firmware
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"unconsented","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ telemetry-v2: recordEvent: cody.context.embeddings/loaded 
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"ready","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ LocalEmbeddingsController: load after indexing "done" file:///home/ned/PCS/Pyrenees/Chihuahua-Firmware true
█ LocalEmbeddingsController: index-health {"commit":"7c4c5a5b1905179ae817cb28f4d939af8cfe06a7","dimension":768,"format":"LocalEmbeddings","model":"sourcegraph/st-multi-qa-mpnet-base-dot-v1","numFiles":221,"numItems":1197,"numItemsDeleted":0,"numItemsFailed":0,"numItemsNeedEmbedding":0,"repoName":"github.com/Product-Creation-Studio/Pyrenees-NFF-Firmware","type":"found"}
█ LocalEmbeddingsController: get status
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"ready","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"indexing"}]}]
█ UpstreamHealth: Ping took 159ms (Gateway: 150ms) 
█ symf: symf index creation failed EvalError: symf index creation failed: Error: symf exited with code null
█ ChatController: postContextStatus [{"displayName":"Chihuahua-Firmware","providers":[{"kind":"embeddings","state":"ready","embeddingsAPIProvider":"sourcegraph"},{"kind":"search","type":"local","state":"failed"}]}]

When I look at the contents under symf/indexroot I see that it only made its way into one directory within a submodule. This submodule's origin is in a public Github repository.

If it matters, the main repository's origin is a private repository on Github.

RXminuS commented 1 month ago

Ok I think the indexing is separate from the downloading / installation issue. I'll create a separate internal issue for it so we can get the right people looking at that one.

As for the downloading: I think a previous download may have been interrupted, leaving partial files behind that prevented later writes. This could happen if VSCode restarted right as it was downloading. Alternatively... did you at any point start VSCode as sudo? I'll need to dig a little deeper to form a more solid theory.

The cody-engine is this file: https://github.com/sourcegraph/bfg/releases/download/v5.4.6040/bfg-linux-x64.zip. It's just named differently. And although Cody checks for the latest versions every time you open it, it won't re-download a binary that is already available.