microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
161.95k stars 28.47k forks source link

Investigate high extension install error rate #190233

Open isidorn opened 1 year ago

isidorn commented 1 year ago

@dtivel discovered that we are having a high error rate of extension install.

Here are some kusto queries with interesting results:

RawEventsVSCode
| where ServerUploadTimestamp < ago(1d)
| where EventName == 'monacoworkbench/extensiongallery:install'
| summarize total_count = count(), failure_count = countif(Measures["success"] == 0.0) by bin(ServerUploadTimestamp, 1d)
| extend failure_percentage = todouble(failure_count) / total_count * 100
| project ServerUploadTimestamp, failure_percentage
| render timechart

Screenshot 2023-08-11 at 10 47 34

And here is the ErrorCodeDetail

RawEventsVSCode
| where ServerUploadTimestamp > ago(60d)
| where EventName == 'monacoworkbench/extensiongallery:install'
| where Measures["success"] == 0.0
| summarize count() by tostring(Properties["errorcode"])
| order by count_ desc

Properties_errorcode count_
Internal 10,542,542
Download 5,688,506
CorruptZip 3,909,848
Incompatible 2,331,489
Extract 780,667
IncompatiblePreRelease 453,860
Signature 227,637
incompatible 120,565
EPERM 88,294
EACCES 74,313
unknown 67,296
Invalid 51,401
Delete 28,992
downloading 27,435
ReleaseVersionNotFound 20,460
  8,735

gallery

@sandy081 why do we have such a high failure rate? What is the Internal error code? It looks like there is a high error rate on work days when there is a higher load on MP service so it also might be influenced by the service.

I can not go back more in time than 10 months to check if we always had these error rates. Error rates on extension update are considerably lower.

fyi @joaomoreno

sandy081 commented 1 year ago

Internal errors are uncaught errors and hence cannot be classified. They stood the most (50%). I will try to see if there are places in the code where we can catch more error and classify them. The remaining are classified and can be explained by the error code name. Download errors could be network or service related and corrupt zips are service related.

isidorn commented 11 months ago

Thanks for looking into this. Being able to further classify the internal errors would greatly help here.

juancampa commented 9 months ago

I just hit this. It doesn't fix itself even after a restart.

A corrupted file is stored in CachedExtensionVSIXs and never re-downloaded (I guess there's no checksum being checked anywhere). The cached vsix, instead of being a zip, it's an HTML with the following content:

Screenshot 2023-11-29 at 6 36 45 PM

Located in /Users/<username>/Library/Application Support/Code/CachedExtensionVSIXs/membrane.membrane-1.6.1-darwin-arm64.

An easy bandaid fix might be to delete the file once the CorruptZip error is thrown so that it gets redownloaded.

Update

Deleting the corrupted file indeed fixes the issue.

sandy081 commented 9 months ago

An easy bandaid fix might be to delete the file once the CorruptZip error is thrown so that it gets redownloaded

It should be doing this already. Can you please confirm if this is not happening? Please share the log - F1 > Open View... > Shared

juancampa commented 9 months ago

@sandy081 Perhaps is broken in MacOS?

Here are the logs. Notice the timestamp difference after the CorruptZip error, that was the install after I manually deleted the corrupt file.

2023-11-29 17:44:16.548 [info] Deleted uninstalled extension from disk github.copilot /Users/juan/.vscode/extensions/github.copilot-1.138.0
2023-11-29 17:44:16.549 [info] Deleted uninstalled extension from disk rust-lang.rust-analyzer /Users/juan/.vscode/extensions/rust-lang.rust-analyzer-0.3.1740-darwin-arm64
2023-11-29 17:44:16.551 [info] Deleted uninstalled extension from disk timonwong.shellcheck /Users/juan/.vscode/extensions/timonwong.shellcheck-0.34.0-darwin-arm64
2023-11-29 17:44:16.552 [info] Deleted uninstalled extension from disk ms-vscode-remote.remote-ssh /Users/juan/.vscode/extensions/ms-vscode-remote.remote-ssh-0.107.0
2023-11-29 17:44:16.552 [info] Deleted uninstalled extension from disk llvm-vs-code-extensions.vscode-clangd /Users/juan/.vscode/extensions/llvm-vs-code-extensions.vscode-clangd-0.1.24
2023-11-29 17:44:29.300 [info] Getting Manifest... membrane.membrane
2023-11-29 17:44:29.303 [info] Installing extension: membrane.membrane
2023-11-29 17:44:29.900 [error] Error while installing the extension: membrane.membrane
2023-11-29 17:44:29.903 [error] Failed to install extension. membrane.membrane
2023-11-29 17:44:29.903 [error] CorruptZip: end of central directory record signature not found
    at d.download (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:141)
    at async ee.x (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:19904)
    at async ee.h (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:17947)
2023-11-29 17:44:29.903 [error] [uncaught exception in sharedProcess]: end of central directory record signature not found: CorruptZip: end of central directory record signature not found
    at d.download (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:141)
    at async ee.x (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:19904)
    at async ee.h (/Applications/Visual Studio Code.app/Contents/Resources/app/out/vs/code/node/sharedProcess/sharedProcessMain.js:109:17947)
2023-11-29 18:40:24.887 [info] Getting Manifest... membrane.membrane
2023-11-29 18:40:24.893 [info] Installing extension: membrane.membrane
2023-11-29 18:40:27.007 [info] Extension signature is verified: membrane.membrane
2023-11-29 18:40:27.172 [info] Extracted extension to file:///Users/juan/.vscode/extensions/membrane.membrane-1.6.1-darwin-arm64: membrane.membrane
2023-11-29 18:40:27.182 [info] Renamed to /Users/juan/.vscode/extensions/membrane.membrane-1.6.1-darwin-arm64
2023-11-29 18:40:27.183 [info] Extracting extension completed. membrane.membrane
2023-11-29 18:40:27.193 [info] Extension installed successfully: membrane.membrane

About Visual Studio Code:

Version: 1.84.2
Commit: 1a5daa3a0231a0fbba4f14db7ec463cf99d7768e
Date: 2023-11-09T10:52:57.054Z
Electron: 25.9.2
ElectronBuildId: 24603566
Chromium: 114.0.5735.289
Node.js: 18.15.0
V8: 11.4.183.29-electron.0
OS: Darwin arm64 23.1.0
sandy081 commented 9 months ago

@juancampa You are right, there is a bug in cleaning up the downloaded VSIX where it is a corrupt zip. Can you pleas file a separate issue?

plattenschieber commented 9 months ago

Another thing we found out today is that sometimes the vsix files seem to be packed into another zip container. When downloading the extensions via curl <extension-url> -o <extension-filename> and trying to install via code --install-extension we get the same error as in the Windows Output of VSCode:

2023-11-30 14:24:58.281 [error] end of central directory record signature not found: CorruptZip: end of central directory record signature not found

Whereas when downloading the extension via curl <extension-url> --compressed -o <extension-filename> results directly in the correct vsix file, without another zip layer around it.

A colleague and I tried debugging the fetch call in the marketplace library and coming up with a minimal example to ensure zipped vsix files can be handled correctly, but are not there yet.

isidorn commented 8 months ago

@sandy081 could we investigate a bit more into this during our Debt / Cleanup milestones.

I would like us to get more precise reporting of extension install error rates since it is important to have precise numbers when we enable repository and publisher signing.

fyi @dtivel

sandy081 commented 7 months ago

Added specific error codes to understand internal errors. Also fixed a bug that is reporting error twice.

starball5 commented 7 months ago

Note sure how many of these are related, but I've seen a noticeable number of extension installation failure questions on SO recently:

sandy081 commented 2 months ago

Here are the error code results from 1.90.0.

Outcome: Able to resolve the reason for internal errors. Seems majority errors are happening while updating metadata. Added additional telemetry data to further understand about metadata update errors.

Image

sandy081 commented 1 week ago

Capturing the latest charts.

Image

Error percentages are coming down gradually. It is averaging around 10% in last 3 weeks.

Image

I am able to resolve/fix internal & update metadata errors almost completely. The next bigger percentage of errors falls under Extract category.