sourcegraph / sourcegraph-public-snapshot

Diagnose and fix slow /.api/registry/extensions endpoint #7544

Closed tsenart closed 4 years ago

tsenart commented 4 years ago

Site24x7 is frequently alerting us due to sporadic timeouts on this endpoint. To prevent on-call burnout, we should either fix this endpoint or remove this alert.

An independent Vegeta probe that ran for around a day in our tooling cluster confirms that the issue is not with Site24x7.

Requests      [total, rate]            78892, 1.00
Duration      [total, attack, wait]    21h54m56.904110981s, 21h54m55.999913712s, 904.197269ms
Latencies     [mean, 50, 95, 99, max]  2.276716911s, 633.933313ms, 9.46170857s, 32.209758157s, 1m0.000479738s
Bytes In      [total, mean]            0, 0.00
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  99.73%
Status Codes  [code:count]             0:211  200:78680  502:1  
Error Set:
Get https://sourcegraph.com/.api/registry/extensions: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
net/http: request canceled (Client.Timeout exceeded while reading body)
502 Bad Gateway
read tcp 10.28.2.29:38709->104.26.9.187:443: read: connection reset by peer
Get https://sourcegraph.com/.api/registry/extensions: read tcp 10.28.2.29:46311->104.26.8.187:443: read: connection reset by peer

tsenart commented 4 years ago

Looking at the code, the only plausible source of slowness seems to be Postgres: either the initial list query, the N+1 queries we run after it, or both.

https://github.com/sourcegraph/sourcegraph/blob/6eb6f01bb7e739db8e54e0eb26cfff220bdce03f/enterprise/cmd/frontend/internal/registry/http_api.go#L144
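For illustration, here is a minimal sketch of the N+1 pattern mentioned above next to a batched alternative. It is not the code in http_api.go: the types, the in-memory `fakeDB`, and the idea that the per-extension lookup fetches a "latest release" are all assumptions made up for this example. The fake store only counts how many queries each approach would issue.

```go
package main

import "fmt"

// Extension and Release are hypothetical types for illustration only.
type Extension struct {
	ID   int
	Name string
}

type Release struct {
	ExtensionID int
	Manifest    string
}

// fakeDB stands in for Postgres and counts the queries issued against it.
type fakeDB struct {
	extensions []Extension
	releases   map[int]Release
	queries    int
}

// newFakeDB builds a store holding n extensions with one release each.
func newFakeDB(n int) *fakeDB {
	db := &fakeDB{releases: make(map[int]Release, n)}
	for i := 1; i <= n; i++ {
		db.extensions = append(db.extensions, Extension{ID: i, Name: fmt.Sprintf("ext-%d", i)})
		db.releases[i] = Release{ExtensionID: i, Manifest: fmt.Sprintf("manifest-%d", i)}
	}
	return db
}

// ListExtensions models the initial list query (one round trip).
func (db *fakeDB) ListExtensions() []Extension {
	db.queries++
	return db.extensions
}

// LatestRelease models a per-extension lookup (one round trip per call).
func (db *fakeDB) LatestRelease(id int) Release {
	db.queries++
	return db.releases[id]
}

// LatestReleases models a single batched lookup for many extensions,
// e.g. one query with "WHERE extension_id = ANY($1)".
func (db *fakeDB) LatestReleases(ids []int) map[int]Release {
	db.queries++
	out := make(map[int]Release, len(ids))
	for _, id := range ids {
		out[id] = db.releases[id]
	}
	return out
}

// listNPlusOne issues 1 list query plus 1 query per extension.
func listNPlusOne(db *fakeDB) map[string]string {
	out := make(map[string]string)
	for _, x := range db.ListExtensions() {
		out[x.Name] = db.LatestRelease(x.ID).Manifest
	}
	return out
}

// listBatched issues 1 list query plus 1 batched query, regardless of count.
func listBatched(db *fakeDB) map[string]string {
	xs := db.ListExtensions()
	ids := make([]int, 0, len(xs))
	for _, x := range xs {
		ids = append(ids, x.ID)
	}
	releases := db.LatestReleases(ids)
	out := make(map[string]string, len(xs))
	for _, x := range xs {
		out[x.Name] = releases[x.ID].Manifest
	}
	return out
}

func main() {
	a, b := newFakeDB(100), newFakeDB(100)
	listNPlusOne(a)
	listBatched(b)
	fmt.Printf("N+1: %d queries, batched: %d queries\n", a.queries, b.queries)
}
```

With 100 extensions the first approach makes 101 round trips and the second makes 2, so the batched form keeps the query count constant as the registry grows.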

uwedeportivo commented 4 years ago

Dear all,

This is your release captain speaking. 🚂🚂🚂

Branch cut for the 3.13 release is scheduled for tomorrow.

Is this issue / PR going to make it in time? Please change the milestone accordingly. When in doubt, reach out!

Thank you

ryanslade commented 4 years ago

Another issue is that this endpoint currently returns around 1.3MB of data, most of which is base64-encoded icons.
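A rough way to see how much inline icons can inflate a JSON list response is sketched below. The `extension` struct, its field names, the data-URI format, and the sizes used are assumptions for the example, not the registry's actual schema or numbers. It marshals the same list with and without icons and compares the payload sizes.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// extension is a hypothetical response item; the real schema may differ.
type extension struct {
	ID   string `json:"id"`
	Icon string `json:"icon,omitempty"` // assumed to be a base64 data URI
}

// buildList returns n items, each optionally carrying an icon made of
// iconBytes raw bytes encoded as a base64 data URI.
func buildList(n, iconBytes int, withIcons bool) []extension {
	icon := "data:image/png;base64," + base64.StdEncoding.EncodeToString(make([]byte, iconBytes))
	list := make([]extension, 0, n)
	for i := 0; i < n; i++ {
		x := extension{ID: fmt.Sprintf("publisher/ext-%d", i)}
		if withIcons {
			x.Icon = icon
		}
		list = append(list, x)
	}
	return list
}

// payloadSize returns the size in bytes of the JSON encoding of list.
func payloadSize(list []extension) int {
	b, err := json.Marshal(list)
	if err != nil {
		panic(err)
	}
	return len(b)
}

func main() {
	withIcons := payloadSize(buildList(50, 20000, true))
	withoutIcons := payloadSize(buildList(50, 20000, false))
	fmt.Printf("with icons: %d bytes, without: %d bytes (icons are %.1f%% of the payload)\n",
		withIcons, withoutIcons, 100*float64(withIcons-withoutIcons)/float64(withIcons))
}
```

Base64 also adds roughly a third on top of the raw image bytes, so serving icons from separate URLs, or leaving them out of the list response, would shrink the payload by more than the icons' raw size.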

tsenart commented 4 years ago

This woke me up 3h before my alarm clock today :(

keegancsmith commented 4 years ago

Fixed by https://github.com/sourcegraph/sourcegraph/pull/8583