slow to fetch term via API (only) in large codebase

atacratic commented 3 years ago

I click to fetch a term via the API (so via the codebase-ui in fact) and my browser takes ~13 seconds to show me the term, with ucm using max CPU during that time.

Fetching the same term (by name) using view from ucm itself takes only ~0.5 seconds.

Just terms, not types.

Presumably a function of my codebase, which on disk is using 2.9GB, almost all in the form of 170k files under v1/paths. FWIW when I do ls in ucm at the root there are 12k definitions.

versions: unison:trunk@2b83c9 codebaseui:main@e627cb

pchiusano commented 3 years ago

@atacratic thanks for the report. Just to clarify, you are saying the slowness only happens when fetching a term, but fetching a type is still relatively speedy? If so that's probably a good clue.

Also curious if the slowness is just some of the terms, or all of them.

atacratic commented 3 years ago

Correct!

And seems to be all of the terms.

aryairani commented 3 years ago

Not sure if it's the exact same problem as @atacratic's, but I'm seeing similar symptoms: a short lag when loading types, and a long lag when loading terms; all of them afaict. https://user-images.githubusercontent.com/538571/109903334-a82f1400-7c69-11eb-8180-4b44168b65b9.mov

aryairani commented 3 years ago

However, I'm seeing 2-3% CPU usage, not 100%, with my lag. Also >9GB memory usage. :)

runarorama commented 3 years ago

I did some profiling of this, and I am seeing that the server version of this is much slower than the UCM version. However, all of the difference is taking place inside Servant. I suspect that turning the definitions into JSON is what's taking the bulk of the time.

pchiusano commented 3 years ago

@runarorama Interesting... some random ideas I thought of that might help -

I would think that JSON encoding speed is insanely fast assuming you know what you're writing, as it's just concatenating some strings (or at worst, producing a JSON AST and then folding that and concatenating strings). The only thing I'd maybe wonder about is if it's trying to produce pretty JSON as opposed to everything crammed on one line. Is it possible that laziness is obscuring where the work is really happening?
I think the fact that it's happening for all terms (even small ones apparently?) but no types is a really strong clue. Do terms and types really have radically different JSON sizes? Also, notably - in both cases, it's an AnnotatedText that's being JSON-ified. By that point, we don't even know or care whether it's a term or a type.
I'm suspicious about an accidental historical name search - you can add a trace statement to FileCodebase.branchFromFiles. When the request is issued, this function should not get called, since all the names should be present in the current namespace. And if it's called over and over, that usually indicates a historical name search is happening. You can turn off namespace caching to make this more clear. The fact that Arya isn't seeing much CPU could indicate that it's just doing lots of I/O to read branches from disk.
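
On the laziness question in the first idea above, here is a minimal sketch of how a lazy value can shift cost onto whichever stage forces it. Everything in it is hypothetical: `render` and `encode` are stand-ins, not the real pretty-printer or the Servant/JSON encoder, and the timings are only meant to be compared with each other.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Hypothetical stand-in for the expensive part (lookup + pretty-printing).
render :: Int -> String
render n = show (foldl' (+) 0 [1 .. n])

-- Hypothetical stand-in for the JSON encoder: it only quotes the string.
encode :: String -> String
encode s = "\"" ++ s ++ "\""

-- Fully evaluate a String by walking its spine, then return it.
forceString :: String -> IO String
forceString s = evaluate (length s) >> pure s

-- Run an action and return its result plus the CPU time used (picoseconds).
timed :: IO a -> IO (a, Integer)
timed act = do
  t0 <- getCPUTime
  x <- act
  t1 <- getCPUTime
  pure (x, t1 - t0)

main :: IO ()
main = do
  -- Lazy: `pure` hands back an unevaluated thunk, so this stage looks free.
  (lazyText, tRenderLazy) <- timed (pure (render 5000000))
  -- The "encoder" forces the thunk and gets billed for the rendering work.
  (_, tEncodeLazy) <- timed (forceString (encode lazyText))
  -- Forced: evaluate the text first so each stage pays for its own work.
  (text, tRenderForced) <- timed (forceString (render 5000001))
  (_, tEncodeForced) <- timed (forceString (encode text))
  putStrLn ("lazy:   render=" ++ show tRenderLazy ++ " encode=" ++ show tEncodeLazy)
  putStrLn ("forced: render=" ++ show tRenderForced ++ " encode=" ++ show tEncodeForced)
```

If a profile blames the serialization layer, forcing the value just before it is handed over (as in the second pair of timings) is one way to see whether the encoder is really the slow part.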

That's all I got.

runarorama commented 3 years ago

Yeah, the server always calls branchFromFiles since it doesn't know up front which branch hash is going to get requested. More specifically, it's called by getBranchForHash, which we need to call.

pchiusano commented 3 years ago

Right, that makes sense. It's fine if that function gets called once for each subnamespace of the branch in question (each level of the tree involves a separate load from disk), but if it gets called again after you have your Branch m, that implies it's looking through history.
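
The trace-and-count check described above could be sketched like this. It is a toy model under stated assumptions: `Namespace`, `loadBranch` and `loadTree` are hypothetical stand-ins, not the real `FileCodebase.branchFromFiles` or `Branch m`; only `Debug.Trace.traceIO` and `Data.IORef` are real library calls.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Debug.Trace (traceIO)

-- Hypothetical stand-in for a namespace: a name plus child namespaces.
data Namespace = Namespace String [Namespace]

-- Hypothetical stand-in for branchFromFiles: traces and counts every load.
loadBranch :: IORef Int -> Namespace -> IO Namespace
loadBranch counter ns@(Namespace name _) = do
  traceIO ("branchFromFiles: " ++ name)  -- written to stderr
  modifyIORef' counter (+ 1)
  pure ns

-- Load a namespace and its children: one load per node of the tree.
loadTree :: IORef Int -> Namespace -> IO ()
loadTree counter ns = do
  Namespace _ children <- loadBranch counter ns
  mapM_ (loadTree counter) children

-- Number of namespaces in the tree, i.e. the expected number of loads.
size :: Namespace -> Int
size (Namespace _ children) = 1 + sum (map size children)

main :: IO ()
main = do
  counter <- newIORef 0
  let root = Namespace "." [Namespace "base" [Namespace "List" []], Namespace "app" []]
  loadTree counter root
  loads <- readIORef counter
  putStrLn ("loads: " ++ show loads ++ ", namespaces: " ++ show (size root))
```

In this model the two numbers match. In the real codebase, a load count well above the number of subnamespaces for a single request would be the sign of a history traversal.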

runarorama commented 3 years ago

No, it's just called once. It's never called for the UCM version though, since that holds on to the root and subscribes to root changes.
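
To illustrate the difference between loading per request and holding on to an already-loaded root, here is a rough sketch. It is not how UCM or the server is implemented: `Hash`, `Branch`, `loadFromDisk` and `getBranchCached` are all hypothetical names, and the cache remembers only the most recently requested hash.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

type Hash = String
type Branch = String  -- hypothetical stand-in for a loaded namespace

-- Hypothetical expensive load from disk; counts how often it runs.
loadFromDisk :: IORef Int -> Hash -> IO Branch
loadFromDisk loads h = do
  modifyIORef' loads (+ 1)
  pure ("branch@" ++ h)

-- Reuse the held branch when the hash matches; otherwise load and remember it.
getBranchCached :: IORef Int -> IORef (Maybe (Hash, Branch)) -> Hash -> IO Branch
getBranchCached loads cache h = do
  cached <- readIORef cache
  case cached of
    Just (h', b) | h' == h -> pure b
    _ -> do
      b <- loadFromDisk loads h
      writeIORef cache (Just (h, b))
      pure b

main :: IO ()
main = do
  loads <- newIORef 0
  cache <- newIORef Nothing
  -- Four requests, but only two distinct hashes in a row.
  mapM_ (getBranchCached loads cache) ["abc", "abc", "abc", "def"]
  n <- readIORef loads
  putStrLn ("disk loads for 4 requests: " ++ show n)  -- prints 2
```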

unisonweb / unison

slow to fetch term via API (only) in large codebase #1828