On slow~ish boxes, toobusy-js kicks in too early

hackmdio / codimd

CodiMD - Realtime collaborative markdown notes on all platforms.

https://hackmd.io/c/codimd-documentation

GNU Affero General Public License v3.0


On slow~ish boxes, toobusy-js kicks in too early #1077

Closed Nebukadneza closed 5 years ago

Nebukadneza commented 5 years ago

Hey there,

I’m running CodiMD happily on a slightly aged server … happily? Well, almost! Sometimes, especially when some other services decide they need to thrash the poor little DB hard, CodiMD reacts somewhat sluggishly. In fact, so sluggishly that the default maxLag of 70ms (?) of toobusy-js often kicks in. This is slightly unnerving to users, especially on mobile, where some browsers inhibit force-reload for some time after a failed load.

It would be great if maxLag were configurable in some way for users like me. Of course, I guess we'll die out eventually, replacing our old, steam-coughing servers with shiny new serverless-hybrid-hyperconverged-cloudnative-supercomputing instance-containers, but … until then, it’d be great if we could help ourselves via configuration instead of code changes ^_^.
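
For context, here is a minimal sketch of how an event-loop lag check of this kind can work. It is a simplified stand-in, not the actual source of toobusy-js: the LagMonitor class, its option names, and the injectable clock are all hypothetical. The idea is that a repeating timer should fire every intervalMs, and any extra delay means the event loop was busy or the box was starved.

```javascript
// Simplified stand-in for an event-loop lag check; NOT toobusy-js's real code.
// All names here (LagMonitor, intervalMs, maxLagMs, now) are hypothetical.
class LagMonitor {
  constructor({ intervalMs = 500, maxLagMs = 70, now = Date.now } = {}) {
    this.intervalMs = intervalMs; // how often the timer is expected to fire
    this.maxLagMs = maxLagMs;     // threshold above which we report "too busy"
    this.now = now;               // injectable clock, handy for testing
    this.lagMs = 0;
    this.lastTick = now();
    this.timer = null;
  }

  // Record one timer tick: time beyond the expected interval counts as lag.
  tick() {
    const t = this.now();
    this.lagMs = Math.max(0, t - this.lastTick - this.intervalMs);
    this.lastTick = t;
    return this.lagMs;
  }

  tooBusy() {
    return this.lagMs > this.maxLagMs;
  }

  start() {
    this.lastTick = this.now();
    this.timer = setInterval(() => this.tick(), this.intervalMs);
    this.timer.unref(); // do not keep the process alive just for monitoring
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }
}
```

With a check like this, a tick that arrives 100 ms late trips a 70 ms threshold but not a 200 ms one, which is why a higher threshold would help on a box with slow disks.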

Best, and thanks for the great project! -Dario

SISheogorath commented 5 years ago

Can we first look into possible performance lags in CodiMD itself? Can you tell us more about which situations cause the lag? For example, many people editing a document, many documents open at the same time, …

Besides that, just out of interest: Do you use useCDN = true?

I would like to get some insight here to make CodiMD faster in general, which may make such a config option obsolete.

Nebukadneza commented 5 years ago

Hej there,

As always, thanks a lot for your quick reaction!

I have useCDN = false, but that is only relevant for the client loading static resources, right? Of course these requests for them come in a short period of time, but they should be rather easy to fulfil, compared to actually functional stuff, right?

The situation is as simple as it is dire: a single user (for the first time) loading a single document or the frontpage, with the rest of the codimd instance idle. Honestly, CodiMD is not at fault here, the whole box just has a way too high load, especially the disk. However, since the fix is so simple (increase that one variable, maxLag of toobusy-js), I thought I’d make an issue for this nevertheless.

However, if there’s anything I could do to gain some more insight for you, please advise me on some details, and I’ll gladly check ^_^.

Best & Thanks -Dario

SISheogorath commented 5 years ago

I asked about useCDN = true for an actual performance reason, which matters especially when you have a disk latency problem: Node.js is single-threaded, which means it takes the time to read and write your data itself. The static files served with useCDN = false are currently around 7MB, and they are read and sent out every time someone connects. With useCDN = true it's less than 3MB, without any decrease in security since we use SRI hashes.

So yes, it can keep your node process busy and block it from doing more useful stuff. An alternative is to serve the ./public/ subdirectories using your reverse proxy (if you use nginx or Apache, which are able to do that).

I may provide you with a patch later that you can apply to collect some performance information, but I have to work that one out first. This should help us see where high load has an impact on the system.
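
As a rough idea of what such a measurement could look like (this is only an illustrative sketch, not the patch mentioned above): Node's built-in perf_hooks module can sample event-loop delay directly, on Node versions that ship monitorEventLoopDelay. The helper name measureLoopDelay, the 10 ms resolution, and the shape of the returned object are assumptions made for this example.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical helper: sample event-loop delay for `durationMs` while an
// optional `work` callback runs, then report the results in milliseconds.
function measureLoopDelay(durationMs, work = () => {}) {
  return new Promise((resolve) => {
    const histogram = monitorEventLoopDelay({ resolution: 10 });
    histogram.enable();
    setImmediate(work); // run the workload while the histogram is sampling
    setTimeout(() => {
      histogram.disable();
      // The histogram reports nanoseconds; convert to milliseconds.
      resolve({
        meanMs: histogram.mean / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
        maxMs: histogram.max / 1e6,
      });
    }, durationMs);
  });
}
```

Calling measureLoopDelay(5000).then(console.log) while loading a page would show whether the delays come anywhere near the 70 ms threshold discussed here.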

dsprenkels commented 5 years ago

In my case, I get toobusy-js 503 Service Unavailable errors every time after a restart. This makes sense, because at this point, most of the code has not yet been read from disk by V8. This batch of reads from disk generally takes a lot more than 70 ms.

Also, I'm using useCDN = false (reasons are user privacy and availability).
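
To illustrate where those 503 responses come from, here is a minimal sketch of a "too busy" gate in front of a request handler. It is hypothetical and not CodiMD's actual middleware: busyGate and isBusy are made-up names, and isBusy stands for whatever lag check is in use. The Retry-After header is an addition for the example, not something the thread mentions.

```javascript
// Hypothetical gate: wraps a plain (req, res) handler and answers 503
// whenever `isBusy()` reports that the event loop is lagging too much.
function busyGate(isBusy, handler) {
  return (req, res) => {
    if (isBusy()) {
      res.statusCode = 503; // Service Unavailable
      res.setHeader('Retry-After', '1'); // assumption: hint to retry soon
      res.end('Server is too busy, please try again shortly.');
      return;
    }
    handler(req, res);
  };
}

// Usage sketch with Node's built-in http module:
//   const http = require('http');
//   http.createServer(busyGate(() => lagIsTooHigh(), app)).listen(3000);
// where `lagIsTooHigh` and `app` are placeholders for your own code.
```

Right after a restart the first requests arrive while modules are still being loaded from disk, so a gate like this sees high lag and rejects them even though the instance is otherwise idle.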

SISheogorath commented 5 years ago

If one of you wants to give some more insight, it would be awesome if you could start the node process with the --prof flag. This will generate some files within the working directory, and their output can be processed using node --prof-process isolate-0xXXXXXXXXXXXX-v8.log

It provides various statistics about how much time node spends in which part of the code:

[Image: example performance stats]

Some more details: https://nodejs.org/en/docs/guides/simple-profiling/

dsprenkels commented 5 years ago

This profile was recorded over only about 3 page loads.

 [Summary]:
   ticks  total  nonlib   name
    211   15.0%   16.1%  JavaScript
   1026   73.0%   78.2%  C++
     91    6.5%    6.9%  GC
     94    6.7%          Shared libraries
     75    5.3%          Unaccounted

prof.txt

SISheogorath commented 5 years ago

Looks like our codebase is quite optimal. It only appears 5 times, and all of them are done within one CPU tick. So our codebase is not the problem. Yay, good news.

So it seems like all we really need to do is add an option. I'll propose a PR. Keep in mind this will go into 1.4.0
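
A minimal sketch of what reading such an option could look like, assuming it comes from an environment variable. The variable name EXAMPLE_TOOBUSY_LAG is a placeholder, not the name used in the eventual PR. The 70 ms default is the one mentioned in this thread; the 10 ms lower bound is an assumption about what the library accepts.

```javascript
const DEFAULT_MAX_LAG_MS = 70; // default mentioned in this thread
const MIN_MAX_LAG_MS = 10;     // assumption: lower values are not useful

// Hypothetical helper: read the lag threshold from an env-like object and
// fall back to the default when the value is missing or not a number.
function resolveMaxLag(env, name = 'EXAMPLE_TOOBUSY_LAG') {
  const raw = env[name];
  if (raw === undefined || raw === '') return DEFAULT_MAX_LAG_MS;
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) return DEFAULT_MAX_LAG_MS;
  return Math.max(MIN_MAX_LAG_MS, parsed);
}

// Usage sketch (requires the toobusy-js package to be installed):
//   const toobusy = require('toobusy-js');
//   toobusy.maxLag(resolveMaxLag(process.env));
```

Falling back to the default on bad input keeps a typo in the configuration from changing behaviour silently in the wrong direction.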

Nebukadneza commented 5 years ago

Hi,

Sorry, it seems I’m a little late now. One thing: serving the static assets with Apache did indeed help a lot. The busy notifications are now much less frequent, but they still occur.

As for the profiling, I let it run for a whole day, with the result that nodejs segfaults(!) when parsing it. A shorter run of a few users with a few page loads looks very similar to @dsprenkels' results, so I’ll save you the hassle ….

Indeed, I’ll be gladly waiting for a PR. Thanks for taking this corner-case issue so seriously, @SISheogorath !

Best and thanks a lot! -Dario

dsprenkels commented 5 years ago

One observation: Notice that ~25% of the ticks are in node::(anonymous namespace)::ContextifyScript::New(v8::FunctionCallbackInfo<v8::Value> const&), which I think is basically just JIT compilation.

jackycute commented 5 years ago

The maxLag of toobusy is now configurable thanks to PR https://github.com/hackmdio/codimd/pull/1239

Thanks everyone!