jonashaag / klaus

docker run klaus / pip install klaus — the first Git web viewer that Just Works™.
http://klausdemo.lophus.org
Other
686 stars 101 forks source link

504 Gateway Time-out with Smart HTTP on big repos #230

Open royneary opened 5 years ago

royneary commented 5 years ago

When I try to clone a big git repository over HTTP (like a mirror of https://github.com/NixOS/nixpkgs.git, 1.2 GB when cloned), I get a timeout. Git prints the following error message:

error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out
fatal: The remote end hung up unexpectedly

I'm using nginx and uwsgi. I tried timeouts up to 600 seconds (using harakiri = 600; in the uwsgi config and uwsgi_read_timeout = 600s; uwsgi_write_timeout = 600s; in the nginx config. After that timeout I can see the following in the uwsgi logs:

Nov 12 07:02:34 luno uwsgi[16578]: Mon Nov 12 07:02:34 2018 - *** HARAKIRI ON WORKER 1 (pid: 16819, try: 1) ***
Nov 12 07:02:34 luno uwsgi[16578]: Mon Nov 12 07:02:34 2018 - HARAKIRI !!! worker 1 status !!!
Nov 12 07:02:34 luno uwsgi[16578]: Mon Nov 12 07:02:34 2018 - HARAKIRI [core 0] 2a02:2450:102a:c75:7410:cb6:be9d:1f74 - POST /code/nixpkgs/git-upload-pack since 1542002493
Nov 12 07:02:34 luno uwsgi[16578]: Mon Nov 12 07:02:34 2018 - HARAKIRI !!! end of worker 1 status !!!
Nov 12 07:02:35 luno uwsgi[16578]: DAMN ! worker 1 (pid: 16819) died, killed by signal 9 :( trying respawn ...
Nov 12 07:02:35 luno uwsgi[16578]: Respawned uWSGI worker 1 (new pid: 17209)

My server has an old x86 atom processor, but it still should be able to process a 1.2 GB repo within 10 minutes. Not sure if this is a bug in klaus or in dulwich or if I can solve it by tuning my config.

jonashaag commented 5 years ago

Can you have a look at dmesg or similar to check if it’s an OOM situation? Or otherwise try to debug why the process is killed.

royneary commented 5 years ago

dmesg doesn't say anything and it's not OOM (I checked the memory consumption with htop). To me it looks like uwsgi kills its worker process because klaus didn't respond within the timeout period. So we would have to see why klaus is taking so long. Is there debug output that I can enable?

jonashaag commented 5 years ago

Not really except for the web server logs which won't be very helpful I guess.

@jelmer any idea if this could be a Dulwich performance issue or how to debug it?

royneary commented 5 years ago

nginx has this in its logs

2018/11/12 09:31:24 [error] 16565#16565: *3177 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 2a02:2450:102a:c75:1560:e24f:71fa:d25e, server: mydomain.com, request: "POST /code/nixpkgs/git-upload-pack HTTP/1.1", upstream: "uwsgi://unix:///run/uwsgi/klaus.sock", host: "mydomain.com"
jelmer commented 5 years ago

It's possible this is related to Dulwich' behaviour. It would be interesting to know if you can reproduce this with a local Dulwich server (i.e. running "dulwich web"), and how long that takes.

What version of Dulwich and Klaus are you using?

On November 12, 2018 8:51:27 AM UTC, Christian Ulrich notifications@github.com wrote:

nginx has this in its logs

2018/11/12 09:31:24 [error] 16565#16565: *3177 upstream timed out (110:
Connection timed out) while reading response header from upstream,
client: 2a02:2450:102a:c75:1560:e24f:71fa:d25e, server: mydomain.com,
request: "POST /code/nixpkgs/git-upload-pack HTTP/1.1", upstream:
"uwsgi://unix:///run/uwsgi/klaus.sock", host: "mydomain.com"

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/jonashaag/klaus/issues/230#issuecomment-437801943

jelmer commented 5 years ago

On Mon, Nov 12, 2018 at 12:51:27AM -0800, Christian Ulrich wrote:

nginx has this in its logs

2018/11/12 09:31:24 [error] 16565#16565: *3177 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 2a02:2450:102a:c75:1560:e24f:71fa:d25e, server: mydomain.com, request: "POST /code/nixpkgs/git-upload-pack HTTP/1.1", upstream: "uwsgi://unix:///run/uwsgi/klaus.sock", host: "mydomain.com"

Yeah, this appears to be a Dulwich performance issue. We haven't agressively optimized the pack fetching code, so there are probably some easy gains to be had there.

If you file a bug in the dulwich repo, I'll dig into it.