ocaml / infrastructure

Wiki to hold the information about the machine resources available to OCaml.org

watch.ocaml.org is down #89

Closed kit-ty-kate closed 2 weeks ago

kit-ty-kate commented 5 months ago

Could it be related to github being down?

[Screenshot 2024-01-09 at 13 46 10]

mtelvers commented 5 months ago

The service has been restored

mtelvers commented 5 months ago

A massive log file (44GB!) in the Docker overlay2 folder caused the server to run out of space.
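For anyone needing to track down a similar space hog, here is a minimal sketch of listing the largest files under a directory tree. The function name `largest_files` is made up for illustration, and the thread does not give the overlay2 path on this host, so any path passed in is an assumption.

```python
import heapq
import os


def largest_files(root, n=5):
    """Return up to n (size_in_bytes, path) tuples for the biggest files under root."""
    sizes = []
    # os.walk does not follow symlinked directories by default and silently
    # skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the size of a symlink itself, not its target.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return heapq.nlargest(n, sizes)
```

Run as root against the Docker data directory (commonly /var/lib/docker/overlay2, though that location is an assumption here), this would surface a 44GB log file at the top of the list.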

avsm commented 5 months ago

That's a new one -- what was the logfile?

mtelvers commented 5 months ago

/var/log/nginx/peertube.error.log within the overlay2 folder. Over a period of about a week, there appear to have been continuous attempts to download a file which doesn't exist. The log wasn't growing when I first looked at it, but on revisiting it now to check the exact file name, I see that the entries are back. The source IP is 147.28.133.131, which is amok.recoil.org. Perhaps there is a federation between these two? At the moment, there are about 50 lines per second like this:

2024/01/09 18:31:02 [error] 14#14: *13191 open() "/var/www/peertube/storage/web-videos/e463f10c-d052-42db-b972-d19bbcaa205d-360.mp4" failed (2: No such file or directory), client: 147.28.133.131, server: watch.ocaml.org, request: "GET /static/webseed/e463f10c-d052-42db-b972-d19bbcaa205d-360.mp4 HTTP/1.1", host: "watch.ocaml.org"

I've set iptables to drop these connections until we can get to the bottom of it. [Remove with iptables -D DOCKER-INGRESS -s 147.28.133.131 -j DROP.]
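
To confirm which client and which file dominate a log like this, the entries can be tallied. The sketch below is not taken from the thread: the `tally` helper and the two regular expressions are illustrative, written against the single log line quoted above, and other nginx error formats may need different patterns.

```python
import re
from collections import Counter

# Patterns for the "client: <ip>," and 'request: "<METHOD> <path> ...' fields,
# based only on the error-log line quoted in this thread.
CLIENT_RE = re.compile(r'client: ([^,]+),')
REQUEST_RE = re.compile(r'request: "(\S+) (\S+)')

# The log line quoted above, used as sample input.
SAMPLE = (
    '2024/01/09 18:31:02 [error] 14#14: *13191 open() '
    '"/var/www/peertube/storage/web-videos/e463f10c-d052-42db-b972-d19bbcaa205d-360.mp4" '
    'failed (2: No such file or directory), client: 147.28.133.131, '
    'server: watch.ocaml.org, request: "GET /static/webseed/'
    'e463f10c-d052-42db-b972-d19bbcaa205d-360.mp4 HTTP/1.1", host: "watch.ocaml.org"'
)


def tally(lines):
    """Count error-log entries per (client IP, requested path)."""
    counts = Counter()
    for line in lines:
        client = CLIENT_RE.search(line)
        request = REQUEST_RE.search(line)
        # Lines without both fields are skipped rather than guessed at.
        if client and request:
            counts[(client.group(1), request.group(2))] += 1
    return counts
```

Passing an open file handle for the error log to `tally` and calling `.most_common(10)` on the result would show whether one client and one path account for nearly all entries, as seems to be the case here.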

avsm commented 2 months ago

@mtelvers I think part of the root cause for this might be a missing step in the upgrade. The "assets" volume needs to be deleted on upgrade so that it can be recreated with the latest frontend files, which then form part of the federation API.

See e.g. https://github.com/Chocobozzz/PeerTube/issues/5030 -- this also goes wrong when trying to update the admin config on watch.ocaml.org at the moment (see screenshot)

[image]

mtelvers commented 3 weeks ago

@avsm I have deleted the peertube-assets volume and restarted the service. The content has been repopulated. I don't have access to the admin dashboard so I can't verify whether this has fixed your issue. However, I've removed the iptables rule and the log file is quiet.

shonfeder commented 2 weeks ago

Sounds like this is resolved, and we can reopen if further problems arise.