noobient / killinuxfloor

noobient's Killing Floor 2 Linux Server Installer and Manager
MIT License

The server goes berserk on CPU and RAM out of nowhere #70

Open triuk opened 1 year ago

triuk commented 1 year ago

Hi, I do not know what changed, but even after a clean install, the CPU hits 100 % and RAM usage grows by about 30 MB/s. The game runs, but it is unplayable under these conditions, and the server stops responding after a while anyway. Are you experiencing a similar issue? I tried reverting to b2e4e04b2763604b4e3cedd5241cd123f3a84fe3 with

git reset --hard b2e4e04b2763604b4e3cedd5241cd123f3a84fe3
./install.sh

but the result is the same (can I revert to a previously installed version this way?). Before I start running tests, I'd like to make sure everything is fine on your side.

bviktor commented 1 year ago

Actually, I noticed the same thing: "connection lost" all the time. But I can't imagine what could possibly have broken things on my side.

Apparently the last update to KF2 was on Feb 2nd:

https://steamdb.info/app/232090/patchnotes/

So I guess it shouldn't be that either?

I tried skipping the UKFP mutator, but that didn't solve it for me. Will keep you updated if I find out the cause, please do the same.

triuk commented 1 year ago

Well, the error disappeared after a while; it just took longer than expected. Everything is downloaded now, so false alarm. Maybe it is related, maybe not; I'll just post everything suspicious I find. I did a clean install in a VirtualBox Ubuntu 22.04 server VM (it's faster to do things on my workstation than on the potato low-power server). Here is the output of klf status workshop; nothing is downloaded:

------------------------------------------------------
Subscribed workshop maps:                             
------------------------------------------------------
838775511       KF-HorzineArenaRMEdition        ❌❌🌐
1210703659      KF-KillingPool                  ❌❌🌐
------------------------------------------------------
Subscribed workshop mutators:                         
------------------------------------------------------
2625647922      N/A                             ❌❌🌐
2875147606      N/A                             ❌❌🌐
find: ‘/home/steam/Steam/KF2Server/Binaries/Win64/steamapps/workshop/content/232090’: No such file or directory
find: ‘/home/steam/Steam/KF2Server/KFGame/Cache’: No such file or directory

triuk commented 1 year ago

OK, I found what causes the behavior. The server hits 100 % CPU and eats RAM (the KFGameSteamServ process) as soon as I expose the UDP game port 7777 to the internet. It looks like someone is mining crypto on the KF2 server (a joke so far, but not really). The game is totally fine when the server is only on my local network. Unfortunately I have no idea how to fix this, or whether it can be fixed at all. But the KF2 server is unusable in this state.

bviktor commented 1 year ago

Huh, so maybe that's why I saw a lot of discussions about DDoS protection on the TWI forums...

So maybe we're being flooded with bullcrap? That would also explain why my new server was usable the other day.

Whatever the case, implementing rate limits on the exposed port would be a good idea, so I'll see what I can do about it.

bviktor commented 1 year ago

And also thanks a lot for your reports!

bviktor commented 1 year ago

https://www.cyberciti.biz/faq/enable-firewalld-logging-for-denied-packets-on-linux/
https://serverfault.com/a/683733/158208
https://firewalld.org/documentation/man-pages/firewalld.policy.html

triuk commented 1 year ago

Yes, it could be some kind of DDoS. A firewall solution (if it's even possible) is just a workaround. Thank you for the info, I'll try it later; I'm going to bed now :) Nevertheless, TWI needs to patch this, because it is their application that is being exploited and eating an absurd amount of CPU and RAM.

bviktor commented 1 year ago

Unfortunately in this day and age DDoS protection is not optional :)

I already have something in mind: rate-limit connections on the KF2 ports, log the rejected ones with firewalld, and then fail2ban those IPs for a day or so.
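A minimal sketch of the first two steps of that plan, assuming firewalld's rich-rule `limit` clause and the `--set-log-denied` option; the port and the 10/m value are illustrative, and this is not necessarily what killinuxfloor ends up installing. As far as I know, firewalld accepts packets of already established flows before zone rules are evaluated, so the limit should mostly affect new flows rather than ongoing game traffic. The fail2ban step is not shown.

```shell
# Sketch: rate-limit new UDP flows to the KF2 game port (values are illustrative).
# Packets over the limit do not match this accept rule and fall through to the
# zone's default handling. Do NOT also open 7777/udp with --add-port, or the
# plain port rule would accept those packets anyway and bypass the limit.
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" port port="7777" protocol="udp" accept limit value="10/m"'

# Log denied packets so offending sources can be inspected later
# (for example by a fail2ban jail, not shown here).
firewall-cmd --set-log-denied=all

firewall-cmd --reload
```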

triuk commented 1 year ago

There is a solution, updated today. I still think TWI should resolve it on the KF2 server side, but that's probably wishful thinking, since the problem started in 2021. https://forums.tripwireinteractive.com/index.php?threads/kf2-or-any-unreal-engine-3-server-on-redhat-centos-rocky-alma-linux-ddos-defense-with-the-help-of-firewalld.2337631/

triuk commented 1 year ago

OK, so I installed the needed packages:

sudo apt install cron firewalld

Then run every command as root (sudo). Open crontab -e and put these lines there (the path points to the Launch.log file):

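# Every 20 minutes: scan the last 5000 lines of Launch.log, count how often each IP shows up in the "Close" lines that follow a "Connection timed out after" message, add the /20 network of every IP seen more than 4 times to the networkblock ipset, then reload firewalld
*/20 * * * * tail -5000 /home/steam/Steam/KF2Server/KFGame/Logs/Launch.log|grep -F -A2 'Connection timed out after'|awk -F" |:" '/Close/ {a[$7]++} END {for (b in a) {if (a[b]>4) {print b}}}'|uniq|while read ip; do firewall-cmd --permanent --ipset=networkblock --add-entry=$ip/20;done && firewall-cmd --reload >/dev/null 2>&1
# At 06:00 on every 3rd day of the month: remove the current entries from the networkblock ipset, then re-add 128.116.0.0/17
0 6 */3 * * firewall-cmd --ipset=networkblock --get-entries|while read ip; do firewall-cmd --permanent --ipset=networkblock --remove-entry=$ip;done;firewall-cmd --permanent --ipset=networkblock --add-entry=128.116.0.0/17 >/dev/null 2>&1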
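Before letting the first cron job touch the firewall, it may help to preview what it would ban. The helper below, `kf2_ban_preview`, is hypothetical (not part of killinuxfloor or the forum post): it runs the same tail/grep/awk extraction and only prints the firewall-cmd commands it would execute. The default log path is the one used in this thread.

```shell
# Hypothetical dry-run helper: same extraction as the */20 cron job,
# but it prints the firewall-cmd commands instead of running them.
kf2_ban_preview() {
  # Log path defaults to the one used in this thread; pass another as $1.
  local log=${1:-/home/steam/Steam/KF2Server/KFGame/Logs/Launch.log}
  tail -5000 "$log" \
    | grep -F -A2 'Connection timed out after' \
    | awk -F" |:" '/Close/ {a[$7]++} END {for (b in a) {if (a[b]>4) {print b}}}' \
    | while read -r ip; do
        # The same /20 entry the cron job would add to the networkblock ipset
        echo "firewall-cmd --permanent --ipset=networkblock --add-entry=$ip/20"
      done
}
```

Source it in bash and run `kf2_ban_preview` (or `kf2_ban_preview /path/to/Launch.log`); an empty result means the cron job would not add anything.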

After that, run these commands (I use the default port 7777):

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p udp --dport 7777 -m connlimit --connlimit-above 5 --connlimit-mask 20 -j DROP
firewall-cmd --permanent --new-ipset=networkblock --type=hash:net
firewall-cmd --permanent --zone=drop --add-source=ipset:networkblock
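Both `--connlimit-mask 20` and the `$ip/20` entries added by the cron job work on /20 networks rather than single addresses, i.e. 2^(32-20) = 4096 addresses per entry. The bash helper below, `ipv4_network`, is made up for illustration: it prints the network a client address falls into for a given prefix length, which can help judge how many unrelated players a /20 block might catch. It assumes a plain dotted-quad address without leading zeros.

```shell
# Hypothetical helper: ipv4_network ADDRESS PREFIX
# Prints the network (in CIDR form) that contains ADDRESS.
ipv4_network() {
  local ip=$1 prefix=$2 a b c d
  IFS=. read -r a b c d <<< "$ip"
  # Pack the four octets into one integer (bash arithmetic is 64-bit)
  local n=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # Keep the top PREFIX bits, clear the host bits
  local mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local net=$(( n & mask ))
  printf '%d.%d.%d.%d/%d\n' $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
}
```

For example, `ipv4_network 203.0.113.77 20` prints `203.0.112.0/20`, so that single ban would also cover 203.0.112.0 through 203.0.127.255.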

And finally, allow the desired ports; for me they are:

firewall-cmd --add-port=7777/udp --permanent
firewall-cmd --add-port=27015/udp --permanent
firewall-cmd --add-port=8080/tcp --permanent

Restart the firewall and you're done:

systemctl restart firewalld

tl;dr: it works. There is still some overhead, but the server is usable and my friends can connect. According to the author, the banlist is persistent, so maybe there will be less overhead in the future, once the banlist is more complete.

bviktor commented 1 year ago

Permanently banning IPs is not a good idea in general, since public IPs often change hands.

I'm trying to implement some kind of rate limiting. Will get back to you soon.

bviktor commented 1 year ago

This is an initial stab at it. For now it seems to be working, but we'll find out in the coming days.

As for you, you already made several manual changes, so I'm afraid there's no easy way to test this, since your changes will probably interfere.

bviktor commented 1 year ago

Things kinda settled down I think, so if you get the chance to test it out from scratch sometime, please report back :)

bviktor commented 1 year ago

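For the record, the workshop issues from triuk's earlier comment (slow downloads and the "No such file or directory" errors in klf status workshop) are valid issues as well; please see #72 and #75. But they're unrelated to the CPU/RAM problem.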

triuk commented 1 year ago

Hi, I tried your workaround over a longer period, and I hate to say it, but your solution does not work well for me. Here is a comparison of the latest 52e4884 and the pre-DDoS 00c3c11 with the IP ban solution from the forum:

  1. Just a thought: those 4 default workshop items take ages to download, but they do eventually download. That affects both versions, so it's probably unrelated to the DDoS?

  2. The main difference is in webadmin responsiveness. The page takes seconds to load, and sometimes it does not load at all and I need to reload it. The chat console at the bottom permanently shows a "page not found" error. I do not face this with the IP ban solution at all.

For the record, I run these tests on my workstation with a Ryzen 5 5600H, so it's not a lack of resources.

bviktor commented 1 year ago

Thanks for your response!

Uh, yeah, maybe I was a bit foolish to take an nginx reverse proxy for granted.

If webadmin is slow, then you're probably hitting the rate limits over HTTP.

Would you be so kind as to reinstall and retry with the rate limit increased to, dunno, maybe 50/m? Here:

https://github.com/noobient/killinuxfloor/blob/master/roles/install/tasks/firewalld.yml#L45

triuk commented 1 year ago

Increasing it to 50/m did not help much. The delay is still unbearable, though the chat console sometimes comes to life. But then I removed 8080/tcp from your script, since TCP is not vulnerable to this type of attack, and just added the port to the firewall:

firewall-cmd --add-port=8080/tcp --permanent

This way it probably works as you intended, with a responsive web interface.

k0dat commented 1 year ago

@bviktor,

I probably should have posted earlier in this thread, but anyway: for the last few months I've had my rate limit set to 20/m, which seems like a more sensible value than the default 10/m. With 10/m I was hitting the rate limit myself. I think this happened in the KF2 client while I was adjusting search parameters in the server browser, probably hitting refresh a few times, and my server didn't show up until I left it alone for a minute. A friend reported a similar issue even with 20/m; I think he was similarly adjusting parameters and spamming refresh. Which makes me think: should the limit be even higher than my 20/m?
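To get a feel for why someone refreshing the server browser can trip a 10/m limit, here is a small bash simulation of a token-bucket limiter. It assumes a burst of 5 (the default of the iptables `limit` match) and that each refresh costs one packet subject to the limit; both are assumptions, and the rule killinuxfloor actually installs may behave differently (for example, the limit may be shared by all clients rather than applied per IP). `simulate_limit` is a hypothetical helper.

```shell
# Hypothetical helper: simulate_limit RATE_PER_MINUTE BURST T1 T2 ...
# Ti are packet arrival times in milliseconds (non-decreasing).
# Prints how many packets a token bucket would let through.
simulate_limit() {
  local rate=$1 burst=$2
  shift 2
  local interval=$(( 60000 / rate ))  # ms needed to earn one token
  local tokens=$burst last=0 accepted=0 t earned
  for t in "$@"; do
    # Refill: one token per full interval since the last refill, capped at BURST
    earned=$(( (t - last) / interval ))
    if (( earned > 0 )); then
      tokens=$(( tokens + earned > burst ? burst : tokens + earned ))
      last=$(( last + earned * interval ))
    fi
    # Spend a token if one is available; otherwise the packet is dropped
    if (( tokens > 0 )); then
      tokens=$(( tokens - 1 ))
      accepted=$(( accepted + 1 ))
    fi
  done
  echo "$accepted"
}
```

With one packet per second for 12 seconds, `times="0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000"`, the calls `simulate_limit 10 5 $times`, `simulate_limit 20 5 $times` and `simulate_limit 50 5 $times` print 6, 8 and 12 respectively: once the initial burst is spent, a fast-refreshing client at 10/m or 20/m only gets a packet through every few seconds.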

Apart from a single person coming from a single IP, my thinking is that there could be people trying to join our servers from LAN parties sharing one IP address, who would hit the limit searching for the same server at the same time. There could also be people sharing an IP via CGNAT, though given the small KF2 population I'd think that is less likely than a LAN party environment, but still possible. So perhaps some sort of temporary IP ban might be worth considering when a higher limit is hit? Below are DDoS stats from one of my servers. There's only a relatively small number of IPs involved compared to the overall number of requests.


Today's DDoS stats:

Denied packets: 2,564,918
Unique IPs: 255
Log size: 587M
Log throttled: yes
Log limits: 20000 allowed within 600 seconds
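Numbers like these can be derived from the kernel log once firewalld logs denied packets. A rough sketch, assuming the standard netfilter log format in which each logged packet line contains a `SRC=<address>` field; `denied_stats` is a hypothetical helper, not the script k0dat used. For scale, 2,564,918 packets from 255 addresses is roughly 10,000 packets per address.

```shell
# Hypothetical helper: reads netfilter log lines on stdin and prints the
# number of logged packets and the number of distinct source addresses.
# Example (assumes denied-packet logging is enabled):
#   journalctl -k --since today | denied_stats
denied_stats() {
  grep -o 'SRC=[0-9a-fA-F.:]*' \
    | cut -d= -f2 \
    | awk '{ total++; seen[$0] = 1 }
           END { n = 0; for (ip in seen) n++
                 printf "Denied packets: %d Unique IPs: %d\n", total, n }'
}
```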

triuk commented 1 year ago

Hi, yeah, 10/m is too low; my own server kicked me out a few times :P

k0dat commented 1 year ago

@triuk - Regarding your second point about web admin, I might be able to provide some guidance. In your earlier post you mentioned port 8080, so it sounds like you're exposing web admin to the Internet over HTTP? If so, I highly recommend against this. What you need to do is set up a reverse proxy, ideally with HTTPS. I'm using NGINX as a reverse proxy with HTTPS. It's not that hard to set up, and I can share some of my setup notes if you're interested.

triuk commented 1 year ago

Let's continue in the discussion: https://github.com/noobient/killinuxfloor/discussions/83#discussion-5233177