lancachenet / monolithic

A monolithic lancache service capable of caching all CDNs in a single instance
https://hub.docker.com/r/lancachenet/monolithic

Slow download speeds during cache warmup #16

Closed: spyfly closed this issue 4 years ago

spyfly commented 5 years ago

Describe the issue you are having

My Steam download speeds through the cache are stuck at around 5-6 MBps, whilst without the cache I hit around 10-11 MBps. I have tried adding a decent number of IPs, as described in the README, but that hasn't changed anything at all. I am wondering whether I have misconfigured the docker containers or whether there is an issue with steamcache.

The Steamcache is running in a KVM guest with 16 GB of RAM and 4 Broadwell cores, on a RAID0 ZFS array dedicated to it, and is not hitting any IO limitations. When pulling already cached data from the Steam Cache, I get around 80 MBps throughput with 2 clients.

How are you running the container(s)?

HOST_IP=`hostname -I | cut -d' ' -f1`
sudo docker run --restart unless-stopped --name steamcache-dns --detach -p $HOST_IP:53:53/udp -e USE_GENERIC_CACHE=true -e LANCACHE_IP=$HOST_IP -e STEAMCACHE_IP="192.168.30.201 192.168.30.202 192.168.30.203 192.168.30.204 192.168.30.205 192.168.30.206 192.168.30.207 192.168.30.208 192.168.30.209 192.168.30.210 192.168.30.211 192.168.30.212 192.168.30.213 192.168.30.214 192.168.30.215 192.168.30.216 192.168.30.217 192.168.30.218 192.168.30.219 192.168.30.220" steamcache/steamcache-dns:latest
sudo docker run --restart unless-stopped --name lancache --detach -v /cache/data:/data/cache -v /cache/logs:/data/logs -p 80:80 -e CACHE_MEM_SIZE=16000m -e CACHE_DISK_SIZE=800g steamcache/monolithic:latest
sudo docker run --restart unless-stopped --name sniproxy --detach -p 443:443 steamcache/sniproxy:latest
echo Please configure your dhcp server to serve dns as $HOST_IP
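
For completeness, a sketch of how the alias addresses passed to STEAMCACHE_IP were attached to the NIC (assuming the ens18 interface and the .201 primary address shown in the ip addr output below; adjust if yours differ):

    # Attach the nineteen secondary addresses; the primary .201 is already on ens18.
    for i in $(seq 202 220); do
        sudo ip addr add 192.168.30.$i/24 dev ens18
    done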

DNS Configuration

Configured pfSense to point DNS at steamcache-dns

IP Configuration

root@caching-server:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 1a:f5:5d:8d:81:e6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.30.201/24 brd 192.168.30.255 scope global ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.202/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.203/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.204/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.205/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.206/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.207/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.208/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.209/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.210/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.211/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.212/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.213/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.214/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.215/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.216/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.217/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.218/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.219/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet 192.168.30.220/24 brd 192.168.30.255 scope global secondary ens18
       valid_lft forever preferred_lft forever
    inet6 2001:16b8:55c1:2900:18f5:5dff:fe8d:81e6/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 86385sec preferred_lft 14385sec
    inet6 fe80::18f5:5dff:fe8d:81e6/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:40:55:ba:29 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:40ff:fe55:ba29/64 scope link 
       valid_lft forever preferred_lft forever
5: vethf943db7@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 52:5e:9e:05:60:64 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::505e:9eff:fe05:6064/64 scope link 
       valid_lft forever preferred_lft forever
7: veth568c974@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether ee:3f:b4:37:ba:ab brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ec3f:b4ff:fe37:baab/64 scope link 
       valid_lft forever preferred_lft forever
9: vethdd6acbd@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 9e:06:e9:a3:f0:de brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::9c06:e9ff:fea3:f0de/64 scope link 
       valid_lft forever preferred_lft forever
spyfly commented 5 years ago

I've also tried steamcache/generic with the latest, noslice, and proxy_read_timeout tags; none of them really made a difference. I was still getting around 5-6 MB/s through the cache on the first download.

I have also tried it on my own machine, same result.

pkloodt commented 5 years ago

Same issue here. I switched from steamcache/steamcache to steamcache/monolithic because the issues section of the steamcache/steamcache repo here on GitHub states those problems should be solved by the monolithic container, but evidently they are not.

I'm on a 400/25 MBit/s internet connection. Without the cache I get download speeds of around 40-45 MB/s; with the cache I'm stuck between 4-5 MB/s.

I'm running Ubuntu 18.04 on an Intel Pentium G4560 with a 6-disk RAID10 ZFS array, and I'm also not running into a bottleneck there...

Any thoughts?

spyfly commented 5 years ago

@pkloodt Have you tried adding multiple IPs, as described in the README under Tuning your Cache?

I have added up to 20 IPs and it didn't make a difference for me.

entity53 commented 5 years ago

Are you guys seeing a lot of timeouts / remote disconnections in the nginx error log?

spyfly commented 5 years ago

@entity53 Everything seems to be fine; nothing unusual in the error.log

entity53 commented 5 years ago

@spyfly
I believe this fixed the issue after I also added 20 IPs. Try adding this line to the nginx conf files in the monolithic container (if you need step-by-step instructions, let me know):

/etc/nginx/sites-available/generic.conf.d/root/20_cache.conf

   proxy_set_header Connection "";

and then reload or restart nginx inside the container
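
For context on why this line can matter (a plausible mechanism, not something confirmed by the project): the Connection header is hop-by-hop, and clearing it, together with proxy_http_version 1.1 and an upstream keepalive setting, lets nginx reuse upstream connections instead of opening a fresh one per sliced request. An illustrative excerpt of 20_cache.conf after the edit; only the Connection line is the actual change:

    proxy_http_version 1.1;          # upstream connection reuse requires HTTP/1.1
    proxy_set_header Connection "";  # clear the hop-by-hop header so keepalive can work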

spyfly commented 5 years ago

@entity53 that sounds like excellent news. I'll give it a try later this evening or tomorrow.

malteserr commented 5 years ago

@entity53 Would definitely appreciate a step by step if this is helping :) some of us are hopeless at docker

entity53 commented 5 years ago

first, get to the command prompt of your running monolithic container:

sudo docker exec -i -t monolithic /bin/bash (where monolithic is the name of your steamcache-monolithic container)

you should then be at /scripts inside your container.

then open this file:

/etc/nginx/sites-available/generic.conf.d/root/20_cache.conf

using the editor of your choice, insert the line:

proxy_set_header Connection "";

save and exit

then restart nginx inside the container by typing service nginx restart

afterwards, type exit to leave the container command line.
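
If you would rather not do it interactively, a hypothetical one-shot equivalent of the steps above (same container name, path, and directive as in the walkthrough):

    # Append the directive and reload nginx in one go; adjust the container name if needed.
    sudo docker exec monolithic sh -c \
      'echo "proxy_set_header Connection \"\";" >> /etc/nginx/sites-available/generic.conf.d/root/20_cache.conf \
       && service nginx reload'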

spyfly commented 5 years ago

@entity53 So with 4 IPs added it does not have any effect, apart from reducing my throughput to around 4-5 MB/s from the previous 6 MB/s. What kind of connection speed do you have?

Adding 6 more IPs got me back to my old speed of around 6 MB/s; adding 10 more didn't make a difference either, so this setting didn't really change much for me.

What I did figure out though is the following:

When reloading or restarting nginx while a download was active, the download speed would jump to 8-9 MB/s rather than the aforementioned 5-6 MB/s, which seems interesting.

entity53 commented 5 years ago

I tinkered with the settings for quite a while last week, and it was finally the one I posted that let me get about 80-90% of my full pipe, but it sounds like it could have been a combination of settings that got me there. I'll do a diff tonight to see what else I changed that might have helped.

spyfly commented 5 years ago

@entity53 alright, that would be amazing.

entity53 commented 5 years ago

Ok, here are all the changes I made.

Apart from this, I did some work on the networking settings on the host machine itself (not inside the docker container), primarily because it has a 10 GbE card whose default driver settings can cause a kernel panic.

The most crucial part was disabling LRO and GRO while routing / bridging (note that ethtool -K needs the interface name):

    ethtool -K <interface> gro off
    ethtool -K <interface> lro off

Additional info on that card if needed: https://github.com/paul-chambers/aquantia

WARNING: The AQtion driver compiles by default with the LRO (Large Receive Offload) feature enabled. This option offers the lowest CPU utilization for receives, but is completely incompatible with routing/ip forwarding and bridging. If enabling ip forwarding or bridging is a requirement, it is necessary to disable LRO using compile time options as noted in the LRO section later in this document. The result of not disabling LRO when combined with ip forwarding or bridging can be low throughput or even a kernel panic.
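
To confirm the offloads are actually off after the change, a quick check (standard ethtool query; substitute your interface name):

    # Lowercase -k prints the current offload settings for the device.
    ethtool -k <interface> | grep -E 'generic-receive-offload|large-receive-offload'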

Finally, each time I added additional IP addresses, I made sure to do a full DNS flush on the client machines. For some reason, on the Windows machines, doing an ipconfig /flushdns was NOT enough to get them to pick up the additional IP addresses being served by the DNS. I did the flush and THEN did a full restart of the machine.

Afterwards, doing an nslookup against the cache machine would return the appropriate list:

nslookup steamcontent.com
Server:         DNS SERVER IP
Address:        DNS SERVER IP#53

Non-authoritative answer:
steamcontent.com        canonical name = steam.cache.steamcache.net.
Name:    steam.cache.steamcache.net
Address: 192.168.7.187
Name:    steam.cache.steamcache.net
Address: 192.168.7.188
Name:    steam.cache.steamcache.net
Address: 192.168.7.189
Name:    steam.cache.steamcache.net
Address: 192.168.7.190

entity53 commented 5 years ago

Also to note,
The changes I made did make a huge, instant difference when one client requested a download (we got to around 75% of our normal, non-cache download speed), but we were never able to quite push the full internet pipe until we had a few people requesting the same game. Once one additional client caught up to near where the 'first' person was in terms of download %, it climbed higher.

I suspect it has something to do with the timing of the various threads/slices downloading, and with a client not requesting a new 'batch' until everything it had already requested had finished. This led to periods where the total throughput would fall from 100% to 75% for a few seconds before saturating again.

Not 100% verified, but this is what I observed anecdotally.

spyfly commented 5 years ago

> The changes I made did make a huge, instant difference when one client requested a download (we got to around 75% of our normal, non-cache download speed), but we were never able to quite push the full internet pipe until we had a few people requesting the same game. Once one additional client caught up to near where the 'first' person was in terms of download %, it climbed higher.
>
> I suspect it has something to do with the timing of the various threads/slices downloading, and with a client not requesting a new 'batch' until everything it had already requested had finished. This led to periods where the total throughput would fall from 100% to 75% for a few seconds before saturating again.
>
> Not 100% verified, but this is what I observed anecdotally.

Well, yeah, those settings didn't make much of a difference afaik. The download speeds seem to be between 5-6.8 MB/s, which is around 50-60% of my line speed.

I am using ASUS XG-C100C network cards in all my 10 GBit clients, which are all Aquantia AQtion AQC107 based. I've never really had any problems with them; they work as intended using the Linux kernel driver.

I will drop an Intel chipset based 10 Gig card into my primary server though, but I doubt that is going to make much of a difference.

sir-vince commented 5 years ago

Just pitching in, in the hope that someone might be inspired towards a solution. P.s. read this with a bit of skepticism, as it has been about 5 years since I stopped working in systems administration :)

The test hardware is as follows:

Dell R720
2x Xeon E7-4870
256 GB memory
2x Intel 750 1.2 TB in RAID0
Intel X520 10G interface

Start command for lancache:

docker run --restart unless-stopped --ulimit nofile=64000:64000 --name lancache --detach -v /cache/data:/data/cache -v /cache/logs:/data/logs --net=host -e CACHE_MEM_SIZE=200000m -e CACHE_DISK_SIZE=2000g --ulimit nproc=16386:32768 -p 80:80 steamcache/monolithic:latest

The first test with a clean Ubuntu 18.10 gave very unstable performance ranging from 5MB/s to 30MB/s across both Steam and Origin from one client. When a second client started to download a game, the speed dropped for both clients.

Observation: find /cache/ -type f | wc -l shows that the /cache directory holds 104401 files.

Based on that, I started tinkering with ulimit (adding --ulimit nofile=64000:64000 to the start parameters) as well as raising the ulimit for the docker service in systemd. I don't know if it is just luck or a placebo, but after that change I got speeds of around 80 MB/s in Steam and around 700 MB/s in Origin.
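
For anyone wanting to replicate the systemd part, a sketch of one way to raise the limit for the docker service (the drop-in path is the standard systemd convention; the value mirrors the --ulimit flag above):

    # /etc/systemd/system/docker.service.d/override.conf -- hypothetical drop-in
    [Service]
    LimitNOFILE=64000

    # apply the drop-in, then confirm the limit inside the running container:
    sudo systemctl daemon-reload && sudo systemctl restart docker
    docker exec lancache sh -c 'ulimit -n'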

Yet I still experience throughput dropping drastically, down to ~10-20 MB/s, when a second client starts downloading something from either Steam or Origin, which makes me think it might be a problem with handling the high number of cache files created by nginx.

entity53 commented 5 years ago

> Yet I still experience throughput dropping drastically, down to ~10-20 MB/s, when a second client starts downloading something from either Steam or Origin, which makes me think it might be a problem with handling the high number of cache files created by nginx.

Is the second client downloading the same game or a different game?

How many alias IPs have you added to the network card / steamcache-dns?

sir-vince commented 5 years ago

@entity53 I have tested the following scenarios; each caused the performance to drop drastically:

Both clients downloading the same game from Steam (Hitman Absolution)
Both clients downloading the same game from Steam (Counter-Strike: Global Offensive)
One client downloading Apex from Origin and one client downloading Hitman Absolution from Steam

I do not have a test where both clients downloaded the same game from Origin

If there is a desired scenario you wish tested, feel free to describe it.

Update: I forgot to mention the number of IPs. I have added 10 IPs to the steamcache-dns / network card.

spyfly commented 5 years ago

I have upgraded my internet line recently; without Steam Cache it now manages to hit ~18 MB/s peak instead of 12 MB/s.

When running Steam Cache with 20 IPs, I am only getting around 5-9 MB/s download speed during cache warmup.

carroarmato0 commented 5 years ago

I ran some tests myself and experienced the same issue. Here is a list of the different attempts:

Without cache:

(Epic) Fortnite: ~22MBps
(Epic) Jackbox Party: ~22MBps
(Steam) Bioshock: ~22MBps
(Steam) Borderlands: ~22MBps
(Blizzard) Overwatch: ~22MBps
(Blizzard) WoW: ~22MBps
(Blizzard) Diablo 3: ~22MBps
(Origin) Dead Space: ~22MBps
(Origin) Apex Legends: ~22MBps

With cache (1 IP):

(Epic) Fortnite: ~20MBps
(Epic) Jackbox Party: ~17MBps
(Steam) Bioshock: ~15MBps
(Steam) Borderlands: ~13MBps
(Blizzard) Overwatch: ~2.5MBps
(Blizzard) WoW: ~2MBps
(Blizzard) Diablo 3: ~3MBps
(Origin) Dead Space: ~7.5MBps
(Origin) Apex Legends: ~8MBps

With cache (11 IPs):

(Epic) Fortnite: ~17MBps
(Epic) Jackbox Party: ~12MBps
(Steam) Bioshock: ~20MBps
(Steam) Borderlands: ~20MBps
(Blizzard) Overwatch: ~8.5MBps
(Blizzard) WoW: ~9MBps
(Blizzard) Diablo 3: ~8MBps
(Origin) Dead Space: ~5MBps
(Origin) Apex Legends: ~5MBps

While downloading a game (say Borderlands) at an average speed of 11MBps according to Steam, when I look at the traffic from inside the cache container I see that it's doing ~12MBps TX and ~12MBps RX.
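
For anyone wanting to reproduce that in-container measurement without installing tools, a rough sketch reading the kernel byte counters (assumes the container is named lancache as in the commands earlier in the thread, and that its interface is eth0 on the default bridge):

    # Sample eth0 counters one second apart to estimate RX/TX throughput in MB/s.
    docker exec lancache sh -c '
      r1=$(cat /sys/class/net/eth0/statistics/rx_bytes); t1=$(cat /sys/class/net/eth0/statistics/tx_bytes)
      sleep 1
      r2=$(cat /sys/class/net/eth0/statistics/rx_bytes); t2=$(cat /sys/class/net/eth0/statistics/tx_bytes)
      echo "RX: $(( (r2 - r1) / 1048576 )) MB/s  TX: $(( (t2 - t1) / 1048576 )) MB/s"'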

manafoo commented 5 years ago

> @spyfly I believe this fixed the issue after I also added 20 IPs. Try adding this line to the nginx conf files in the monolithic container (if you need step-by-step instructions, let me know):
>
> /etc/nginx/sites-available/generic.conf.d/root/20_cache.conf
>
>    proxy_set_header Connection "";
>
> and then reload or restart nginx inside the container

can you please give step-by-step instructions on how to do that?

andriesinze commented 5 years ago

I literally tried every option above, but I cannot get the initial warmup to go above 15 MB/s (we are on a 1 Gbit line).

Tried:

keepalive_timeout 300;

proxy_http_version 1.1;
proxy_set_header Connection "";

worker_processes auto;

Added 5 IPs in total; they are all returned in the DNS lookup.

Warm cache downloads top out at 850 Mbps, so that is good.

This is my startup command:

HOST_IP=`hostname -I | cut -d' ' -f1`
docker run --restart unless-stopped -d --name lancache-dns -p 53:53/udp -e USE_GENERIC_CACHE=true -e LANCACHE_IP=$HOST_IP -e STEAMCACHE_IP="10.10.11.10 10.10.11.11 10.10.11.12 10.10.11.13 10.10.11.14 10.10.11.15" lancachenet/lancache-dns:latest
docker run --restart unless-stopped -d --name lancache -v /lancache:/data/cache -v /cache/logs:/data/logs -p 80:80 -e CACHE_MEM_SIZE=5000m -e CACHE_DISK_SIZE=1000g lancachenet/monolithic:latest
docker run --restart unless-stopped -d --name sniproxy -p 443:443 lancachenet/sniproxy:latest
echo Please configure your dhcp server to serve dns as $HOST_IP
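
One quick way to verify the "all returned" point, querying the cache DNS directly (dig is standard; steamcontent.com is the hostname tested earlier in this thread):

    # All configured alias IPs should come back from the lancache-dns container.
    dig @$HOST_IP steamcontent.com +short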

DragonQ commented 4 years ago

I have the same issue. My internet connection isn't even fast (80 Mb/s), but I'm not maxing it out for uncached games when using LANCache. Speeds when using Steam are as follows:

Without LANCache: ~9 MB/s
LANCache uncached: 0.2-5 MB/s (varies a lot; can be under 1 MB/s for several minutes, then shoot to 5 MB/s)
LANCache cached: 25-40 MB/s

Considering I can easily max out the gigabit link between my server and desktops with regular file transfers, this leaves a lot to be desired. The cached performance is certainly better than downloading from the internet, but less than half of what I'd expect. The uncached performance, though, is particularly bad, making the use of LANCache counterproductive, since we only have 3 desktops that we want to use this with.

My docker containers are running in a VM (since ports 80 and 443 are being used on my host). The cache and log directories are NFS shares on the host. The VM doesn't show any obvious performance bottlenecks (CPU usage never goes above 10% during downloads, RAM never above 256 MB), and both iperf and file transfers run at full speed.

EDIT: I tried the multiple-IPs trick by giving my VM more IPs, but that broke pinging out from the VM to the internet. Not sure if there's some extra config needed with that kind of setup. However, I don't see why having multiple IPs would affect a single user downloading, so I'm going to leave this avenue of debugging for now.

What I have discovered is that using NFS shares is a bad idea: they slow things down a lot. Since switching to a VirtIO mapped directory, my numbers are more like this now:

Without LANCache: ~9 MB/s
LANCache uncached: 5-8 MB/s (pretty steady, mostly around 7 MB/s)
LANCache cached: 25-40 MB/s

So it's still slower than without LANCache, but not horrendous. I might also try a qcow2 disk image to see how that compares to a mapped directory.
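
For readers wanting to try the same switch, a sketch of one way to set up a "VirtIO mapped directory" (virtio-9p passthrough) between host and guest; the path /srv/lancache and the mount tag cachedir are assumptions for illustration:

    # Host side: export the directory on the qemu command line via virtio-9p.
    qemu-system-x86_64 ... -virtfs local,path=/srv/lancache,mount_tag=cachedir,security_model=mapped
    # Guest side: mount the tag where the container bind mount expects it.
    mount -t 9p -o trans=virtio cachedir /cache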

agassparkle commented 4 years ago

> first, get to the command prompt of your running monolithic container:
>
> sudo docker exec -i -t monolithic /bin/bash (where monolithic is the name of your steamcache-monolithic container)
>
> you should then be at /scripts inside your container.
>
> then open this file:
>
> /etc/nginx/sites-available/generic.conf.d/root/20_cache.conf
>
> using the editor of your choice, insert the line:
>
> proxy_set_header Connection "";
>
> save and exit
>
> then restart nginx inside the container by typing service nginx restart
>
> afterwards, type exit to leave the container command line.

Hello,

what should I do when I get "permission denied" when I try to open "/etc/nginx/sites-available/generic.conf.d/root/20_cache.conf"?

Angelayes commented 4 years ago

Try deleting the download cache in the Steam client; that fixed it for me :)

I went from 14 MB/s to around 80 MB/s.

spyfly commented 4 years ago

#85 has resolved this issue. Steam now downloads without any issues at line speed during cache warmup.