lancachenet / monolithic

A monolithic lancache service capable of caching all CDNs in a single instance
https://hub.docker.com/r/lancachenet/monolithic

lancache timeouts, low download rates (solution) #80

Closed decryptedchaos closed 4 years ago

decryptedchaos commented 4 years ago

Okay, this is not an issue per se; rather, this is the best way I could find to share my findings. I have read many issues describing these problems, so I wanted to contribute my solution.

I have been having a huge number of issues with lancache.

At first I was having a lot of problems with DNS timeouts on lancache-dns. I solved this by implementing my own unbound resolver, which is much more stable, and I wrote a script to update the hosts.

But even then, I was having a very hard time getting lancache to perform its core function: I would open Steam, it would start a download and peak at max ISP speed for about 10-30 seconds, then it would basically flatline and transfer no further data.

I fought and fought with this, and eventually I figured out that by default docker containers on the normal bridge interface (docker0) do NOT have internet access.

So what I did was switch my docker container to the host network and change the nginx listen option to listen only on the IP I want it to run on. This gives the container more direct access to the network and thus allows outbound connectivity without any problems.

After making this change my lancache server operates at full ISP speed for uncached data, and 500+ Mbps for cached data.

malteserr commented 4 years ago

This sounds amazing @decryptedchaos

Is it possible to kindly request some step by step instructions for docker noobs like myself, or what should be added/changed from the default readme instructions to be able to test this ourselves?

decryptedchaos commented 4 years ago

I have modified the guide on my fork, which you can read in its entirety if you like, but the relevant parts you are looking for are as follows.

I am still working on the guides for the DNS setup which could also benefit from this modification.

docker run --restart unless-stopped --name lancache --network host --detach -v /cache/data:/data/cache -v /cache/logs:/data/logs lancachenet/monolithic:latest

You will notice that there are no ports specified in this command; that's because we have changed the container to the 'host' network.

Host network configuration (Workaround)

In my findings it is better to use the host network for the lancache server rather than port mapping, because by default the docker bridge does not NAT internet traffic for the containers. One could argue that we could simply modify the docker bridge network to NAT traffic; however, why re-invent the wheel? We can get direct network access for this container by using the host network.
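(If you would rather stick with the bridge network, the missing piece should just be the source-NAT rule Docker normally creates itself. A rough sketch only, assuming the default docker0 subnet of 172.17.0.0/16; check your actual subnet with docker network inspect bridge first.)

# Assumption: docker0 uses 172.17.0.0/16. Check whether a MASQUERADE rule already exists:
iptables -t nat -S POSTROUTING | grep MASQUERADE

# If it is missing, this adds the source-NAT rule Docker would normally create on its own:
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

# Forwarding must also be enabled on the host:
sysctl -w net.ipv4.ip_forward=1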

docker exec -it lancache /bin/bash

You will drop to a prompt

root@lancache:/scripts#

From here the setup is relatively straightforward.

First let's install nano for simplicity's sake.

apt -y install nano

nano /etc/nginx/sites-available/10_generic.conf

At the top of this file you will see the listen line

listen 80 reuseport;

We need to change this. Where the normal guide would tell you to define a host IP and port to "map" to the container, we now have direct access to the IPs on the host, so we can bind to one directly. In my case I use the local IP 192.168.1.40 for my lancache server (yours will be different), so I set this:

listen 192.168.1.40:80 reuseport;

ctrl+x, then y, to save and exit

service nginx restart

Your lancache server will now have full network access and should not time out anymore.
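To double-check that the new binding took effect, a quick sanity check from the docker host (the IP here is my example 192.168.1.40; any HTTP status code back, even a 404, means nginx is answering on that address):

# Confirm nginx is listening on the LAN IP rather than 0.0.0.0:
ss -tlnp | grep ':80'

# Confirm an HTTP request to that IP gets an answer:
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.1.40/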

MathewBurnett commented 4 years ago

I have never seen anything like your issue before. We have never run into this at LANs or in our homes, so I am fascinated to learn more about your use case. I wonder if you could provide a few details to help me understand your network and hardware:

  1. How is DNS provided to your network (what gives clients a DNS server)?
  2. What physical machine is your lancache on, and what DNS does that host use?
  3. Can your lancache host resolve google.com (what does an nslookup tell you)?

decryptedchaos commented 4 years ago

No offense intended, but you say you have not seen this issue before, yet there are at least 3 closed issues that fit this scenario with no resolution.

To answer your question: by default a fresh docker-cli install creates a bridge interface docker0 with a 172.x.x.x subnet with no NAT.

When you install the lancache container and it gets placed on the bridge network by default, this creates a firewall port forward from the host to the container. This works just fine for a simple one-directional service such as a static web page, or to be more precise, a web service that does not need to access the outside world to operate.

So this is what is happening: the monolithic container starts up and binds to its internal 172.x.x.x IP, and the host's firewall forwards port 80 to that IP. This lets LAN clients access the container.

But because there is no NAT on this 172.x.x.x subnet, the containers do NOT have access to the internet. Whatever DNS servers get set is irrelevant, because at this point the container has no route to host, no gateway, no path back to the internet. Thus ping google.com fails, ping steampowered.com fails, ping 8.8.8.8 fails, as does any other internet IP/FQDN, because it simply has no way to get beyond its own subnet.

The story is much the same for lancache-dns: it will resolve local hosts, i.e. the monolithic hostnames it is overriding, because they are looked up locally, but if it tries to look up any remote address it times out, because it can't access the internet due to having no route to host.

Like I said in my write-up, there are probably a few ways to solve this, but I propose that it's probably a good idea not to rely on the host's firewall to pass the traffic to the monolithic container, as this is a very high-bandwidth application and at some point you may hit a bottleneck in iptables.
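If you want to check whether your own setup is affected, these are the quick diagnostics I would run inside the monolithic container (just a connectivity check; if ping or ip are missing from the image, apt install iputils-ping iproute2 inside the container first):

docker exec -it lancache /bin/bash

# Does the container have a default route / gateway at all?
ip route

# Can it reach the internet by IP (rules out DNS as the cause)?
ping -c 3 8.8.8.8

# Can it resolve and reach an upstream hostname?
ping -c 3 steampowered.com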

MathewBurnett commented 4 years ago

No offence taken. Allow me to rephrase: of the maybe 50 setups that I have either done or physically spoken to, I have yet to experience this issue. Most of them have had a reasonably common install procedure.

What is often uncommon is the network's DNS setup, and hence that is where I had focussed my questions. I'm trying to establish what differs about your install that presents this problem. Are you using an OS I'm not expecting, running virtualised in a manner I'm not understanding, or perhaps installing docker via a method that I don't use myself?

Without understanding that, it's very hard to recognise others as having an experience that matches yours.

decryptedchaos commented 4 years ago

I understand.

In this respect, I am not sure how it differs: it is a standard Debian system with the official docker apt repo, installed in the traditional way.

I only discovered the problem when I tried to access external networks from containers running on the system and realized that the containers have no gateway access. Then I realized that if the containers have no gateway access, that would explain all of the problems I am having with lancache.

I have no doubt that other users are experiencing the same issue; they just don't know it.

If we had a known working system to test, we could figure out how the docker bridge is supposed to pass outgoing traffic, e.g. is it using NAT or some other clever form of routing.
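If someone with a known-good install could run a quick test like this and post the output, it would show how a working bridge routes traffic (busybox is just a convenient throwaway image here):

# A throwaway container on the default bridge should show a gateway route and reach the internet:
docker run --rm busybox ip route
docker run --rm busybox ping -c 3 8.8.8.8

# The host-side NAT rules for the bridge can be dumped with:
iptables -t nat -S POSTROUTING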

Docker is not my forte; I use it enough to do basic troubleshooting, but it has never been a true virtual environment for me, so it's not my system of choice.

entity53 commented 4 years ago

@decryptedchaos
Any progress on the unbound write up? Looking to get this up soon, and have never used unbound before.

decryptedchaos commented 4 years ago

@entity53 Sorry for the delay.

I have updated my fork to reflect the instructions for unbound.

I will put them here so anyone else who wants to try it can play with it.

Please let me know if it works, as i may have missed a step.

Manual Unbound DNS (ALTERNATIVE TO LANCACHE-DNS)

You will need a VM or container (Docker, Xen, LXC); I chose to run an LXC container with CentOS.

yum install git
yum install jq
yum install unbound
git clone https://github.com/uklans/cache-domains.git
cd cache-domains/

cd scripts/

mv config.example.json config.json

Change IP for all entries to your LANcache IP

nano config.json
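For reference, config.json simply maps each service to the IP that should answer for it. The snippet below is only illustrative (the field names are from memory of config.example.json and may differ slightly in the current repo); the point is that every entry ends up pointing at your LANcache IP:

{
  "ips": {
    "lancache": "192.168.1.40"
  },
  "cache_domains": {
    "steam": "lancache",
    "blizzard": "lancache"
  }
}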

Now we need to make a few simple edits to the script.

nano create-unbound.sh 

Make your settings match the following; this way, when the script runs it will automatically output to the right config directory under /etc/unbound/local.d/. When unbound starts it parses all files in this directory.

basedir=".."
outputdir="/etc/unbound/local.d/"
path="${basedir}/cache_domains.json"

Now just run the script; it will create the required zone files and put them in the correct place.

./create-unbound.sh

Now restart unbound, and when you query any record covered by the cache domains, unbound will answer with the LANcache IP.

systemctl restart unbound
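If everything worked, the generated files under /etc/unbound/local.d/ should contain redirect entries along these lines (illustrative only, using my example lancache IP), and a lookup through unbound should now return that IP:

# Example of what a generated entry looks like (e.g. for the steam domains):
local-zone: "steamcontent.com." redirect
local-data: "steamcontent.com. A 192.168.1.40"

# Verify from a client, replacing <unbound-ip> with the address of your unbound VM/container:
dig @<unbound-ip> lancache.steamcontent.com +short
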
systemofapwne commented 4 years ago

I'm not completely aware of your setup, so I do not know if my problem is even related to yours or whether this can be adapted. But I have found a solution to my problems.

My situation (similar to the OP's):

  1. Running "/scripts/cache_test.sh" in a freshly started container works
  2. Whenever I download a game, it initially downloads very fast at max ISP speed
  3. I see that the caching directory of my lancache increases in size
  4. I initially see log entries from the lancache which abruptly stop at some point and never recover
  5. Now, download speeds drop to an unholy < "some kB/s"
  6. Manually requesting already cached content (see logfiles for that) also fails now
  7. Running "/scripts/cache_test.sh" in the lancache container also fails
  8. After some BIG waiting time, download speeds recover (here: steam), but no new log entries are generated for lancache -> I think steam falls back here in order to still get the game data, even though the primary CDNs are not reachable anymore

My setup

My solution

I do not know yet if PID limits are the culprit, but I highly suspect it, since I have seen similar issues on other containers too when exceeding the maximum number of PIDs. I will report back whenever I know more.
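If PID exhaustion does turn out to be the cause, one way I plan to test it (just a suggestion, not an official fix) is to watch the container's PID count and recreate it with an explicitly higher limit:

# Show the current PID count (PIDS column) of the running container:
docker stats --no-stream lancache

# Recreate the container with a higher limit by adding this flag to the usual run command:
docker run --pids-limit 4096 ... lancachenet/monolithic:latest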

unspec commented 4 years ago

Regarding slow initial blizzard downloads:

The latest version of monolithic/generic now supports changing the slice size used by nginx. We've found that increasing from 1m to 8m offers a small performance boost to specific use cases (single user initial downloads of blizzard games in particular). See http://lancache.net/docs/advanced/tuning-cache/#tweaking-slice-size for information on how to make use of this.

Please note that it does come with some potential downsides (discussed in the above link) and will invalidate any already cached data on your cache if you change the value.
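For anyone wanting to try it, the slice size is set through an environment variable on the container; if I recall correctly the variable is CACHE_SLICE_SIZE, but check the linked tuning page for the exact name. A rough example using the host-network command from earlier in this thread (again, changing the value invalidates already cached data):

docker run --restart unless-stopped --name lancache --network host --detach \
  -e CACHE_SLICE_SIZE=8m \
  -v /cache/data:/data/cache -v /cache/logs:/data/logs \
  lancachenet/monolithic:latest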

To tidy up the issues, if you choose to test this please post any feedback on this issue: https://github.com/lancachenet/generic/issues/100

If you need any other support please see http://lancache.net/support/ or open a new issue.

unspec commented 4 years ago

Closing this issue due to inactivity, feel free to reopen if needed