subutai-io / cdn

Gorjun is a golang replacement for Kurjun project.
Apache License 2.0
19 stars 13 forks source link

Unusual Gorjun cache node behavior #134

Closed akarasulu closed 6 years ago

akarasulu commented 6 years ago

Recording the conditions and experience to figure out what's wrong.

Network Setup

network

default.txt

Network Setup Tests

screen shot 2017-11-12 at 12 25 08 pm

Cache Tests

Newton Cache Tests

newton:~ akarasulu$ md5sum Downloads/management-subutai-template_6.1.3_amd64.tar.gz 1a203eccf77c0e3459212b2978b162d2 Downloads/management-subutai-template_6.1.3_amd64.tar.gz

https://cdn.subut.ai:8338/kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76&token=e63e329398df96c735685955ffcab30902afa2a1d477dcc2d008c6f45e63203b

newton:~ akarasulu$ wget --no-check-certificate https://cdn.subut.ai:8338/kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76&token=e63e329398df96c735685955ffcab30902afa2a1d477dcc2d008c6f45e63203b [1] 15128 newton:~ akarasulu$ Redirecting output to ‘wget-log’.

newton:~ akarasulu$ cat wget-log --2017-11-12 12:29:27-- https://cdn.subut.ai:8338/kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76 Resolving cdn.subut.ai... 18.195.52.172 Connecting to cdn.subut.ai|18.195.52.172|:8338... connected. HTTP request sent, awaiting response... 200 OK Length: 127562644 (122M) Saving to: ‘download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76’

download?id=3e61d36 100%[===================>] 121.65M 2.28MB/s in 53s

2017-11-12 12:30:21 (2.30 MB/s) - ‘download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76’ saved [127562644/127562644]

The MD5 checksums match: #

newton:~ akarasulu$ md5sum download\?id\=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76 1a203eccf77c0e3459212b2978b162d2 download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76

The NGINX access log shows the hit:

172.16.1.10 - - [12/Nov/2017:12:26:52 +0600] "GET /kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76&token=e63e329398df96c735685955ffcab30902afa2a1d477dcc2d008c6f45e63203b HTTP/1.1" 200 127562644 "https://hub.subut.ai/users/180/kurjun" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

Fermi Cache Tests

From the browser via Hub the download hit the cache too: with correct md5:

172.16.1.20 - - [12/Nov/2017:12:37:13 +0600] "GET /kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76&token=a7f784cc5ddb535c5f10ac342f2e862db1de08fb12415d70dea1e14e0ee3b6f3 HTTP/1.1" 200 127562644 "https://hub.subut.ai/users/180/kurjun" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.62 Chrome/62.0.3202.62 Safari/537.36"

CLI wget also worked and hit the cache:

akarasulu@fermi ~/Downloads $ wget --no-check-certificate https://cdn.subut.ai:8338/kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76&token=a7f784cc5ddb535c5f10ac342f2e862db1de08fb12415d70dea1e14e0ee3b6f3 [1] 14144 akarasulu@fermi ~/Downloads $ --2017-11-12 12:40:40-- https://cdn.subut.ai:8338/kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76 Resolving cdn.subut.ai (cdn.subut.ai)... 172.16.1.11 Connecting to cdn.subut.ai (cdn.subut.ai)|172.16.1.11|:8338... connected. WARNING: cannot verify cdn.subut.ai's certificate, issued by ‘emailAddress=akarasulu@gmail.com,CN=cdn.subut.ai,OU=Alex's Home,O=OptDyn,L=New York City,ST=New York,C=US’: Self-signed certificate encountered. HTTP request sent, awaiting response... 200 OK Length: 127562644 (122M) Saving to: ‘download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76’

download?id=3e61d36 100%[===================>] 121,65M 78,1MB/s in 1,6s

2017-11-12 12:40:42 (78,1 MB/s) - ‘download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76’ saved [127562644/127562644]

Checksums match:

akarasulu@fermi ~/Downloads $ md5sum download\?id\=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76 1a203eccf77c0e3459212b2978b162d2 download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76

akarasulu commented 6 years ago

RH Cache Tests

Because the RH is NAT'd on fermi (1.20) the nginx cache shows a wget cache hit from fermi:

172.16.1.20 - - [12/Nov/2017:12:43:14 +0600] "GET /kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76 HTTP/1.1" 200 127562644 "-" "Wget/1.18 (linux-gnu)"

MD5:

md5sum download\?id\=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76 1a203eccf77c0e3459212b2978b162d2 download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf

When doing the subutai import of management however there is no access showing a cache hit but the md5 matches when looking at the file that is downloaded into lxc tmpdir:

root@stretch:/var/snap/subutai/common/lxc/tmpdir# md5sum management-subutai-template_6.1.3_amd64.tar.gz 1a203eccf77c0e3459212b2978b162d2 management-subutai-template_6.1.3_amd64.tar.gz

The download via subutai import takes about a minute. Seems faster than hitting AWS but much slower than hitting the LAN cache node. The cdn node shows no access to the cache. I'm starting to think this is because it is hitting the local cache node in KG. Here's the output of the import with the -d (debug) option for more information:

root@stretch:/var/snap/subutai/common/lxc/tmpdir# subutai -d import management INFO[2017-11-12 06:53:28] Importing management
DEBU[2017-11-12 06:53:28] Init lock management.import
DEBU[2017-11-12 06:53:28] Locking file management.import
DEBU[2017-11-12 06:53:28] Checking CDN accessibility
DEBU[2017-11-12 06:53:29] Retrieving id, get: https://cdn.subut.ai:8338/kurjun/rest/template/info?name=management&token= DEBU[2017-11-12 06:53:29] Getting kurjun response
DEBU[2017-11-12 06:53:29] Parsing response body
INFO[2017-11-12 06:53:29] Version: 6.1.3
DEBU[2017-11-12 06:53:29] Checking CDN accessibility
DEBU[2017-11-12 06:53:29] Checking Kurjun
DEBU[2017-11-12 06:53:29] Getting owner public key
DEBU[2017-11-12 06:53:29] Reading key body
DEBU[2017-11-12 06:53:29] Reading user public key
DEBU[2017-11-12 06:53:29] Checking signature
INFO[2017-11-12 06:53:29] Template's owner signature verified
DEBU[2017-11-12 06:53:29] Signature belongs to jenkins
DEBU[2017-11-12 06:53:29] Verifying template signature
INFO[2017-11-12 06:53:29] Downloading management
DEBU[2017-11-12 06:53:29] Creating file management-subutai-template_6.1.3_amd64.tar.gz DEBU[2017-11-12 06:53:29] Getting https://cdn.subut.ai:8338/kurjun/rest/template/subutai/management-subutai-template_6.1.3_amd64.tar.gz?token= DEBU[2017-11-12 06:53:29] Writing response body to file
DEBU[2017-11-12 06:53:29] Creating file management-subutai-template_6.1.3_amd64.tar.gz DEBU[2017-11-12 06:53:29] Getting https://cdn.subut.ai:8338/kurjun/rest/template/jenkins/management-subutai-template_6.1.3_amd64.tar.gz?token= DEBU[2017-11-12 06:53:29] Writing response body to file
DEBU[2017-11-12 06:53:29] Creating file management-subutai-template_6.1.3_amd64.tar.gz DEBU[2017-11-12 06:53:29] Getting https://cdn.subut.ai:8338/kurjun/rest/template/docker/management-subutai-template_6.1.3_amd64.tar.gz?token= DEBU[2017-11-12 06:53:29] Writing response body to file
DEBU[2017-11-12 06:53:29] Creating file management-subutai-template_6.1.3_amd64.tar.gz DEBU[2017-11-12 06:53:29] Getting https://cdn.subut.ai:8338/kurjun/rest/template/download?id=3e61d36b-3e2d-42df-a95c-0c0e1a76bf76&token= 113.91 MiB / 121.65 MiB [=======================================>--] 93.63% 3s 121.41 MiB / 121.65 MiB [=============================================] 99.80%DEBU[2017-11-12 06:54:29] Writing response body to file
121.65 MiB / 121.65 MiB [=========================================] 100.00% 59s INFO[2017-11-12 06:54:29] File integrity verified
INFO[2017-11-12 06:54:29] Unpacking template management
DEBU[2017-11-12 06:54:29] /var/snap/subutai/common/lxc/tmpdir/ management-subutai-template_6.1.3_amd64.tar.gz to management DEBU[2017-11-12 06:54:30] Extracting tgz
DEBU[2017-11-12 06:54:30] Getting BTRFS subvolume readonly property
INFO[2017-11-12 06:54:30] Installing template management
DEBU[2017-11-12 06:54:30] Getting BTRFS subvolume list
DEBU[2017-11-12 06:54:30] Creating subvolume /var/snap/subutai/common/lxc/management: Create subvolume '/var/snap/subutai/common/lxc/management' DEBU[2017-11-12 06:54:31] Receiving delta receive /var/snap/subutai/common/lxc/management -f /var/snap/subutai/common/lxc/tmpdir/management/deltas/var.delta -p /var/snap/subutai/common/lxc/openjre16/var DEBU[2017-11-12 06:54:31] Receiving delta receive /var/snap/subutai/common/lxc/management -f /var/snap/subutai/common/lxc/tmpdir/management/deltas/rootfs.delta -p /var/snap/subutai/common/lxc/openjre16/rootfs DEBU[2017-11-12 06:54:32] Receiving delta receive /var/snap/subutai/common/lxc/management -f /var/snap/subutai/common/lxc/tmpdir/management/deltas/home.delta -p /var/snap/subutai/common/lxc/openjre16/home DEBU[2017-11-12 06:54:32] Receiving delta receive /var/snap/subutai/common/lxc/management -f /var/snap/subutai/common/lxc/tmpdir/management/deltas/opt.delta -p /var/snap/subutai/common/lxc/openjre16/opt DEBU[2017-11-12 06:54:32] Opening file /var/snap/subutai/common/lxc/tmpdir/management/config DEBU[2017-11-12 06:54:32] Creating file /var/snap/subutai/common/lxc/management/config DEBU[2017-11-12 06:54:32] Copying file /var/snap/subutai/common/lxc/tmpdir/management/config to /var/snap/subutai/common/lxc/management/config DEBU[2017-11-12 06:54:32] Opening file /var/snap/subutai/common/lxc/tmpdir/management/fstab DEBU[2017-11-12 06:54:32] Creating file /var/snap/subutai/common/lxc/management/fstab DEBU[2017-11-12 06:54:32] Copying file /var/snap/subutai/common/lxc/tmpdir/management/fstab to /var/snap/subutai/common/lxc/management/fstab DEBU[2017-11-12 06:54:32] Opening file /var/snap/subutai/common/lxc/tmpdir/management/packages DEBU[2017-11-12 06:54:32] Creating file /var/snap/subutai/common/lxc/management/packages DEBU[2017-11-12 06:54:32] Copying file /var/snap/subutai/common/lxc/tmpdir/management/packages to /var/snap/subutai/common/lxc/management/packages DEBU[2017-11-12 06:54:32] Removing temp dir /var/snap/subutai/common/lxc/tmpdir/management DEBU[2017-11-12 06:54:32] Setting readonly: false:
DEBU[2017-11-12 06:54:32] Setting readonly: false:
DEBU[2017-11-12 06:54:32] Setting readonly: false:
DEBU[2017-11-12 06:54:32] Setting readonly: false:
DEBU[2017-11-12 06:54:32] Closing database
DEBU[2017-11-12 06:54:32] Opening container config /var/snap/subutai/common/lxc/management/config DEBU[2017-11-12 06:54:32] Closing container configuration file
DEBU[2017-11-12 06:54:32] Reading container rootfs stat
DEBU[2017-11-12 06:54:32] uidmapshift rootfs
DEBU[2017-11-12 06:54:32] uidmapshift home
DEBU[2017-11-12 06:54:32] uidmapshift opt
DEBU[2017-11-12 06:54:32] uidmapshift var
DEBU[2017-11-12 06:54:32] Generating random mac
DEBU[2017-11-12 06:54:32] Opening container config /var/snap/subutai/common/lxc/management/config DEBU[2017-11-12 06:54:32] Closing container configuration file
DEBU[2017-11-12 06:54:32] Writing default key ident
DEBU[2017-11-12 06:54:32] Writing defaults for gpg
DEBU[2017-11-12 06:54:32] Closing defaults for gpg
DEBU[2017-11-12 06:54:33] Generating key
DEBU[2017-11-12 06:54:33] Writing resolv.conf.orig
DEBU[2017-11-12 06:54:33] Writing resolv.conf.tail
DEBU[2017-11-12 06:54:33] Writing resolv.conf
DEBU[2017-11-12 06:54:33] Opening database
DEBU[2017-11-12 06:54:33] Writing container data to database
DEBU[2017-11-12 06:54:33] Closing database
DEBU[2017-11-12 06:54:33] Creating container object
DEBU[2017-11-12 06:54:33] Starting LXC container management, starting the container failed DEBU[2017-11-12 06:54:33] Opening database
DEBU[2017-11-12 06:54:33] Writing container data to database
DEBU[2017-11-12 06:54:33] Closing database
DEBU[2017-11-12 06:54:33] Exposing port 8443
DEBU[2017-11-12 06:54:33] Exposing port 8444
DEBU[2017-11-12 06:54:33] Exposing port 8086
DEBU[2017-11-12 06:54:33] Opening database
DEBU[2017-11-12 06:54:33] Writing container data to database
DEBU[2017-11-12 06:54:33] Closing database
INFO[2017-11-12 06:54:33] ****
DEBU[2017-11-12 06:54:33] Running route command
INFO[2017-11-12 06:54:33] Subutai Management UI will be shortly available at https://192.168.121.30:8443 INFO[2017-11-12 06:54:33] login: admin
INFO[2017-11-12 06:54:33] password: secret
INFO[2017-11-12 06:54:33] ****

akarasulu commented 6 years ago

Just to test I turned off the LAN cache and tried the subutai import again and it came down at the same speed. I suspect this is coming from the office cache node. However its horrible that's there's no way I can confirm that without opening wireshark. Again it is way faster than hitting AWS but way slower than the LAN.

NOTE it seems none of the other container templates that management depends on such as openjre16 or ubuntu16 result in cache hits. These are probably also downloading from the local in country cache node instead of from the LAN node.

It would also be nice if we can see exactly what the agent CLI is resolving in terms of addresses when using the -d switch. I think then we can figure out where it's going.

akarasulu commented 6 years ago

OK it seems something might have been messed up with DNS on the KVM VM of the RH. I now flushed it and verified that everything is clean in the dns cache and now with a management import I am getting the following error:

root@stretch:/var/snap/subutai/common/lxc/tmpdir# subutai import management INFO[2017-11-12 07:39:14] Importing management
ERRO[2017-11-12 07:39:14] Retrieving id, get: https://cdn.subut.ai:8338/kurjun/rest/template/info?name=management&token=, Get https://cdn.subut.ai:8338/kurjun/rest/template/info?name=management&token=: x509: certificate signed by unknown authority

After waiting a while it starts to work oddly:

root@stretch:/var/snap/subutai/common/lxc/tmpdir# subutai import management INFO[2017-11-12 07:41:04] Importing management
INFO[2017-11-12 07:41:05] Version: 6.1.3
INFO[2017-11-12 07:41:05] Template's owner signature verified
INFO[2017-11-12 07:41:05] Downloading management
57.30 MiB / 121.65 MiB [===================>----------------------] 47.10% 28s

But this is now going to the local in country cache node without a cache hit on the nginx cache node in the LAN. I have no idea how this is switching.

akarasulu commented 6 years ago

So inside the RH I pinged to show that cdn.subut.ai resolves to the cache node 1.11 and then did the import and it still goes to the local country cache node in the office without hitting the LAN cache node:

root@stretch:/var/snap/subutai/common/lxc/tmpdir# rm management-subutai-template_6.1.3_amd64.tar.gz root@stretch:/var/snap/subutai/common/lxc/tmpdir# subutai destroy management INFO[2017-11-12 08:06:20] management is destroyed
root@stretch:/var/snap/subutai/common/lxc/tmpdir# ping cdn.subut.ai PING cdn.subut.ai (172.16.1.11) 56(84) bytes of data. 64 bytes from cdn-cache.upstairs (172.16.1.11): icmp_seq=1 ttl=63 time=1.45 ms 64 bytes from cdn-cache.upstairs (172.16.1.11): icmp_seq=2 ttl=63 time=1.36 ms 64 bytes from cdn-cache.upstairs (172.16.1.11): icmp_seq=3 ttl=63 time=1.17 ms ^C --- cdn.subut.ai ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2003ms rtt min/avg/max/mdev = 1.179/1.334/1.456/0.123 ms root@stretch:/var/snap/subutai/common/lxc/tmpdir# subutai import managementINFO[2017-11-12 08:06:35] Importing management
INFO[2017-11-12 08:06:35] Version: 6.1.3
INFO[2017-11-12 08:06:36] Template's owner signature verified
INFO[2017-11-12 08:06:36] Downloading management
91.78 MiB / 121.65 MiB [===============================>----------] 75.44% 13s