nlf / dlite

The simplest way to use Docker on OS X
MIT License

Version 2.0.0 #135

Closed: nlf closed this issue 8 years ago

nlf commented 8 years ago

Version 2.0.0 of DLite is ready for testing!

Here's what you can do to help:

First, remove your old installation of DLite:

dlite stop
sudo dlite uninstall
sudo nfsd stop # unless you're using nfsd outside of DLite

You'll also want to edit /etc/exports and remove the entry that DLite created.

Next, build from the latest code on the master branch (copy the binary to your PATH if you want; if you previously installed with Homebrew, run brew uninstall dlite first), or download the latest pre-release binary from the releases page. Then install it, passing the -v flag like so:

sudo dlite install -v 3.0.0-beta4

After the installation completes, run dlite start and wait a minute or so. If your internet connection is slow and the docker version requested in your config is not 1.10.2, it will take longer, because on the first boot the docker binary (over 30MB) gets downloaded. You'll know it's done and running when docker ps works.
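Scripts that drive dlite may want to block until Docker is actually reachable. Below is a minimal sketch, assuming Python 3 on the host and using the post's own readiness signal (a successful docker ps). The function name, the 300-second default timeout and the 5-second polling interval are hypothetical choices, not dlite settings.

```python
import subprocess
import time


def wait_until_ready(cmd=("docker", "ps"), timeout=300.0, interval=5.0):
    """Re-run `cmd` until it exits with status 0 or `timeout` seconds pass.

    Returns True once the command succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = subprocess.run(
                list(cmd),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if result.returncode == 0:
                return True
        except FileNotFoundError:
            # The command itself is not on PATH yet; treat as "not ready".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_until_ready() right after dlite start returns True once docker ps succeeds. On a slow connection, while the docker binary is still downloading, a longer timeout may be needed.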

Please report any issues you have here and I'll work to get them fixed up before the official release. Thanks!

Edit: You're also welcome to join the Gitter for questions, or just to say hi.

antoniocanas commented 8 years ago

@nlf Yes, there is:

./dlite stop
Stopping the agent: done

~/tmp
❯ hdiutil info
framework       : 417.1
driver          : 10.11v417.1
================================================
image-path      : /Users/antonio/.dlite/disk.sparseimage
image-alias     : /Users/antonio/.dlite/disk.sparseimage
shadow-path     : <none>
icon-path       : /System/Library/PrivateFrameworks/DiskImages.framework/Resources/CDiskImage.icns
image-type      : sparse disk image
system-image    : false
blockcount      : 41943040
blocksize       : 512
writeable       : TRUE
autodiskmount   : false
removable       : TRUE
image-encrypted : false
mounting user   : root
mounting mode   : <unknown>
process ID      : 491
/dev/disk2  FDisk_partition_scheme
/dev/disk2s1    Linux
/dev/disk2s2    Linux_Swap
/dev/disk2s3    Linux
nlf commented 8 years ago

@antoniocanas Is it still there after a dlite rebuild -d 100?

antoniocanas commented 8 years ago

Yes

nlf commented 8 years ago

Ah ha :) For some reason detaching the volume isn't working correctly. What do you get if you run hdiutil detach /dev/disk2?

antoniocanas commented 8 years ago
❯ hdiutil detach /dev/disk2
"disk2" unmounted.
hdiutil: couldn't eject "disk2" - Resource busy
nlf commented 8 years ago

And you're sure dlite has stopped? Check ps aux | grep dlite.

antoniocanas commented 8 years ago

This is strange...

❯ ./dlite stop
Stopping the agent: ERROR!
The agent is already stopped

~/tmp
❯ ps aux | grep dlite
root              483   3.1  0.3 575542528  28784   ??  S     6:43PM   3:07.08 /Users/antonio/tmp/dlite daemon
antonio          4699   0.0  0.0  2423976    292 s001  R+    8:16PM   0:00.00 grep --color=auto dlite
nlf commented 8 years ago

Ahh, it didn't die for some reason. You can run sudo kill 483 and wait a minute to see if it exits; if it doesn't, run sudo kill -9 483 to kill it forcibly.
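The advice above (SIGTERM, wait about a minute, then kill -9) is the usual escalation pattern. Here is a minimal Python sketch of the same idea. It assumes you know the daemon's pid (483 in this case) and have permission to signal it; for a root-owned dlite daemon that means running the script with sudo, otherwise os.kill raises PermissionError. The function names are hypothetical, and none of this is part of dlite.

```python
import os
import signal
import time


def is_alive(pid):
    """Return True if a process with this pid still exists."""
    try:
        # If pid is our own child and has exited, reap it so it is not
        # left behind as a zombie that os.kill() would still "find".
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child; fall through to the signal-0 probe
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user (e.g. root)
    return True


def wait_gone(pid, timeout, poll):
    """Poll until the process disappears or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(poll)
    return not is_alive(pid)


def stop_process(pid, grace=60.0, poll=0.5):
    """Send SIGTERM, wait up to `grace` seconds, then escalate to SIGKILL.

    Returns "terminated", "killed", or "still running".
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "terminated"  # already gone
    if wait_gone(pid, grace, poll):
        return "terminated"
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"
    return "killed" if wait_gone(pid, 5.0, poll) else "still running"
```

The 60-second default grace period mirrors "wait a minute". SIGKILL cannot be caught, so it only fails to remove a process that is stuck in an uninterruptible kernel call.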

antoniocanas commented 8 years ago

Right, now it's working. Thanks a lot!

danquah commented 8 years ago

:clap: Works perfectly; I can now boot my mariadb container. I'll test some more on my day-to-day setup and see if I can get into any more trouble :)

mikz commented 8 years ago

I have an issue with the dhyve-os image (v3.0.0-beta4).

Docker does not start at all: when I ssh into the machine, I can see it is not running. When I try to run the init.d script, it says OK but Docker still does not start. After some digging, I found out that the docker binary has size 0:

$ ls -lA /usr/bin/docker
-rwxr-xr-x    1 root     root             0 Mar 15 18:59 /usr/bin/docker

I tried to fix that by downloading a docker release, but there is no internet access inside the machine.

$ ping google.com
ping: bad address 'google.com'
$ cat /etc/resolv.conf
nameserver 192.168.64.1
nameserver 192.168.64.1 # eth0

192.168.64.1 is my machine, but I guess it does not run a DNS server?

dig google.com @192.168.64.1

; <<>> DiG 9.8.3-P1 <<>> google.com @192.168.64.1
;; global options: +cmd
;; connection timed out; no servers could be reached

I fixed that by reinstalling dlite with --dns-server=8.8.8.8, but then dlite does not really start: on the first attempt I can't ssh into it, and on the second it keeps starting the agent. I found this in the logs:

virtio_net: Could not create vmnet interface, permission denied or no entitlement?

mikz commented 8 years ago

Looks like I forgot to reinstall it with -v 3.0.0-beta4. After doing that and using --dns-server=8.8.8.8, it works correctly!

nlf commented 8 years ago

@mikz Good to know! I'm still not sure why DNS doesn't work correctly out of the box for some people; that's why the --dns-server option exists :) I should probably document that better, though.

mikz commented 8 years ago

AFAIK OS X will start a DNS server if you have Internet Sharing enabled in System Preferences > Sharing.

nlf commented 8 years ago

I don't have that enabled, and DNS works fine for me. I do know that some people have had dnsmasq running on their host system, and that has caused issues. I still haven't been able to find out whether other situations cause a failure there.

antoniocanas commented 8 years ago

I still have problems with PHP's Composer:

  - Installing symfony/console (v3.0.3)
    Downloading: 100%
    Failed to download symfony/console from dist: Could not delete /app/lumen/vendor/composer/254938c2:
    Now trying to download from source
  - Installing symfony/console (v3.0.3)
    Cloning 2ed5e2706ce92313d120b8fe50d1063bcfd12e04
The disk hosting /app/lumen/vendor is full, this may be the cause of the following exception
mikz commented 8 years ago

That's it! I have dnsmasq running on ::1:53 and 127.0.0.1:53. I don't really want to run it on all interfaces.

I guess a DNS proxy running on that interface on a random port would make sense, like http://pow.cx/docs/dns_server.html. But as a first step, dlite could check whether a DNS server is already running and print a warning or something.

nlf commented 8 years ago

@antoniocanas Can you verify with df -h, from an ssh session into the VM, that there is free disk space? If there is, can you reproduce this in a small test case that you can send me, so I can try to figure out what's going on?

nlf commented 8 years ago

@mikz Excellent! I'll have to see what I can do. If I can detect a process listening on port 53, I can print a warning that should save other users from having to deal with this.
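A hedged sketch of the check described above, assuming Python 3 on the host. It only tests whether a TCP connection to port 53 succeeds on the loopback addresses mikz mentioned. It does not identify which process owns the port, and it would miss a UDP-only listener. The function names and the warning text are hypothetical and do not describe dlite's actual behavior.

```python
import socket


def something_listening(host="127.0.0.1", port=53, timeout=1.0):
    """Return True if a TCP connection to host:port is accepted.

    DNS servers such as dnsmasq normally serve TCP as well as UDP, so a
    successful connect is a cheap hint that a resolver already owns the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or address family unavailable (e.g. no IPv6).
        return False


def dns_port_warnings(hosts=("127.0.0.1", "::1"), port=53):
    """Build one (hypothetical) warning message per host with a listener."""
    warnings = []
    for host in hosts:
        if something_listening(host, port):
            warnings.append(
                f"warning: something is already listening on [{host}]:{port}; "
                "consider --dns-server"
            )
    return warnings
```

Connecting over TCP keeps the probe unprivileged. Binding port 53 to see whether it is taken would require root and could briefly interfere with a running resolver.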

antoniocanas commented 8 years ago

@nlf There is enough space. You can test it by just running: docker run --rm -v $(pwd):/app composer/composer require symfony/console

nlf commented 8 years ago

That doesn't work for me; I get an error when it tries to clone https://github.com/symfony/console.git

nlf commented 8 years ago

Oh, never mind, I see. That's the error you're getting too :) I just noticed the disk-full error at the top.

antoniocanas commented 8 years ago

Yep. It works if you remove the -v argument, so it's somehow related to the volume.

nlf commented 8 years ago

It also works if you run the container with bash as the entrypoint, clone the repos in your home directory, and then copy them to the /app/vendor directory. It appears something about Composer doesn't play nicely with 9pfs.

nlf commented 8 years ago

OK, it looks like I may not be reporting free space correctly. I'm going to look into the 9p spec and see if I can figure out what I need to change to support this.

antoniocanas commented 8 years ago

@nlf It's weird that only some PHP packages fail. Also, Composer has a 'diagnose' argument that shows: Checking disk free space: OK

Maybe the clue is in this line of the error: Failed to download symfony/console from dist: Could not delete /app/lumen/vendor/composer/254938c2

nlf commented 8 years ago

It's all a side effect of the 9p server not reporting free disk space correctly. I'm reading through the protocol docs to try to find a fix.

STRML commented 8 years ago

Is anyone seeing their containers self-destruct on reboot or hard shutdown with 2.0.0? My MacBook has a "cold bug" (it shuts down below an internal temperature of roughly 20 degrees Celsius; yes, really), and dlite appears to just rebuild when the system comes back up. It's not a big deal since my containers are small, but it could be a bigger pain in the future.

nlf commented 8 years ago

@STRML I have not seen that. I do know that a hard shutdown can cause corruption on the VM's disk (just as it can on your host's disk, ouch), but I've definitely never seen it just clear out everything.

STRML commented 8 years ago

Yeah, daily kernel panics are not fun, and the machine is out of warranty. I'll keep an eye on it; it sounds like it's not something dlite could reasonably handle. Thanks.

nlf commented 8 years ago

Just so everyone is aware: the free disk space issue seems to be inherent to 9p2000 protocol shares, because no "statfs" message is sent to the server, and that message is what would inform the guest of free disk space. There's a 9p2010 spec that adds this, but support in the kernel looks lacking, so I'm at a bit of an impasse.

My current plan is to evaluate embedding a usermode NFS server into dlite, with some changes to allow better user mapping from container/VM to host (basically just using extended attributes, as my 9p changes did). It may take a while, and I'm not even sure it'll work, but it's worth looking into whether it's even feasible.
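For context, a statfs exchange does exist in the 9P2000.L dialect supported by the Linux v9fs client, as Tstatfs/Rstatfs (message types 8 and 9). That may be what the comment above means by the "9p2010 spec". The sketch below assumes Python 3 and that reading of the protocol. It shows how a host-side server could fill an Rstatfs reply from os.statvfs so that the guest sees real free space. The function names are hypothetical, and this is not how dlite's 9p server is implemented.

```python
import os
import struct

RSTATFS = 9  # 9P2000.L reply type (the matching Tstatfs request is 8)

# Every 9P message starts with size[4] type[1] tag[2], little-endian.
HEADER = struct.Struct("<IBH")
# Rstatfs body: type[4] bsize[4] blocks[8] bfree[8] bavail[8]
#               files[8] ffree[8] fsid[8] namelen[4]
RSTATFS_BODY = struct.Struct("<IIQQQQQQI")


def build_rstatfs(tag, path, fs_type=0, fsid=0):
    """Build an Rstatfs reply describing the host filesystem holding `path`.

    fs_type and fsid default to 0 as placeholders; a real server would fill
    them in from the host filesystem.
    """
    st = os.statvfs(path)
    body = RSTATFS_BODY.pack(
        fs_type,
        st.f_frsize,   # statvfs counts blocks in f_frsize units
        st.f_blocks,
        st.f_bfree,
        st.f_bavail,   # blocks available to unprivileged users
        st.f_files,
        st.f_ffree,
        fsid,
        st.f_namemax,
    )
    size = HEADER.size + len(body)  # size[4] counts the entire message
    return HEADER.pack(size, RSTATFS, tag) + body


def free_bytes(reply):
    """What a guest would report as free space: bavail * bsize."""
    fields = RSTATFS_BODY.unpack_from(reply, HEADER.size)
    bsize, bavail = fields[1], fields[4]
    return bavail * bsize
```

Without a reply like this, the guest has no source for free-space figures on the share, which would explain Composer's confusing "disk is full" message.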

antoniocanas commented 8 years ago

With the official node image:

npm ERR! Linux 4.4.3-dhyve
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "install" "--save" "hapi"
npm ERR! node v5.4.0
npm ERR! npm  v3.3.12
npm ERR! path /app/node_modules/.staging/hoek-4b4bf751ce5c2289f71352c6623520d9
npm ERR! code EXDEV
npm ERR! errno -18
npm ERR! syscall rename

npm ERR! EXDEV: cross-device link not permitted, rename '/app/node_modules/.staging/hoek-4b4bf751ce5c2289f71352c6623520d9' -> '/app/node_modules/hapi/node_modules/hoek'
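The EXDEV error above comes from rename(2). That call refuses to move an entry when the kernel considers the source and destination to be on different filesystems (EXDEV is errno 18, matching the errno -18 in the log). Here both paths sit under /app, which is consistent with nlf's reply that the fault lies in the 9p share rather than in npm. Tools that anticipate this case fall back to copy-then-delete. Python's shutil.move already does that; the explicit version below is a minimal sketch that makes the mechanism visible. The function name is hypothetical.

```python
import errno
import os
import shutil


def move_across_devices(src, dst):
    """Rename src to dst, falling back to copy-then-delete on EXDEV."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # any other failure is a real error
        if os.path.isdir(src):
            shutil.copytree(src, dst, symlinks=True)
            shutil.rmtree(src)
        else:
            shutil.copy2(src, dst)  # copies data plus timestamps/permissions
            os.remove(src)
```

The fallback is not atomic. A crash between the copy and the delete can leave both copies behind, which is why rename is preferred whenever it works.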
nlf commented 8 years ago

Yeah, the 9p share has all kinds of issues, unfortunately. It's going away before 2.0.0 is officially released.

michaelshobbs commented 8 years ago

For those running 2.0.0-beta5 with the firewall on: you will need to allow connections to bootp for the VM to start up.

danquah commented 8 years ago

About that: did the 2.0.0-beta5 tag on https://github.com/nlf/dhyve-os/releases disappear a while back? I just attempted a reinstall, and it failed hard when it could not find the release :/

nlf commented 8 years ago

If you mean 3.0.0-beta5, then yes. I removed it because it had a severe bug. 3.0.0-beta4 is what you should be using.

STRML commented 8 years ago

What was the severe bug?

nlf commented 8 years ago

To be completely honest, I don't remember. I do recall that I pulled the release within an hour or so of pushing it, though.

danquah commented 8 years ago

Ah yes, 3.0.0-beta5. No problem then :)

I've been running 3.0.0-beta4 for a while now, and it's pretty stable. My main issue currently is filesystem performance. I mainly use Docker for Drupal development, and Drupal tends to do a lot of file-stat operations. Since these seem to be slow even through 9p, most of the execution time of a page request ends up being PHP checking whether files exist.

I'm looking forward to seeing whether you come up with something clever for (dlite) 2.0, and/or whether this mythical Docker for Mac I'm signed up for (which almost nobody seems to have had access to) has cracked the problem.

nlf commented 8 years ago

I'm working on implementing something similar to 9p but purpose-built for dlite to ensure correct permissions. I have something that works right now and am working on improving its performance. Any examples you can provide of things that perform poorly with 9p would be appreciated, so I can test them against what I'm working on.

danquah commented 8 years ago

Sure thing, I'll see if I can whip up something quick (slow, that is).

cmosetick commented 8 years ago

@danquah I have some ideas for improving file system performance that I think will help. I've been stuck in proof-of-concept mode, though, as I've been really short on time this last month. No ETA right now, but I'll try to make some time.

danquah commented 8 years ago

I've written a simple test and packaged it into an image that should be quite easy to run. The test just creates a lot of 1KB files, runs a stat on them, and reads them back in. You should be able to run a test that uses a data volume like this:

docker run --rm -v ~/storage:/storage danquah/php-fileperf

You can find a simple script for running the test in the repo.

The test took more than twice as long to run under dlite 2:

Dlite 1:

Creating 10000 files 1KB each in /storage, total size 10,000KB 
Created 10000 files in 11.88 ms
Got status of 10000 files in 1.08 ms 
Read 10000 files in 2.49 ms 
** Total duration: 15.45 **

Dlite 2:

Creating 10000 files 1KB each in /storage, total size 10,000KB 
Created 10000 files in 16.21 ms
Got status of 10000 files in 4.25 ms 
Read 10000 files in 17.4 ms 
** Total duration: 37.86 **

And just as a baseline: if I let the file creation happen inside the containers without using volumes, there's no problem.

Dlite 1:

Creating 10000 files 1KB each in /storage, total size 10,000KB 
Created 10000 files in 0.5 ms
Got status of 10000 files in 0.04 ms 
Read 10000 files in 0.16 ms 
** Total duration: 0.7 **

Dlite 2:

Creating 10000 files 1KB each in /storage, total size 10,000KB 
Created 10000 files in 0.76 ms
Got status of 10000 files in 0.04 ms 
Read 10000 files in 0.1 ms 
** Total duration: 0.9 **

I'll also try a test with a Drupal install, but on my first attempt I ran into permission issues with dlite 1 (Drupal fails to delete some files during install). I'll be back :)
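The test above runs inside the danquah/php-fileperf image (PHP). The sketch below is not that script, only a rough Python equivalent of the three phases described (create many 1KB files, stat them, read them back), assuming Python 3 is available wherever you run it. Running it once against a bind-mounted volume and once against a directory inside the container gives the same kind of volume vs. no-volume comparison. The function name and file naming scheme are hypothetical, and it reports seconds.

```python
import os
import time


def file_perf(directory, count=10000, size=1024):
    """Create `count` files of `size` bytes, stat them, then read them back.

    Returns elapsed wall-clock seconds for each phase plus the total.
    """
    payload = b"x" * size
    paths = [os.path.join(directory, f"file_{i:05d}.dat") for i in range(count)]
    timings = {}

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        os.stat(path)  # the metadata lookup that Drupal/PHP does so often
    timings["stat"] = time.perf_counter() - start

    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as fh:
            fh.read()
    timings["read"] = time.perf_counter() - start

    timings["total"] = timings["create"] + timings["stat"] + timings["read"]
    return timings
```

For example, call file_perf("/storage") inside a container started with -v ~/storage:/storage, then file_perf on an empty directory in the container's own filesystem (such as a fresh subdirectory of /tmp) for the baseline.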

danquah commented 8 years ago

Got the Drupal install test working. Run script: https://raw.githubusercontent.com/danquah/docker-drupal-perf-test/master/run.sh

Results:

Dlite 1, without a volume:

real    0m53.420s
user    0m4.200s
sys     0m1.330s

With a volume:

real    0m41.150s
user    0m5.220s
sys     0m33.560s

In other words, using a volume actually sped up the install. Weird.

Dlite 2, without a volume:

real    0m6.634s
user    0m3.900s
sys     0m0.660s

With a volume:

real    0m16.701s
user    0m4.610s
sys     0m0.880s

So all of a sudden dlite 2 is faster. That is even weirder given the previous test, but the installation is evidently not hitting the same bottleneck as the basic file-creation test.

rhesus commented 8 years ago

I have followed the instructions in this issue, and I'm getting a different error when trying out the master branch of dlite. When I ssh into the VM, I see this:

DhyveOS version 3.0.0
-sh: /etc/profile.d/dhyve.sh: docker: Text file busy

I'm not sure what to do about it. The docker command is completely unresponsive, and the logs don't really show anything interesting:

2016/04/11 14:37:50 http: response.WriteHeader on hijacked connection
operation not supported by device
rdmsr to register 0x34 on vcpu 1

nlf commented 8 years ago

That message appears because docker hasn't finished downloading in the VM. Give it a minute and check again.

rhesus commented 8 years ago

Yes, that was it; I wasn't patient enough. Thanks for all of your hard work on dlite!!!

nlf commented 8 years ago

This thread is getting suuuuuuper long, so I'm going to close it. Let's open new issues to discuss bugs, and I'll open new issues when I add new features as well.