smebberson / docker-alpine

Docker containers running Alpine Linux and s6 for process management. Solid, reliable containers.
MIT License
596 stars 186 forks source link

fix #11 #13

Closed matthewvalimaki closed 8 years ago

matthewvalimaki commented 8 years ago

Signed-off-by: Matthew Valimaki matthew.valimaki@gmail.com

matthewvalimaki commented 8 years ago

I've updated user-consul-nginx-nodejs to show how consul-template is used as it was the only one where we actually have two services in one container.

matthewvalimaki commented 8 years ago

Forgot to mention:

smebberson commented 8 years ago

@matthewvalimaki, to make this super complete I think it might need a consul watch implementation, https://www.consul.io/docs/commands/watch.html. Having the benefit that configuration could be re-run when the nodejs.app IP changes? What do you think? Are you interested in adding that?

smebberson commented 8 years ago

@matthewvalimaki, btw, thanks for this. It's going to be a great addition. I'm happy to help add the extra bits and pieces if you like.

matthewvalimaki commented 8 years ago

@smebberson No need for consul watch as consul-template already handles this. So if IP, port changes or new nodes with same service and tag become available /etc/nginx/conf.d/default.conf will be automatically rewritten and nginx through s6 restarted.

And no problem, I like these Docker images and use them myself and want to improve them so that I have less to worry about downstream :)

matthewvalimaki commented 8 years ago

@smebberson Improvements pushed, see https://github.com/matthewvalimaki/docker-alpine/commit/ae7469196628501ad295e1ed14ae3ddd80328744.

smebberson commented 8 years ago

@matthewvalimaki, these changes look great. Given the scenario we're faced with, the loop you have implemented should work fine. To implement notifications, well, I think it will require some effort from Consul's part. But I think we should raise it as a bug. Either consul-template should not throw an error but repeat connection attempts for a specified amount of time, or consul should repeat when it is not only up, but ready to start receiving requests.

I wasn't aware that consul-template already does the job of consul watch also. That is great.

Thanks again. I'll look to merge this in ASAP.

matthewvalimaki commented 8 years ago

@smebberson actually I believe by using consul watch, something you brought up, and its Type: nodes we probably could remove the loop I introduced and clean up consul-template run. Consul Watches documentation for Type: nodes states the following:

The "nodes" watch type is used to watch the list of available nodes. It has no parameters.

Then in combination with consul handler executing a shell script we should be able to know when consul agent is in usable state.

alpine-consul and alpine-consul-base would see the following new file introduced in /etc/consul.d/_watch-node.conf and /etc/consul.d/bootstrap/_watch-node.conf. Actually the file should be in /etc/consul.d/watches/_watch-node.conf and then symlinked to mentioned paths:

{
    "watches": [
      {
          "type": "nodes",
          "handler": "/usr/sbin/consul-nodes-handler.sh"
      }
    ]
}

consul-nodes-handler.sh would then be responsible for: 1) Checks if templates exist. 2) Remove /etc/services.d/consul-template/down if templates did exist. 3) Executes s6-svc -u /var/run/s6/services.d/consul-template to start consul-template. 4) rm symlink 5) consul reload to reload configuration with now removed watch.

Unfortunately as it is https://github.com/smebberson/docker-alpine/blob/master/alpine-consul-base/root/etc/services.d/consul/run is recursively set to go through /etc/consul.d/ so it would load up /etc/consul.d/watches every time. @smebberson what if we rename /etc/consul.d to /etc/consul and then have all default configurations in /etc/consul/conf.d? That way we could have /etc/consul/watches and do the symlink I mentioned above.

/etc/cont-finish.d/01-consul-template would re-create the symlink and down removed by handler.

smebberson commented 8 years ago

@matthewvalimaki, yep, I like the sound of that. I think that would be a great addition and tidy things up nicely! Great suggestion.

matthewvalimaki commented 8 years ago

@smebberson I've updated code to reflect what was discussed earlier with one exception: Instead of /etc/cont-finish.d/01-consul-template I opted to use /etc/cont-init.d/00-consul-template to guarantee that necessary down and symlink are created in case services are not finished properly.

smebberson commented 8 years ago

@matthewvalimaki, sorry, I didn't realise this was ready for review again. I assume it is?

matthewvalimaki commented 8 years ago

No problem and yes it is ready again. A side note: would it be easy to implement tests for these? Haven't looked into that yet at all but I would like to work on those. On Dec 12, 2015 1:06 PM, "Scott Mebberson" notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki, sorry, I didn't realise this was ready for review again. I assume it is?

— Reply to this email directly or view it on GitHub https://github.com/smebberson/docker-alpine/pull/13#issuecomment-164191735 .

matthewvalimaki commented 8 years ago

@smebberson apologies for delayed response. My only reasoning for having consul-template in alpine-consul was the fact that consul itself was available there even though it it is in server mode. I do not have a use case right now but something that I've been looking at is https://github.com/hashicorp/consul-replicate, which perhaps should be part of the alpine-base. If consul-replicate is introduced then potentially consul-template might have a use case :)

I do not have a strong opinion where consul-template should be, so if you want I can move it to alpine-consul-base it's not a big deal really.

smebberson commented 8 years ago

@matthewvalimaki, I think it would be great if consul-template lived in alpine-consul-base at this stage. I'd like to keep alpine-consul as lightweight and simple as possible. Are you able to move it there? The only reason this hasn't been merged into master yet (and then onto your other PRs) is because I've been struggling to find the time to do it myself.

matthewvalimaki commented 8 years ago

@smebberson no problem, I've now updated the PR accordingly. Also updated consul-template to latest release.

Note that this PR introduces probably noteworthy change: root/etc/consul.d was renamed to root/etc/consul. consul configurations are now expected to be in root/etc/consul/conf.d/. This was done to allow introduction of consul watch via json config.

matthewvalimaki commented 8 years ago

@smebberson I've noticed issues with the sh scripts, trying to track down the issue so do not merge or try to fix it on your own. Noticed these issues only after trying to implement consul-template with Nomad.

matthewvalimaki commented 8 years ago

So one issue is that the script that consul watch is executing wants to s6-svc -u /var/run/s6/services/consul-template however as consul is running as user consul it has no permissions to command services, see permission denied Problem seems to be that /var/run/s6/services/consul-template is owned root:root and doing chmod -R 777 /var/run/s6/services/consul-template does seem to solve the issue.

Having consul watches control consul-template service like this I think should be ok. So perhaps we just change /var/run/s6/services/consul-template permissions?

@smebberson your thoughts on this?

matthewvalimaki commented 8 years ago

In alpine-consul-base we could modify /etc/services.d/consul/run to include chown -R consul:consul /var/run/s6/services/consul-template. I tested this and it would seem to work.

reidab commented 8 years ago

Thanks for putting this together! I was just looking at doing something similar with these images :grinning:

I've been looking into s6 a bit and it looks like everything can be left read-only and owned by root, except for the supervise subdirectory, which needs to be writable by the user that's running s6-svc.

matthewvalimaki commented 8 years ago

@reidab Thanks for the comment, I didn't even think about the subdirs.

So perhaps this is how the PR should be modified:

smebberson commented 8 years ago

@matthewvalimaki, I agree with your comments. We should create a group that has necessary permissions for writing files in the necessary spots.

The place to alter directory permissions is in the fix-attrs.d directory. Although, this only allows changing owner. With that, it might be good to create a file in cont-init.d to change permissions as required. I think this is cleaner than running it in run scripts.

I do wonder what is the best practice here however, as far as s6 is concerned. I might ask the s6-overlay guys and see what I get back...

matthewvalimaki commented 8 years ago

@smebberson I'll do those group changes.

@smebberson according to init-stages stage 2 i. /etc/fix-attrs.d happens before stage 2. iii which is where copying happens. I have not tested this but I would think the problematic dir in question /var/run/s6/services/*/supervise would still be root:root.

@smebberson to me it feels like s6-overlay should have 4th step in stage 2 after copying files.

matthewvalimaki commented 8 years ago

@smebberson consul-template user and group added. I've updated user-consul-nginx-nodejs example to make use of this change, see the Dockerfile.

Running consul-template as non-root does cause another problem: reloading of services. Having consul-template belong to the same group as nginx so that consul-template can rewrite configuration works. But in order for that configuration to be loaded up one would need to tell s6 to reload nginx leading us back to the problem of /var/run/s6/services/*/supervise belonging to root:root.

Right now this problem is solved for consul-template in /etc/services.d/consul/run where I made it chown -R consul:consul /var/run/s6/services/consul-template. Unless s6-overlay adds something to manage this I think the best place is indeed in services.d/run.

To make things cleaner we could introduce a new group which owns, along with root, all /var/run/s6/services/*/supervise so that users such as consul-template by belonging to that group could easily manage services. This way in all run scripts where we use s6-setuidgid we should also modify the /var/run/s6/services/*/supervise.

smebberson commented 8 years ago

@matthewvalimaki, have you seen the response in https://github.com/just-containers/s6-overlay/issues/130 ? This goes along with our thinking. Let's set it up as you've stated:

Introduce a new group which owns, along with root, all /var/run/s6/services/*/supervise so that users such as consul-template by belonging to that group could easily manage services. This way in all run scripts where we use s6-setuidgid we should also modify the /var/run/s6/services/*/supervise.

I would like to some how keep this out of the individual run scripts if possible, do you agree? I agree with your sentiment about a 4th stage. I'll follow up in the same issue on s6-overlay and see what happens.

smebberson commented 8 years ago

@matthewvalimaki, it seems as though we should be able to set everything up in some cont-init.d scripts however? Reading http://skarnet.org/software/s6-portable-utils/s6-hiercopy.html, it says owner and group are preserved. Although, it feels like this setup stuff should be in alpine-base?

smebberson commented 8 years ago

@matthewvalimaki, are you still keen on pushing this one through?

matthewvalimaki commented 8 years ago

Yes. I am currently on vacationing causing the delay. On Jan 27, 2016 12:49 AM, "Scott Mebberson" notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki, are you still keen on pushing this one through?

— Reply to this email directly or view it on GitHub https://github.com/smebberson/docker-alpine/pull/13#issuecomment-175276684 .

matthewvalimaki commented 8 years ago

@smebberson I tested the idea of setting permissions on /etc/services.d/consul to root:s6 (s6 being the new group) and s6-hiercopy does seem to retain ownership. Unfortunately s6-svc -h /var/run/s6/services/consul/ threw permission denied and code 111. Once I changed ownership to consul:s6 the service restart worked. Problem seems to be that s6 requires user to be defined (in this case consul) and group is irrelevant. Adding group write permission seems to do the trick. I followed up on your comment at https://github.com/just-containers/s6-overlay/issues/130#issuecomment-181554343.

Unfortunately /var/run/s6/services/consul/supervise/ is filled after cont-init.d therefore any permission fixes we might want to do would have to be done from run.

smebberson commented 8 years ago

Okay, cool. No worries, thanks for following that up. So, is everything ready in this PR to be merged?

smebberson commented 8 years ago

@matthewvalimaki, ping! Let me know if this is ready to go. I want to get this merged and then resolve the other issues on this repo. Thanks!

matthewvalimaki commented 8 years ago

@smebberson pong! Sorry for the delay. This PR has existed far too long. Not only has it held back new releases but it's hard to capture all that has been discussed even when those are written above.

Please review if Dockerfile could be improved, the first RUN seems a bit heavy to me.

All changes were made to alpine-consul-base:

smebberson commented 8 years ago

@matthewvalimaki, this looks good. I'm going to pull this down, work it up a little more and then merge it into master. We can always improve on things along the way. My gut feel is this is a little heavy straight in alpine-consul-base but I'm more than happy with the overall capability this brings and I'm happy to see how it goes.

I'll get onto this first thing tomorrow morning (don't want to start this so late in the day here).

matthewvalimaki commented 8 years ago

@smebberson sounds good. Please do whatever changes you want. I just want consul-template to be available :) Once we have this merged I will provide PR for nomad.

On Tue, Mar 1, 2016 at 1:29 AM, Scott Mebberson notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki, this looks good. I'm going to pull this down, work it up a little more and then merge it into master. We can always improve on things along the way. My gut feel is this is a little heavy straight in alpine-consul-base but I'm more than happy with the overall capability this brings and I'm happy to see how it goes.

I'll get onto this first thing tomorrow morning (don't want to start this so late in the day here).

— Reply to this email directly or view it on GitHub https://github.com/smebberson/docker-alpine/pull/13#issuecomment-190631945 .

matthewvalimaki commented 8 years ago

@smebberson understanding that this + other issues cause quite a backlog but I was wondering how is it going? Do you need help with anything? I rather help you with this project than start maintaining my fork so let me know if there's anything I could do. I have quite a few things depending on these images :)

smebberson commented 8 years ago

@matthewvalimaki. Sorry! Working on it now ;)

smebberson commented 8 years ago

@matthewvalimaki, I'm getting some weird errors and I can't get it to run. I guess you weren't experience anything like the below?

Consul Template returned errors:
open /etc/nginx/conf.d/186949828: permission denied

I don't have heaps of experience with consul-template so I'm not sure what it would be doing. Or even why it would be looking in that directory?

Any ideas?

smebberson commented 8 years ago

@matthewvalimaki, I managed to get past that. Running into a few other issues now. But I'm working through them.

matthewvalimaki commented 8 years ago

Sorry, sounds like I made a broken or :(

A side note: remember that we can use pre-built consul binary as of 0.6. On Mar 17, 2016 6:35 PM, "Scott Mebberson" notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki, I managed to get past that. Running into a few other issues now. But I'm working through them.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/smebberson/docker-alpine/pull/13#issuecomment-198154647

smebberson commented 8 years ago

@matthewvalimaki, take a look at https://github.com/smebberson/docker-alpine/tree/matthewvalimaki-add-consul-template. That is my integration branch. Everything seems to be running smoothly now. There was a number of issues due to permissions not allowing certain things to occur (such as removing the consul nodes watch symlink - you didn't experience anything like that?).

The only thing I'm not happy with is the weird permissions I had to have on the /etc/nginx directory. I had to set them to 674 (??) so that consul-template could write to the directory. When I run with 664 I get:

Consul Template returned errors:
open /etc/nginx/conf.d/default.conf: permission denied

Despite the fact that consul-template has been added to the nginx group? Can you see why that would happen?

If you want to run this locally, you'll have to build (in this order):

If you could give me your input on that - that'd be great. I'll wait to merge into master until you get back to me.

Also, before I merge into master, I'll release v1.0.0 of alpine-consul-nginx, and alpine-consul-nginx-nodejs and then merge those into my integration branch so that people wanting to keep the old versions can.

matthewvalimaki commented 8 years ago

@smebberson please merge #12 while you're at working on nodejs :) I'll think about your questions now and see if I can come up with something. Thanks for the cleanup.

matthewvalimaki commented 8 years ago

@smebberson going back to your last questions:

There was a number of issues due to permissions not allowing certain things to occur (such as removing the consul nodes watch symlink - you didn't experience anything like that?).

I did! And I thought I pushed up those fixes, very similar to what you did. I'm sorry, I should've made a clean clone to test it again. The symlink was one of the very first permissions issues I encountered.

Do you still have similar issues elsewhere?

The only thing I'm not happy with is the weird permissions I had to have on the /etc/nginx directory. I had to set them to 674 (??) so that consul-template could write to the directory.

I did not see that. Probably having Windows as host causes some issues for permissions for me. As the file you mentioned, /etc/nginx/conf.d/default.conf, is added when building I suspect the ownership or permissions being incorrect.

smebberson commented 8 years ago

@matthewvalimaki this is finally done! I've version bumped and Git tagged, and Docker hub tagged alpine-consul-base, alpine-consul-nginx and alpine-consul-nginx-nodejs. All seem to be working well. Thanks for your PR!

matthewvalimaki commented 8 years ago

@smebberson awesome! Good job! Now I get to provide Nomad and other improvements. These images are by far the best with Hashi stuff :) On Mar 22, 2016 4:41 PM, "Scott Mebberson" notifications@github.com wrote:

@matthewvalimaki https://github.com/matthewvalimaki this is finally done! I've version bumped and Git tagged, and Docker hub tagged alpine-consul-base, alpine-consul-nginx and alpine-consul-nginx-nodejs. All seem to be working well. Thanks for your PR!

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/smebberson/docker-alpine/pull/13#issuecomment-200081079