original-brownbear opened 8 years ago
@yegor256 would you be on board with setting up a private Registry for us? I think it's an important step towards efficient dynamically provisioned Build runners and also would help us handle private images easier.
@original-brownbear :+1: sounds good!
@alex-palevsky this is a bug.
@alex-palevsky this is postponed.
@alex-palevsky this is a bug.
@original-brownbear I added the bug tag to this ticket
@original-brownbear since there is no milestone yet I set it to "2.0"
@original-brownbear thanks for this report, I added 30 mins to your account, in transaction AP-0NX64527LY920661T
@alex-palevsky this is postponed.
@original-brownbear right, I added "postponed" label
@alex-palevsky this is postponed.
@original-brownbear someone else will help in this task, no problem at all
@original-brownbear hm... I'm not entirely sure I understand the concept here. my key questions: 1) why a new EC2 instance, why can't we use Docker Hub paid account? 2) why do we need private Docker images?
@yegor256
1) why a new EC2 instance, why can't we use Docker Hub paid account?
Because if we use images that are not built during the merge, then any change to the dependencies needs to go in two steps:

1. merge the `Dockerfile` change to master to trigger the DockerHub update (while still building the old code that does not need the new dependencies),
2. merge the actual code update only after the DockerHub `Dockerfile` update has gone through.

This is a very risky process, as we've seen in Rultor a few times (updating dependency versions, random Gemfile issues etc.), because you basically have to trust that the first step will work out. Right now I could add `exit 1;` to the `Dockerfile` in Rultor (as part of `CMD` or `ENTRYPOINT`) and break the build for good until manual action is taken on master. Rultor cannot recover from this situation.
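To make the failure mode concrete, a hypothetical sketch (the appended line and the image tag are made up for illustration):

```sh
# One merged line like this makes every container started from the
# DockerHub-built image exit immediately:
cat >> Dockerfile <<'EOF'
ENTRYPOINT ["/bin/sh", "-c", "exit 1"]
EOF

# Every subsequent Rultor run would then fail until someone fixes the
# Dockerfile on master by hand:
docker build -t rultor-broken .
docker run --rm rultor-broken; echo "container exited with $?"
```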
If, on the other hand, Rultor builds the image and pushes it to a private registry, we're good in that regard. We can never merge a broken `Dockerfile` then, if the building and pushing is part of the merge process.
I understand this would be possible using DockerHub too, but see last point ...
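Roughly, "building and pushing as part of the merge" could look like this on the Rultor side (the registry host and image name are placeholders, nothing that exists yet):

```sh
# Hypothetical private registry host and image name, for illustration only.
REGISTRY=registry.rultor.internal:5000
IMAGE="$REGISTRY/yegor256/rultor:latest"

# Build the image from the Dockerfile that is part of the merge itself;
# if the Dockerfile is broken, this step fails and the merge is rejected.
docker build -t "$IMAGE" .

# Only a successfully built image ever reaches the registry.
docker push "$IMAGE"
```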
Why do we need private Docker images?
We simply have commercial projects that are now getting their own `Dockerfile`s. I don't think we want to expose those publicly.
Why Private Registry >> DockerHub
Makes sense?
@original-brownbear we can create an AWS image (AMI), which will be used to create EC2 instances. That image will have a Docker image pre-fetched. What about this?
@yegor256 well this only solves/alleviates the issue for the Rultor image users. Also it requires us to use EC2 instead of ECS.
I think the clear downsides are still these:
=> I think my plan is far superior in outcome and easier/safer to implement too, since it can be accurately tested.
@original-brownbear @yegor256 I'm new to this type of question so I may say something obvious, but I have to say this. I took another look at this issue and want to raise one more question - security. If I understand correctly, adding the possibility for a repo owner (user) to specify the image to be used by Rultor to build the project implies that any user can execute almost any code in Rultor's Docker environment. At the very least, this is a perfect way to DDoS Rultor (yes, I know about the CPU, RAM and NET restrictions per container). But at worst, code executed in one container could affect other containers (sorry, can't find a proof link, but I've heard a lot of noise about container security, and none of it said it's safe to run arbitrary images on your own platform). We should consider this issue carefully before doing something with it.
@longtimeago Yeah, DDoS may be an issue, but really the same goes for Travis to a much larger extent. I think Rultor simply wouldn't be worth it (or even capable of doing any real damage). I mean, we wouldn't set up ECS to spawn off an unlimited number of builds :) (nor would Amazon give us that freedom in the first place).
About security between containers, past me took care of the situation a while back here #1008 :) This really is an issue of the past when it comes to Docker, so long as we don't give the container any privileges ( which we don't anymore :) ), we're good with that (excluding the possibility that someone knows some non-public exploit :P ).
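For reference, a rough sketch of what running the build container without privileges could look like (standard Docker flags; the image name and command are only placeholders):

```sh
# Run the build container without --privileged, drop all Linux capabilities
# and forbid privilege escalation inside the container.
docker run --rm \
  --cap-drop=ALL \
  --security-opt no-new-privileges:true \
  yegor256/rultor \
  /bin/true
```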
@original-brownbear what about the possibility of building an image and running a container which starts email bots, bit-miners ... ?
UPD: Say it's possible right now :)
@longtimeago well, in theory that's obviously possible :) But again ... you can in fact start a Docker instance in any Travis build too, for example! I think an attacker would much rather do that than use Rultor? :) ... obviously not an argument :)
But I think Travis, as well as Rultor, is safeguarded by very limited resources and also by GitHub here. The maximum runtime Rultor allows is 120 minutes, and you only get one build/deploy per GitHub repo simultaneously, so in order to exploit us here you'd have to do this for 120 min of evil:
Then you also have to factor in that we could simply set an alarm at a certain number of containers in Amazon and/or hard-limit the maximum number of them at, say, 4 or 8 or so. Any attacker really wouldn't be getting much out of this.
=> I really don't see what you could even accomplish with a malicious image. => Plus, should this against all expectations turn out to be an issue, we could simply adjust the network settings and set a limit of like 50 MB on the outbound traffic per build, right? :)
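A minimal sketch of such hard per-build limits (the numbers are made up, and the outbound-traffic cap would need extra tooling such as `tc` on the host, which is not shown):

```sh
# Cap what a single, potentially malicious, build container can consume:
# memory, CPU and the number of processes it may spawn.
docker run --rm \
  --memory=4g \
  --cpus=2 \
  --pids-limit=512 \
  yegor256/rultor \
  /bin/true
```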
@original-brownbear Nice explanation, thanks! Actually, you've mentioned at least 2 preventive tasks to be done ;)
@original-brownbear shame on me, I hadn't heard about EC2 Container Service (ECS) before. As I understand now (http://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html), it is just an EC2 instance with a Docker daemon installed there. All AWS is giving us is an ability to manage those containers via the AWS API. I don't see any advantage of that compared to our own EC2 instance, which we have now, and plain simple SSH (which we use now). The only advantage AWS ECS is giving us is an ability to manage many EC2 instances through one entry point. Am I wrong?
@yegor256
The only advantage AWS ECS is giving us is an ability to manage many EC2 instances through one entry point. Am I wrong?
Wrongish :), the thing isn't so much that ECS gives us the ability to manage multiple EC2 instances (though this is nice too, of course), but that it gives us a well-designed scheduler for running containers. Instead of our current very naive and error-prone (look at the stability of some builds) approach of simply scheduling by CPU load on the EC2 instance, we'd get proper resource management out of the box: waiting for CPU and RAM to become available, running the task, queuing other tasks meanwhile and guaranteeing certain resources. (Currently we simply guarantee a bunch of RAM via swapping and it's less than ideal; making something better ourselves would be very tricky and hence expensive.)
Please understand here too that running the EC2 instances fully on demand (one instance per build) would be very, very slow and also very expensive. Plus, like all solutions revolving around just dynamic EC2, it requires maintaining an AMI.
If we keep an instance running and just reboot it when in trouble we gain nothing in terms of the build stability issues we have in some projects at the moment during peak hours. Also a bad solution.
So if we actually want to improve the stability issues, both from the EC2 instance outright dying as well as from load spikes, we need scheduling. ECS simply gives us just that out of the box. We're free to decide how much money to spend on EC2, and unlike now the decision will only affect build time, not stability. Making something that keeps a dynamic or even static number of EC2 instances available to Rultor and then implementing the whole process of distributing builds among those instances would just be reinventing ECS.
Also, in terms of hands-on implementation, I'm convinced ECS is our fastest route to proper on-demand provisioning: we don't need to change anything about the current SSH implementation (bad as it may be, it works for now). We can simply use ECS to dynamically provide us with Docker-in-Docker containers, giving us the same environment we had before but with guaranteed resources per build (and allowing this environment to be fully under Rultor's control, with no randomness from some AMI). It simply decouples the whole EC2 cost and maintenance side from the side of simply running Rultor builds. We can set the ECS settings to whatever you see fit; Rultor will still always work and use the ECS API to schedule its builds.
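A rough sketch of that decoupling (cluster and task names are hypothetical, nothing here exists yet); Rultor would only talk to the ECS API and let the scheduler place each build on an instance with enough free CPU and RAM:

```sh
# Register a task definition that reserves CPU and memory for one build;
# ECS will only place it on an instance that actually has those resources free.
aws ecs register-task-definition \
  --family rultor-build \
  --container-definitions '[{
    "name": "build",
    "image": "yegor256/rultor",
    "cpu": 1024,
    "memory": 4096,
    "privileged": false,
    "command": ["/bin/true"]
  }]'

# Ask ECS to run one build; if no instance currently has enough free
# resources, ECS reports a placement failure and Rultor can simply retry
# later instead of overloading a shared instance.
aws ecs run-task \
  --cluster rultor \
  --task-definition rultor-build
```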
Makes sense?
@original-brownbear yes, it does make sense, thanks. OK, I'm in, let's use ECS
Problem
We are currently trying to move to dynamic provisioning of Docker daemons for Rultor. This means that Rultor builds would stop sharing the same build cache, making builds potentially very slow, since at present they would in many cases involve downloading the 1.5 GB Rultor image. Also, some of our projects are starting to need a private way of storing Docker images.
Solution
Provision an ECS instance with a private Docker registry backed by S3 and use it with Rultor. It should act as a passive cache/proxy in front of DockerHub as well as be used actively by the Rultor build.
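A minimal sketch of such a registry, using the stock `registry:2` image with its S3 storage driver and pull-through-cache mode (bucket, region and port are placeholders):

```sh
# Run the registry; configuration is passed via the standard REGISTRY_*
# environment-variable overrides supported by the registry:2 image.
# S3 credentials would come from the instance's IAM role or from additional
# REGISTRY_STORAGE_S3_ACCESSKEY / REGISTRY_STORAGE_S3_SECRETKEY variables.
docker run -d --name registry -p 5000:5000 \
  -e REGISTRY_STORAGE=s3 \
  -e REGISTRY_STORAGE_S3_REGION=us-east-1 \
  -e REGISTRY_STORAGE_S3_BUCKET=rultor-registry \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2
```

One caveat worth noting: a registry running in pull-through-cache mode is read-only for pushes, so the passive cache role and the push target for Rultor-built images would most likely need to be two separate registry instances.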
Concrete Implementation Required by this Issue
Registry
Rultor Build Runner
.rultor.yml
=> this gives us a very smart cache, while still allowing us to delete builds from a directory right after the Rultor run to save disk space.
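As a sketch of the caching side (the mirror URL is a placeholder), each dynamically provisioned runner's Docker daemon could be pointed at the registry as a pull-through mirror, so image layers survive even when the per-build directory is deleted:

```sh
# Configure the runner's Docker daemon to use the private registry as a
# pull-through mirror of DockerHub (placeholder URL).
cat > /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://registry.rultor.internal:5000"]
}
EOF
systemctl restart docker

# The ~1.5 GB Rultor image now comes out of the S3-backed cache instead of
# DockerHub, even on a freshly provisioned runner with an empty local cache.
docker pull yegor256/rultor
```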