projectatomic / container-best-practices

Container Best Practices

Creating: Updating Software supplied by Base Image #68

Open fatherlinux opened 8 years ago

fatherlinux commented 8 years ago

I disagree with this quite strongly. I don't think this is a general best practice. I also think this section needs to be broken into two sections:

  1. Creating Base Images
  2. Creating Layered Images

There are very different strategies for each:

  1. Base Images: operations teams probably create base images, and they absolutely want to update those images before they publish them. There is no other way to do it. Furthermore, the ops team may start with rhel7 and create rhel7-ourcorebuild. In this scenario, they will be creating a layered image, but may want to squash it. Either way, they want to do a yum update and should be recommended to do so from a security perspective (see the sketch after this list).
  2. Layered Images: I also disagree that you shouldn't do a yum update as part of a Dockerfile which builds a layered image. It's EVERYWHERE for a reason. Upstream vendors, partners, friends, family, and anyone else that makes images suck!!! It's like letting family borrow money, it always puts you in a bad place. End users absolutely need to be able to do a yum update (and it should be recommended that this is OK) when their upstreams burn them (which they will). There will always be some excuse why the upstream hasn't updated an image (build system broken, can't automate for some reason, licensing, who knows), but the downstream user should never be blocked by this. If it breaks, that is a bug. If the upstream does something that is fragile, that is an anti-pattern. If you build an image with the assumption that others will consume it, you better be able to do a yum update....
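As a sketch of the base-image scenario in point 1 (the squash step is left out, and the exact yum flags are an assumption):

    # Dockerfile for rhel7-ourcorebuild, built by the ops team from the vendor base
    FROM rhel7
    # Pull in all published errata before republishing internally;
    # the result can then be squashed and pushed as rhel7-ourcorebuild
    RUN yum -y update && yum clean all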
baude commented 8 years ago

can you provide a link to the content you are objecting to?

fatherlinux commented 8 years ago

Yeah, sorry about that, I realized after I sent that I should have included a link:

https://github.com/projectatomic/container-best-practices/blob/master/creating/creating_index.adoc

eliskasl commented 8 years ago

Hey,

I'm just writing about the base images so I'll reorganize the whole chapter a bit.

About using yum update: this part was written about a year ago, and there have been many discussions since then, so it definitely needs to be rewritten. I'll try to get back to it soon in case nobody else beats me to it.

baude commented 8 years ago

@fatherlinux We have socialized and considered your input. Our advice to those who develop images (like ISVs or developers) is still that they should not update a provided base image to obtain new packages. It remains on the provider to keep those images fresh and updated, particularly when it comes to security updates. And again, remember this is a best practice, not a hard-and-fast rule.

That said, we do feel that if users want to update images provided to them, it should be done of their own accord, and it should be done not by altering the image itself (nor a running container) but rather as part of a Dockerfile where they create their own image.
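A minimal sketch of that approach, with hypothetical image names:

    # The user's own Dockerfile; neither the provided base image
    # nor any running container is modified
    FROM registry.example.com/provided-base
    # The update is baked in at build time, producing the user's own image, e.g.:
    #   docker build -t mycompany/provided-base-updated .
    RUN yum -y update && yum clean all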

Given that the audience for this document is developers, I think our message still holds. But if you like, we could clarify in the appendix with a revision or summary of the blog post I shared.

scollier commented 8 years ago

+1 @baude

fatherlinux commented 8 years ago

Perhaps I misunderstand the use case? How does being a developer make a difference? Could you describe the persona and use case this guide is for? Are you saying that a "developer" would pull a base image from a core build (rhel7-ourcorebuild) the ops team has already done the yum update on? I "might" agree in that scenario, but whenever cargo crosses a frontier (countries in real life, vendor to customer in software) the downstream is responsible. So to state that another way, I might agree within a single organization, but I can NOT agree when organizational boundaries are crossed...

I have literally had this conversation with hundreds of customers. I would be more than happy to arrange a call with a couple of big ops teams if you need evidence of what customers want, not what we think is philosophically right.

Very, very rarely could a yum update break a downstream build. If it does, the developer has time to fix it (or bypass it, worst case) while doing the docker build. This is the whole magic of Docker over Puppet: you do the loading of the container at the factory instead of at the dock.

This is something we all need to come to agreement on, because I am publishing tons of articles around supply chain and they flatly disagree with this.

IMHO, never block someone downstream. Almost every CentOS Dockerfile out there starts with a "yum update"; this would buck that trend and I disagree pretty strongly...

fatherlinux commented 8 years ago

I need a better understanding of what you guys mean by:

  1. Developer
  2. What is she doing?
  3. What do you consider a base image?
baude commented 8 years ago

Sure, in this document, a developer is the person who is writing a dockerfile (for their application) who inherits a base image. A base image, in general terms, is the minimal container operating system of a distribution like docker.io/centos.

On your last paragraph there, this is a best practice -- a recommendation of sorts. There is no switch to stop it. I think the reasoning documented is justified, but it can be overridden as developers see fit.

fatherlinux commented 8 years ago

So, there is talk of doing funky stuff with layers to remove yum, which would make this mandatory, which is why it is so critical to get everybody on the same page.

mfojtik commented 8 years ago

@fatherlinux doing yum update in a Dockerfile comes with consequences:

  1. Each image you have in the registry will have a different base layer based on the date/time the yum update was called. That results in more storage usage in the docker registry.
  2. Updates might create inconsistency between images, as you are losing track of what version is installed where (assuming you're not rebuilding all images at once).
  3. Having "yum update" basically says that the supplier of your base image sucks and fails to update it regularly for you. Doing it yourself in an upper layer is just a workaround for that issue; the real fix should be "trust your image supplier".
  4. Having said that, the image supplier might spend some effort on testing the base image to provide the best experience. That is not always correlated with having the "latest" versions installed, as those might not be well tested.

fatherlinux commented 8 years ago

@mfojtik I will address each inline. Also tagging in @rhatdan:

1) Each image you have in the registry will have a different base layer based on the date/time the yum update was called. That results in more storage usage in the docker registry.

I think I know what you are trying to say, but these are not called "base layers". I would refer to them as intermediate layers, but I think I understand what you are saying. I believe you are trying to say there is a Turing complete problem at the intermediary layer, and your logic would be correct. There are still a couple of problems.

This Turing complete problem exists for any RUN command, not just yum updates - most critically it exists for "yum install" lines, which could add a lot more data to an intermediary layer. Worse yet is the fact that a yum install could cause any number of dependencies to be updated, which creates another Turing complete problem of an unknown/untested permutation of packages in the intermediary layer.

The safest method is to do a yum update to the latest and greatest provided by your upstream RPM repository (which should be a tested set of packages in a Satellite channel snapshot or Content View). While doing a yum update will create a new intermediary layer, it CAN be known if you have good package hygiene and make sure you always use a Satellite Content View or channel snapshot.

Then the operations team will have good control at that first intermediary layer. Honestly, if they are doing their job correctly, a developer's "yum update" should have no effect, so the recommendation becomes useless. It's a win/win and again, I think a useless recommendation.
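A rough sketch of that hygiene, where the .repo file pointing at a frozen Content View is a hypothetical example:

    FROM rhel7
    # Replace the floating repos with a pinned Satellite Content View /
    # channel snapshot, so the package set after the update is known and tested
    COPY ourcorebuild-cv.repo /etc/yum.repos.d/
    RUN yum -y update && yum clean all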

2) Updates might create inconsistency between images, as you are losing track of what version is installed where (assuming you're not rebuilding all images at once).

Incorrect, this problem should and would always be mitigated by repositories/Satellite channel snapshots/Content Views. In fact, a "yum -y update" is the ONLY way to know what is in the intermediary layer. Again, doing arbitrary updates of only certain packages (which is what this guide recommends) produces a Turing complete problem of an arbitrary but unknown set of package permutations at the intermediary layer.

Stated for clarity: this guide recommends that a developer only update certain packages, which would cause this problem, not mitigate it. It is a real problem, but it is not solved by this recommendation; it is only solved with Satellite Content Views/channel snapshots.

3) Having "yum update" basically says that the supplier of your base image sucks and fails to update it regularly for you. Doing it yourself in an upper layer is just a workaround for that issue; the real fix should be "trust your image supplier".

I will not argue this point; it's philosophy, not science. I am going to write an article called "Why Michael Crosby is wrong".

Trusting a supplier is a myth. There was never trust with ISOs, and we are years away from some sort of manifest standard that would let me trust a docker base layer. Almost every operations team I have talked to is doing and will continue to do "yum updates". Sorry, but this is science ("what are people doing?") vs. philosophy ("what people ought to do").

Again, ask people what they are doing now; don't tell them. You will gain a lot more wisdom.

4) Having said that, the image supplier might spend some effort on testing the base image to provide the best experience. That is not always correlated with having the "latest" versions installed, as those might not be well tested.

Yeah, we (Red Hat) have this problem. It's why we delayed Docker 1.9. This also proves my point. What if the upstream supplier makes a decision that I don't like? What if they delay the release of a patch because it breaks some of "their" software, but not mine? Then I am screwed and can't get an update. The only way around this is to have freedom at each layer of the supply chain. Organizations MUST be allowed to do their own yum updates.

I feel like nobody arguing this really has a security background. You cannot stop organizations from updating RPM content; the content is being produced for a reason. Docker is a shipping container, but we still use barrels, boxes, bags, and crates inside the shipping container, because that is typically what we are good at loading at the factory. The same is true with RPMs (for now)...

fatherlinux commented 8 years ago

Also, I want to make the simple observation that having more permutations on disk and hence using more disk space is not as big of a business problem as having an exploit in one of my applications. So, a yum update will always trump the permutations argument....

dav1x commented 8 years ago

I have a fair amount of experience with operations at scale, security scans (Qualys and the like), and multiple business sectors requiring vastly different images/builds.

Generally speaking, the security guys were the ones most concerned with packages being upgraded every time a release was pushed. The problem with this was that applications relied on specific versions and builds, and we couldn't update them every time a new package was released. Still, there were always situations where a CVE trumped the norms.

That being said, IMO it makes more sense to specify the exact build you need in the FROM line of the Dockerfile, so that the consumer has precise control over the package versions and application requirements. There will always be use cases where a "yum update -y" is required. But I believe most customers and consumers would be more interested in having that precise control.
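For example, pinning the exact build in the FROM line looks something like this (the tag shown is hypothetical):

    # Pin a specific base build rather than the floating 'latest' tag,
    # so package versions are determined by this exact build
    FROM registry.access.redhat.com/rhel7:7.2-35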

fatherlinux commented 8 years ago

Interesting perspective. I don't completely disagree. I like the clean choice in the FROM line.

That said, again, the ops team in a business should be controlling the package sets, not the devs. Ops should have a strategy for not breaking builds (which maybe is the control point for devs, e.g. tagging builds), such as:

  1. Capturing when things break and building tests as part of the resolution.
  2. Using something like RHEL that provides good stability (API/ABI)
  3. Good support lifecycle, etc.

That said, things will break in the developer world if they run a yum update, but they will also break when the vendor/ops team does a base image update; who runs the yum update is irrelevant. As a developer I would rather:

  1. Make that choice myself
  2. Have my ops team make that choice and test it (Satellite, Dockerfiles with yum updates, and tests)
  3. Not have the upstream image vendor force me onto the latest update without understanding what it might break.

If there is an ops team managing the builds internally, I think I can live with "no developer yum updates" being a recommendation. I think this hits the 80/20 rule.

Restated simply:

This does have some downsides. Ops/security is not going to be happy if devs refuse to pick up a new base image/tag that has critical security updates in it....

langdon commented 8 years ago

To your closing point, this is the crux of the interest by devs in containers. "SAs/Ops can't break my application by applying an unapproved update." Containers let me, as a developer, actually test the changes that an updated library/rpm make to my application. And, particularly with advanced container tooling, they allow the developer to re-build, test, and ship the new, tested app to the ops folks for deployment (or directly, depending on the infra) in a reasonable (per ops) timeframe.

Fundamentally, this conversation is about why developers want to vendor-ize all the things they depend on. Developers don't care (usually) about the OS firewall getting a patch, or ssh getting a patch, but stay away from patching anything in the developer's direct stack without testing the patch.

fatherlinux commented 8 years ago

http://rhelblog.redhat.com/2016/02/24/container-tidbits-can-good-supply-chain-hygiene-mitigate-base-image-sizes/