openshift / cluster-samples-operator

The samples operator installs+maintains the sample templates+imagestreams on a cluster
ImageStreamTags are not updated #356

Open olaf-meyer opened 3 years ago

olaf-meyer commented 3 years ago

Hello,

I have a question regarding the cluster-samples-operator. As I understand it, the cluster-samples-operator maintains the ImageStreams and Templates in the openshift project. I looked at an older OpenShift test cluster and noticed that the ImageStreamTags were created but have never been updated since. From my point of view, ImageStreamTags are created when a tag attribute is "added" to an ImageStream. If a tag is removed from an ImageStream, the Administration Console still displays the ImageStreamTag in the ImageStream, with the text "pushed image" in the "from" column. As far as I can tell, ImageStreamTags contain no reference to ImageStreams. Any modification of tags in an ImageStream (other than creation) does not seem to be reflected in the ImageStreamTags.

Is it intentional that ImageStreamTags are not updated (replaced with a new version) when the associated ImageStream is modified?

Thank you very much in advance,

Olaf

gabemontero commented 3 years ago

Hey @olaf-meyer

Short answer: yes, this is on purpose, at least as far as the imagestream internals go. I cannot speak to what the console does.

Longer answer: As RHEL SCL and RH Middleware update their imagestreams with new tags, remove old tags, or change which images a tag points to, we only apply those updates to the release of OCP that is current when we receive them. Customers pick up those updates by upgrading their clusters to the newer 4.x. And as you have probably seen with OCP 4.x, we only support N-2 dot versions, i.e. if 4.6 is the current release, only 4.5 and 4.4 are still supported.

olaf-meyer commented 3 years ago

Hallo Gabe,

Thank you very much for the answer. Did I understand correctly that the images are only updated if the OpenShift version itself is updated (e.g. from 4.5 to 4.6) or by adding new tags to the ImageStreams?

Olaf

gabemontero commented 3 years ago

Hallo Gabe,

thank you very much for the answer.

My pleasure @olaf-meyer

Did I understand correctly that the images are only updated if the OpenShift version itself is updated (e.g. from 4.5 to 4.6) or by adding new tags to the ImageStreams?

I think you have it right @olaf-meyer, though allow me to clarify your verbiage a bit:

olaf-meyer commented 3 years ago

Hallo Gabe,

what do you mean by verbiage?

Anyhow, are there plans to annotate image tags with the schedules attribute to keep the tags up to date? I have seen that production OpenShift clusters run for quite a long time. Certain images, e.g. dotnet-core:3.1, will currently not receive updates until the base image changes from rhel7 to rhel8. Before a new tag becomes available, most of the images get security updates, and it is hard to explain to customers why the updates from the container catalog are not reflected in the ImageStream tags of an OpenShift cluster.

Kind regards,

Olaf

gabemontero commented 3 years ago

Hallo Gabe,

what do you mean by verbiage?

verbiage == the words used

Sorry, I was just trying to convey I think we were on the same page, but I would have worded it slightly differently.

Anyhow, are there plans to annotate image tags with the schedules attribute to keep the tag up to date?

That would be the responsibility of the image providers, not the samples operator. I know those image providers provide details on their images in the containers catalog, but have not heard about them annotating their images as well.

One thing we will be doing as part of samples operator is that in the release notes we'll be documenting which images the image providers removed in a given release. We are even doing this for releases which have shipped, back to 4.4, in addition to future releases moving forward.

But with the current source of requirements we have for samples, we are not going beyond that.

I have seen that production OpenShift clusters run for quite a long time. Certain images, e.g. dotnet-core:3.1, will currently not receive updates until the base image changes from rhel7 to rhel8. Before a new tag becomes available, most of the images get security updates, and it is hard to explain to customers why the updates from the container catalog are not reflected in the ImageStream tags of an OpenShift cluster.

Yeah, for what it is worth, we have some competing concerns here. Before we digress too much more on all that, I want to get you on a path for surfacing your wishes to the right place. To summarize, the ask is to update the sample imagestreams and associated templates (assuming they do not need certain Kubernetes-level features) in the dot releases of older 4.x releases, in addition to the latest current release.

Among other things, we'd have to get agreement from the image providers for that, as well as take on some additional release plumbing changes and testing.

The best way to pursue this is to open an RFE and get our planning involved. I believe the URL for that is https://issues.redhat.com/projects/RFE/summary

Let me know though if you run into difficulties with that, and I'll reach out to project management here for further guidance.

@siamaksade @adambkaplan @sbose78 @pedjak FYI ^^

openshift-bot commented 3 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 3 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

X-dark commented 3 years ago

Anyhow, are there plans to annotate image tags with the schedules attribute to keep the tag up to date?

That would be the responsibility of the image providers, not the samples operator. I know those image providers provide details on their images in the containers catalog, but have not heard about them annotating their images as well.

@olaf-meyer is speaking about the scheduled=true you can add on the ImageStream to have OpenShift automatically pull the last image for a given tag. This cannot be done by the image provider as it provides an image, not an ImageStream: https://docs.openshift.com/container-platform/4.7/openshift_images/managing_images/tagging-images.html#images-add-tags-to-imagestreams_tagging-images
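For readers unfamiliar with the setting, here is a minimal sketch in Python of what a scheduled tag looks like inside an ImageStream's spec. The `importPolicy.scheduled` field is the one behind `scheduled=true`; the stream name, tag name, and registry path below are hypothetical placeholders, not real sample content.

```python
import json


def scheduled_tag(name, docker_image):
    """Return one spec.tags entry with scheduled import turned on."""
    return {
        "name": name,
        "from": {"kind": "DockerImage", "name": docker_image},
        "importPolicy": {"scheduled": True},
    }


# Hypothetical ImageStream; only the field layout is the point here.
image_stream = {
    "apiVersion": "image.openshift.io/v1",
    "kind": "ImageStream",
    "metadata": {"name": "example-runtime", "namespace": "openshift"},
    "spec": {
        "tags": [
            scheduled_tag("1.0", "registry.example.com/example/runtime:1.0"),
        ]
    },
}

print(json.dumps(image_stream, indent=2))
```

The flag lives on the tag reference inside the ImageStream, which is why it is something set on the cluster side rather than on the image itself.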

One thing we will be doing as part of samples operator is that in the release notes we'll be documenting which images the image providers removed in a given release. We are even doing this for releases which have shipped, back to 4.4, in addition to future releases moving forward.

Are you speaking here about new tags or new images within a tag?

But with the current source of requirements we have for samples, we are not going beyond that.

Not wanting to provide the latest patch releases is one thing. Not updating images within a tag, which leaves images with major CVEs in place (as the image present will always be the one that a given tag was pointing to at install time), is quite another thing...

/remove-lifecycle rotten

gabemontero commented 3 years ago

Hey @X-dark - wanted to let you know that addressing your points / concerns is "on the books". But one of the items going on with samples is that we are transferring ownership of it to @dperaza4dustbit and his team over the course of this summer. As such, I've asked him and his team to go over the training materials I've given them so far and see what they can come up with.

So if you can bear with us for a little bit, we'll make progress on this, it just may be a bit "slower".

All that said, I'll embed some "starting responses" to tee things up for @dperaza4dustbit and team

Anyhow, are there plans to annotate image tags with the schedules attribute to keep the tag up to date?

That would be the responsibility of the image providers, not the samples operator. I know those image providers provide details on their images in the containers catalog, but have not heard about them annotating their images as well.

@olaf-meyer is speaking about the scheduled=true you can add on the ImageStream to have OpenShift automatically pull the last image for a given tag. This cannot be done by the image provider as it provides an image, not an ImageStream: https://docs.openshift.com/container-platform/4.7/openshift_images/managing_images/tagging-images.html#images-add-tags-to-imagestreams_tagging-images

Yes it is fair to say I glossed over a few details and possibly merged roles that might actually be assigned to different people.

@dperaza4dustbit and team will provide the additional detail I was referring to that connects the dots between a given image, the non-openshift repository it is produced from, and how it ultimately lands in samples operator, and where "scheduled=true" could come into play wrt that.

One thing we will be doing as part of samples operator is that in the release notes we'll be documenting which images the image providers removed in a given release. We are even doing this for releases which have shipped, back to 4.4, in addition to future releases moving forward.

Are you speaking here about new tags or new images within a tag?

But with the current source of requirements we have for samples, we are not going beyond that.

Not wanting to provide the latest patch releases is one thing. Not updating images within a tag, which leaves images with major CVEs in place (as the image present will always be the one that a given tag was pointing to at install time), is quite another thing...

Addressing CVEs absolutely falls under the type of things we handle wrt updating samples content in z-stream / released OCP versions. But there is a very specific process we have to follow, which involves more than responding to a GitHub issue.

@dperaza4dustbit and team will detail that, and then work with you all to gather the necessary details on any specific images/CVEs you all might have in mind.

X-dark commented 3 years ago

Hi @gabemontero, thanks for your answer.

We are currently in the process of testing RHACS, and we were quite surprised to see all those images flashing red despite being provided by the samples operator. I understand the difficulty of handling such issues while going through a transition process.

I have a support request open anyway, so I will see whether they have any suggestions for at least doing a one-shot refresh of those tags to lower the number of CVEs while we wait for a more automated solution.

gabemontero commented 3 years ago

OK, our skill transfer is still in progress, but we've received another issue related to this, again with the vulnerability warnings in the console with respect to sample images (see https://github.com/openshift/cluster-samples-operator/issues/383#issue-954960893 ), so I'm stepping in with an update while @dperaza4dustbit and team get up to speed.

1) I noted previously that updates are only pulled into our current release, and that to backport updates to the z-stream, we would need support cases / bugzillas.

Do we want to entertain adjustments to that policy? Dev / engineering is not there yet.

2) One item dev/eng is considering is providing samples outside of the core OCP install, replaced either by an OLM-based operator or via the OLM-related OpenShift Helm catalog. Among other things, this would give us greater flexibility and let us provide updated samples to OCP releases that are already GA'ed. So resources may be applied to address this situation in that manner instead.

3) Another item I do not believe I covered precisely here previously is that deprecated imagestreamtags which are no longer present upstream and at https://github.com/openshift/library are explicitly left in imagestreams by the samples operator during upgrade. See https://docs.openshift.com/container-platform/4.8/openshift_images/configuring-samples-operator.html#images-samples-operator-deprecated-image-stream_configuring-samples-operator

This is done so that dependent deployments and builds are not immediately broken. It is currently left to cluster admins to ensure there are no dependencies on such items, and then they can manually update the imagestreams to remove those tags.

Note: on new installs those old imagestreamtags are not installed.

We did originally consider more automated methods of dealing with this situation. There are possibilities, but not without cost nor complexities to consider when trying to address all the angles.

Could we revisit this? Certainly. As for where and how to make changes in this space: engineering's collective understanding of what OCP operators can/should do, and should not / cannot do, has evolved, and I can think of a few approaches to this situation, but I would not claim there is a slam dunk solution.

Any effort in this space would require discussion in OCP planning, as well as technical iteration via the OCP enhancement process at https://github.com/openshift/enhancements/. There is also the fact that we are considering moving samples management outside of OCP, so we have to balance resources spent on this problem there vs. in the current OCP samples operator.

As I type, Jira is unfortunately down :-) so I can't cite current epics around samples, open new items, etc., but between @dperaza4dustbit and myself, we will take on at least getting this requirement registered so the due diligence I mentioned above occurs.

@sbose78 @bparees @pedjak @adambkaplan @openshift/openshift-team-build-api FYI

bparees commented 3 years ago

So i'm not clear if the "vulnerable" images are present as a result of:

1. being installed at an older OCP version, but not being removed/updated during OCP updates
   or

2. the current version of OCP has stale content
   or

3. the images themselves do not have newer versions available

but generally our philosophy as i see it should be:

1. never remove a version tag from the samples operator library in a z-stream (meaning no matter what level of z you do a fresh install from, you get the same tags)

2. remove any version tags (or even entire imagestreams) we want from the samples operator library when we create a new y-stream (meaning if you do a fresh install of 4.8, it may not install all the same tags that 4.7 installed), but we should prefer not to remove an imagestream or tag w/o a good reason

3. never remove existing imagestreams or tags as part of upgrading a cluster, even if the new version does not contain those tags (as Gabe said, we don't want to break people deployments/etc)

4. The version tags that we provide should point to Major.Minor version tags, with the expectation that the teams owning those images are patching and updating those version tags for security issues, so that we don't need to change that imagestream tags to get updates

5. As i recall we don't automatically re-import tags, meaning even if the team does update a tag to fix an issue, the OCP cluster won't see it unless the admin does a re-import to refresh the tag. I don't think we want to revisit that, but we should be sure our docs mention it and talk about how to manually import or use scheduled imports (@gabemontero i forget if a user can set scheduled import on an imagestream that is managed by the samples operator? or will the samples operator unset it?)

gabemontero commented 3 years ago

thanks @bparees

I've embedded some items below to either more specifically relay my understanding of things, or add some detail which hopefully moves the conversation along.

So i'm not clear if the "vulnerable" images are present as a result of:

1. being installed at an older OCP version, but not being removed/updated during OCP updates
   or

Yeah to be fair, I have some context outside of this particular issue. The concern is predominantly your 1) @bparees

2. the current version of OCP has stale content
   or

3. the images themselves do not have newer versions available

but generally our philosophy as i see it should be:

1. never remove a version tag from the samples operator library in a z-stream (meaning no matter what level of z you do a fresh install from, you get the same tags)

Yeah the crux of the ask as I have come to understand it centers around having the operator remove imagestreamtags which show up in these vulnerability lists in some fashion (opt in via config change or analysis of usage, along with adding some knowledge into the operator of which imagestreamtags have vulnerabilities, or other implementation options I'll refrain from to keep this short).

So we either hold to your philosophy point here and continue with cluster admins having to edit the imagestreams manually to remove imagestreamtags that get flagged by these warnings, or we take on a philosophical shift.

2. remove any version tags (or even entire imagestreams) we want from the samples operator library when we create a new y-stream (meaning if you do a fresh install of 4.8, it may not install all the same tags that 4.7 installed), but we should prefer not to remove an imagestream or tag w/o a good reason

We have this to some degree already, but only from an "end of life" perspective (vs. a vulnerability one), in that as versions of items go past end of life, the corresponding imagestreamtags are removed from what sample providers give us in https://github.com/openshift/library

3. never remove existing imagestreams or tags as part of upgrading a cluster, even if the new version does not contain those tags (as Gabe said, we don't want to break people deployments/etc)

IMO we maintain this, but IF we add removal of imagestreamtags that correspond to versions with vulnerabilities, it happens because whatever trigger drives it (an API change where the cluster admin asks us to remove them, or something more automatic IF we can construct a reasonable algorithm for making such a decision) leads to the removal, either immediately or in some batch fashion. But again, those are implementation details I'll gloss over until if/when we make the philosophical change to go down this path.

4. The version tags that we provide should point to Major.Minor version tags, with the expectation that the teams owning those images are patching and updating those version tags for security issues, so that we don't need to change that imagestream tags to get updates

+1 .... but part of the problem with having samples ship with core OCP is that while they are part of core content now, OCP has no final say / control over whether the other orgs in RH that own these images do what @bparees lays out ^^ :-)

5. As i recall we don't automatically re-import tags, meaning even if the team does update a tag to fix an issue, the OCP cluster won't see it unless the admin does a re-import to refresh the tag.  I don't think we want to revisit that, but we should be sure our docs mention it and talk about how to manually import or use scheduled imports (@gabemontero i forget if a user can set scheduled import on an imagestream that is managed by the samples operator?  or will the samples operator unset it?)

Generally that is the case, i.e. we don't automatically re-import, and in fact we have an RFE to add an option for that. It should be noted that the imagestream update on a cluster upgrade will refresh the tag, IIRC.

To your questions @bparees, an admin can do manual imports. I do it all the time while testing various things.

They can also set up scheduled imports. But when an upgrade happens and the update of the imagestream occurs again, the scheduled import setting will be lost as part of the spec update, and they'll have to redo it after the upgrade completes.
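Since the scheduled import setting is lost when the upgrade rewrites the spec, one way to see which tags need it reapplied is to compare the ImageStream before and after the upgrade. A minimal sketch, assuming both versions were exported as JSON (for example with `oc get imagestream <name> -o json`) and loaded into dicts; the helper names and sample data are hypothetical:

```python
def scheduled_tags(image_stream):
    """Return the names of spec tags that have importPolicy.scheduled set."""
    tags = image_stream.get("spec", {}).get("tags") or []
    return {
        tag["name"]
        for tag in tags
        if (tag.get("importPolicy") or {}).get("scheduled")
    }


def lost_schedules(before, after):
    """List tags that were scheduled before the upgrade but are not afterwards."""
    return sorted(scheduled_tags(before) - scheduled_tags(after))


# Hypothetical data: scheduling was enabled on 1.0, then an upgrade reset it.
before = {"spec": {"tags": [
    {"name": "1.0", "importPolicy": {"scheduled": True}},
    {"name": "2.0", "importPolicy": {}},
]}}
after = {"spec": {"tags": [
    {"name": "1.0", "importPolicy": {}},
    {"name": "2.0", "importPolicy": {}},
]}}

print(lost_schedules(before, after))
```

This only reports the difference; reapplying the setting is still a separate manual step after the upgrade completes.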

bparees commented 3 years ago

So we either hold to your philosophy point here and continue with cluster admins having to edit the imagestreams manually to remove imagestreamtags that get flagged by these warnings, or we take on a philosophical shift.

i'm inclined to hold to it, and if admins aren't comfortable with that, they can edit the samples operator config to tell it to stop installing this content on their clusters.

and/or they can use tools like ACS to scan their clusters for use of vulnerable content.

I feel the admin pain, but if we get into the business of removing tags out from under clusters, that is also going to be a major source of pain.
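As a rough illustration of the "edit the samples operator config" option mentioned above, the sketch below builds a merge-patch body for the operator's `Config` object. It assumes the `spec.managementState` and `spec.skippedImagestreams` fields of that object; the function name and the example imagestream names are hypothetical.

```python
import json


def samples_config_patch(skip_imagestreams=None, remove_all=False):
    """Build a merge-patch body for the samples operator Config.

    remove_all=True asks the operator to stop providing sample content;
    skip_imagestreams names individual imagestreams it should leave alone.
    """
    spec = {}
    if remove_all:
        spec["managementState"] = "Removed"
    if skip_imagestreams:
        spec["skippedImagestreams"] = sorted(skip_imagestreams)
    return {"spec": spec}


# Hypothetical example: stop managing two imagestreams but keep the rest.
patch = samples_config_patch(skip_imagestreams=["example-b", "example-a"])
print(json.dumps(patch))
```

The printed JSON could then be supplied to something like `oc patch configs.samples.operator.openshift.io cluster --type=merge -p '<json>'`. Whether skipping or removing is the better fit depends on what workloads already reference the samples.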

openshift-bot commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

X-dark commented 2 years ago

Hi,

It seems I missed the last comments in the discussion.

4. The version tags that we provide should point to Major.Minor version tags, with the expectation that the teams owning those images are patching and updating those version tags for security issues, so that we don't need to change that imagestream tags to get updates

5. As i recall we don't automatically re-import tags, meaning even if the team does update a tag to fix an issue, the OCP cluster won't see it unless the admin does a re-import to refresh the tag.  I don't think we want to revisit that, but we should be sure our docs mention it and talk about how to manually import or use scheduled imports (@gabemontero i forget if a user can set scheduled import on an imagestream that is managed by the samples operator?  or will the samples operator unset it?)

For me, the issue is exactly those two points. We have tags imported by the cluster samples operator that have been updated upstream (within the tag, as described in 4.), but the tag update is not reflected on our clusters.

Manually setting the schedule on all imagestreamtags is easily doable with a few bash loops, but it is not practical, especially if it is reset by cluster upgrades.

What is preventing those imagestreamtags from being created with the schedule set by default? Or would it be possible to have a setting somewhere to enable this at the operator level?
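To make the per-tag change concrete, here is a minimal Python sketch of the kind of loop described above, generating JSON Patch operations instead of editing each tag by hand. It assumes that only tags whose `from.kind` is `DockerImage` are worth scheduling (tags that merely reference another ImageStreamTag are skipped); the function name and sample data are hypothetical.

```python
def scheduled_import_patch(image_stream):
    """Build JSON Patch ops enabling importPolicy.scheduled on spec tags."""
    ops = []
    tags = image_stream.get("spec", {}).get("tags") or []
    for index, tag in enumerate(tags):
        source = tag.get("from") or {}
        if source.get("kind") != "DockerImage":
            continue  # assumption: only external image references are scheduled
        policy = tag.get("importPolicy")
        if isinstance(policy, dict):
            if policy.get("scheduled"):
                continue  # already scheduled, nothing to do
            # Add only the one field so other policy settings are preserved.
            path = f"/spec/tags/{index}/importPolicy/scheduled"
            ops.append({"op": "add", "path": path, "value": True})
        else:
            path = f"/spec/tags/{index}/importPolicy"
            ops.append({"op": "add", "path": path, "value": {"scheduled": True}})
    return ops


# Hypothetical ImageStream with a mix of tag shapes.
stream = {"spec": {"tags": [
    {"name": "1.0", "from": {"kind": "DockerImage", "name": "registry.example.com/r:1.0"}},
    {"name": "latest", "from": {"kind": "ImageStreamTag", "name": "1.0"}},
    {"name": "2.0", "from": {"kind": "DockerImage", "name": "registry.example.com/r:2.0"},
     "importPolicy": {"insecure": False}},
    {"name": "3.0", "from": {"kind": "DockerImage", "name": "registry.example.com/r:3.0"},
     "importPolicy": {"scheduled": True}},
]}}

for op in scheduled_import_patch(stream):
    print(op)
```

The resulting list, serialized as JSON, is the shape that `oc patch imagestream <name> -n openshift --type=json -p '<ops>'` accepts. It does not address the reset on upgrade, so it would have to be rerun afterwards.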

/remove-lifecycle stale

bparees commented 2 years ago

What is preventing those imagestreamtags from being created with the schedule set by default?

It's a decision we made. The reasoning behind the decision is that if we enabled scheduled import by default, then:

1) the fleet of openshift clusters (every cluster running anywhere in the world that isn't disconnected) would potentially drive a lot of load to external registries polling for updates to images

2) when users on the clusters consume those samples through deploymentconfigs and buildconfigs, and then one of the tags gets updated, it will drive a deployment/build storm across the cluster as every workload that consumed one of these tags gets simultaneously redeployed and/or rebuilt

So we did not want to put clusters in a situation where they would automatically be opted into that behavior unknowingly, but rather preferred that cluster administrators make that choice deliberately.

openshift-bot commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 2 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

X-dark commented 2 years ago

/remove-lifecycle rotten

gabemontero commented 2 years ago

@dperaza4dustbit - wasn't a Jira story/epic created to track configuration options to turn on image stream scheduled import?

openshift-bot commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

yrro commented 2 years ago

/remove-lifecycle stale

X-dark commented 2 years ago

Hi, as I don't see any significant update on this topic, I plan to deploy a patch with the patch-operator that will target all ImageStreamTags in the openshift namespace and add the scheduled import policy. I hope I am not putting my clusters at risk by doing this, but it is just no longer acceptable to have OpenShift-provided images reported as containing high CVEs.

openshift-bot commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 1 year ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

yrro commented 1 year ago

/remove-lifecycle stale

yrro commented 1 year ago

/lifecycle frozen