Closed: tumido closed this issue 1 year ago
/assign @tumido
/label wg/byon-build-pipelines
@tumido: The label(s) `/label wg/byon-build-pipelines` cannot be applied. These labels are supported: community/discussion, community/group-programming, community/maintenance, community/question, deployment_name/ocp4-stage, deployment_name/ocp4-test, deployment_name/moc-prod, hacktoberfest, hacktoberfest-accepted, kind/cleanup, kind/demo, kind/deprecation, kind/documentation, kind/question, sig/advisor, sig/build, sig/cyborgs, sig/devops, sig/documentation, sig/indicators, sig/investigator, sig/knowledge-graph, sig/slo, sig/solvers, thoth/group-programming, thoth/human-intervention-required, thoth/potential-observation, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, triage/accepted, triage/duplicate, triage/needs-information, triage/not-reproducible, triage/unresolved, lifecycle/submission-accepted, lifecycle/submission-rejected
/assign @mimotej
So far I have conducted some experiments and I have the following information:

1. `importPolicy.scheduled` on an `Imagestream` causes OCP to periodically (every 15 minutes by default) check for changes on the image (meaning it checks whether the digest of the image has changed). This can be useful for automatic updates of images, but it does not generate any event, so it might be harder to detect.
2. `referencePolicy` works the same way in OCP 4.x as in OCP 3.x - not sure why it is not mentioned in the docs, but there doesn't seem to be any change. One bit that is also mentioned in the docs is that when using `referencePolicy.type Local` we have to set `importPolicy.insecure`, otherwise the image will not be pulled. When using `referencePolicy.type Source` this is not the case, and you don't have to set `importPolicy.insecure`. One interesting bit: it seems that even in the case of `referencePolicy.type Source` the `ImageStream` stores the image in the internal registry, but only instructs the client to pull from the external registry when possible (it also points to the manifest SHA address, not the tag). In my test I created an `ImageStream` with `referencePolicy.type Source`, and even though I deleted the image from the external image registry, it was still able to pull the image which didn't exist. (note *)
3. `imagePullPolicy` can have three values: `IfNotPresent`, `Always`, and `Never`. `Never` isn't useful for us, because it only starts a container when the image is pre-cached on the node. `IfNotPresent` checks if an image exists with the same name and tag (it doesn't consider differences in digests); if it is available it creates a container, otherwise it pulls the image. `Always` also checks if the image is pre-cached, but it additionally checks digests, and if they are not the same (for example the tag didn't change, but there were some security patches), it pulls the new image.
4. As for how to trigger pipelines when the image changes: as I mentioned earlier, `ImageStream` does not generate an event upon image change, so this option is not usable in our case.

Combining policies 1-3 seems to result in predictable behavior which follows what is written in the docs or what I have written above.

There are still some scenarios which need further testing; once I try the other possible scenarios, I will extend this comment with those notes as well.

*EDIT: It seems that OCP has a bug which causes the `ImageStream` to not be resolved properly, more about it here: https://bugzilla.redhat.com/show_bug.cgi?id=2000216. In my testing I have been using a "workaround" where I pointed the image to the internal registry, which means that at least my observation about `referencePolicy` might not be 100% correct. I will try looking into this problem more (maybe use the annotation-based triggers as was suggested in the thread linked above...).
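To make point 3 concrete, here is a minimal sketch of where `imagePullPolicy` sits in a pod spec; the pod, container, and image names are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: notebook-test                # hypothetical name
spec:
  containers:
    - name: notebook
      # hypothetical image reference pointing at the internal registry
      image: image-registry.openshift-image-registry.svc:5000/example/custom-notebook:latest
      # Always: compare digests on every start and pull if they differ;
      # IfNotPresent would reuse any cached image with the same name and tag
      imagePullPolicy: Always
```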
Great research! Thank you!
- ...but it does not generate any event, so it might be harder to detect.

I think it won't be a problem. It generates an event/action via additional Kubernetes resources, like the creation of an `Image` resource and an `ImageStreamTag` resource update. That should do the trick.
- ...in my test I have created `ImageStream` with `referencePolicy.type Source` and even though I deleted image from external image registry it was still able to pull image which didn't exist.

I think this is the behavior we want to achieve, correct? Cache the image in the local registry and reuse it from there, without ever pulling from a remote for pod spawns. Good.
- ...`IfNotPresent` checks if an image exists with same name and tag (it doesn't consider differences in digests) and if it is available it creates a container, otherwise it pulls an image.

What does it mean for the image to be "available"? In the internal registry or in the node cache? When it is not available, where does it pull from (the internal registry or the remote location)? Is this affected by `referencePolicy.type`?
- ...how to trigger pipelines when image changes, but as I mentioned earlier ImageStream does not generate event upon image change

Yes, something like Andrew's demo will solve it for us. Again, we don't need `Events`; Knative can trigger on API actions/changes, and that's all we need. I was eyeballing Knative for this prior to knowing about Andrew's demo. I think we're heading in the right direction here.
In my test I've used an image stream with `referencePolicy.type: Local` and `importPolicy.scheduled: true`:
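A sketch of what that image stream manifest could look like; the stream name and remote image are hypothetical:

```yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: custom-notebook                            # hypothetical name
spec:
  tags:
    - name: latest
      from:
        kind: DockerImage
        name: quay.io/example/custom-notebook:latest   # hypothetical remote image
      importPolicy:
        scheduled: true    # re-import periodically (every 15 minutes by default)
      referencePolicy:
        type: Local        # resolve pulls through the internal registry
```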
I think this solution works, wdyt @mimotej ?
Yes, I think this configuration will give us the most consistent results out of all the options :+1: .
Good, then the proposed design will be:
```mermaid
flowchart TD
    A[Image updated in remote container repository]
    B[ImageStreamTag updated in internal container repository]
    C[Knative eventing triggers a PipelineRun]
    D[Mark phase 'Updating/Validating' and hide it from UI]
    E[Run validation checks]
    F{Did checks pass?}
    G[Mark phase 'Success' and show in UI]
    H[Mark phase 'Failed']
    A --> B --> C --> D --> F
    C --> E --> F
    F -- Yes --> G
    F -- No --> H
```
This approach has a few drawbacks.
Local resolution has 1 significant positive effect:
Questions: Is there a way to immediately trigger the image stream update?
Possible solution:
- Each ImageStream contains 2 tags which are initially the same, with `scheduled: true`.
- If an update is received, the second tag is updated and validated by a pipeline.
- During this whole time the 1st tag is still being served.
- If validation on the 2nd tag passes, the 1st tag is updated to point to this image. If it fails, it stays the same and is kept/marked as outdated.
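A sketch of what such a two-tag image stream could look like. All names are hypothetical, and the visible tag is shown without scheduled import on the assumption that only the validation pipeline repoints it (e.g. via `oc tag`) after checks pass:

```yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: custom-notebook                            # hypothetical name
spec:
  tags:
    # 1st (visible) tag: what users consume; repointed by the pipeline
    # (e.g. `oc tag custom-notebook:backup custom-notebook:latest`)
    # only after validation passes
    - name: latest
      from:
        kind: DockerImage
        name: quay.io/example/custom-notebook:latest   # hypothetical remote image
      referencePolicy:
        type: Local
    # 2nd ('backup') tag: tracks the remote on a schedule and is
    # validated by the pipeline before being promoted
    - name: backup
      from:
        kind: DockerImage
        name: quay.io/example/custom-notebook:latest
      importPolicy:
        scheduled: true
      referencePolicy:
        type: Local
```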
```mermaid
flowchart TD
    A[Image updated in remote container repository]
    B[ImageStreamTag updated in internal container repository on the 'backup' tag]
    C[Knative eventing triggers a PipelineRun]
    D[Mark phase 'Updating']
    E[Run validation checks]
    F{Did checks pass?}
    G[Mark phase 'Success']
    H[Mark phase 'Outdated Update Failed']
    I[Replace pointer in visible tag with the updated backup]
    A --> B --> C --> D --> F
    C --> E --> F
    F -- Yes --> G --> I
    F -- No --> H
```
Decision: this is postponed/deprioritized due to the necessity of introducing a Knative dependency. Revisit post-MVP.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.

/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.

/lifecycle rotten
going to /lifecycle frozen this to prevent auto-closure for the time being
/kind feature
/priority backlog
As BYON evolved into Custom Notebook Images, which currently use an operator to manage the imagestreams and pipeline runs, I believe this is not relevant anymore.
/close
Having said that, it's worth exploring what meteor-operator could do to keep custom images current. But that will be handled in its own repo and issue
@codificat: Closing this issue.
Investigate how to manage imagestream updates. This issue is looking for answers/analysis on:

- `Imagestream` in Openshift
- `importPolicy.scheduled` on `Imagestream`
- `referencePolicy` in OCP 4.x, since it's missing from the docs, but present in the resource manifests (OCP 3.x docs here)

Based on my initial screening it seems there's no way to gate an image stream tag import if set to scheduled. It is also unclear when the cache is cleaned or accessed (is this per node?) and how an image is cached. Investigate what happens when pods are started: is the image pulled from the remote repository, or mirrored to the internal container registry on the first start and pulled from there? What happens when a pod is scheduled to a different node (again, pulling from the internal/external registry)? Experiments needed. It affects when we get the updated image and whether we know about it, or whether a user is served a newer version without the admin/validation pipeline noticing first.

Besides the scheduled import described above, there's no common reactive facility to initiate/trigger notifications from remote registries (webhooks etc.); this is remote-repository specific and must be set on the remote repository side.