redhat-openshift-ecosystem / openshift-preflight

Operator and container preflight certification tests
Apache License 2.0
58 stars 66 forks source link

Deduplicate layer IDs to handle cases where duplicate layers are in the layer tree #1172

Closed komish closed 1 month ago

komish commented 1 month ago

In certain edge cases, a container builder may produce multiple "empty" layers that may have the same digest value. When parsing those layers, we fail to find an RPM database (which is expected) and therefore carry the previous layer's package list forward.

We run into issues when layers with the same ID and no modified files (e.g. "empty" layers) are separated by layers WITH package changes, such that when we carry forward file lists from previous layers, we would be impacting an earlier entry's file list because we use a map where the key is the layer ID. Concrete example below[1].

This PR changes our mapping to glue together a layer's ID and its relative position in the image, which prevents overwriting previous layer values and fixes a bug in how we detect modified files layer-over-layer.

It also adds logging to map DiffID, LayerID, and the deduped/calculated value in trace logs to assist with debugging.

[1]

Assume we have 4 layers with these modified files.

-- base layer omitted --
0: sha256:1111 - [] 
1: sha256:2222 - ["/modified", "/path/to/rpmdb"]
2: sha256:1111 - []
3: sha256:3333 - ["/unrelated"]

We parse layer 0 and see it doesn't have any files (nor RPMDB), so we store a mapping of that layer's ID ("sha256:1111")to a list of packages that might be associated (which will be empty in this example because the previous, omitted layer is the base layer.

As soon as layer 1 (first layer to contain an RPMDB) is evaluated, that becomes our source of truth. We store the package lists associated with it and use those to inform the "do not modify" list. In our mapping, the package information is mapped to this layer's ID as the map key ("sha256:2222").

When layer 2 is parsed and found to be empty, we map its package list to the previous layer (1) package list. Because the layer ID of layer 2 matches that of layer 0, we've now inadvertently populated layer 0's package list. This is the problem.

Later, when we validate each of these layers, we detect that layer 0 has packages (the same as layer 1), and treat that as our source of truth. Layer 1, then, is considered to have modified all of the same files as layer 0 - which fails the check.


To solve this, I bind the layer ID with its index position (e.g. "00-sha256:1111"). It makes things a little bit heavier, but barring refactoring to remove the map AND array usage and/or change the way this works - this seemed simpler and effective enough.

acornett21 commented 1 month ago

I'm okay with this implementation, but also want to point out that we could also remove the empty layers before we build the maps/associations (ie before we iterate over the layers). Though, with that type of implementation Im not sure what the cutoff point for empty means size wise, since crane we are dealing with the uncompressed values and they are not zero at that point.

It would be nice to know others thoughts on various approaches as well.

komish commented 1 month ago

+1 @acornett21 I briefly tested skipping empty layers based on size - and just ended up at "how do I confidently determine that the layer is truly empty" aside from looking at content. The smallest value I saw returned from layer.Size() was 34. I didn't test much beyond that, but it's another option. Will wait for comment.

acornett21 commented 1 month ago

how do I confidently determine that the layer is truly empty

This is where I landed as well, I thought I was missing something as well with the size calculation. The one advantage of this I see, is it could speed up our processing by spiking some layers. But if don't have confidence on what the min size should be, they the saying cycles point is probably moot.

dcibot commented 1 month ago

from change https://github.com/redhat-openshift-ecosystem/openshift-preflight/pull/1172:

jfrancin commented 1 month ago

I hope this will get reviewed soon - have a partner waiting on the new Preflight version with this fix...

komish commented 1 month ago

@jfrancin Yep, sorry for the delay - I've been trying to replicate the failing image with little success, and finally cracked it just a bit ago. Whipping up a test case, and we should be set.

acornett21 commented 1 month ago

@jfrancin we are well aware of the urgency, like mentioned we've been trying to understand how the partner built an image with multiple of the same layer (and empty ones at that), and come with with a repeatable test so we do not have an regressions. Figuring out how they got an image in this state, has been extremely difficult to replicate.

komish commented 1 month ago

@acornett21 @bcrochet A note -

This PR reconfigures HasModifiedFiles such that the packageDist value is extracted in a separate function from the package file map building process. This simplifies testing of the file mapping process by allowing us to build an ImageReference fragment that doesn't contain the extracted filesystem needed by the packageDist logic.

As a result, several return signatures have changed, dropping the returned packageDist.

Also note that the test written here uses a static fixture. The containerfile used to build it is included - but using a static fixture saves us storing the image in-repo (~110m). It's been documented as a TODO item to optimize at a later date.

coveralls commented 1 month ago

Coverage Status

coverage: 84.782% (+3.2%) from 81.591% when pulling 517a88b7461062bf08478e525e2f369590c7c384 on komish:fix-hmf-dupe-layers into 354d10d9af8b5a64c7c2230a6656186957c3ed9f on redhat-openshift-ecosystem:main.

dcibot commented 1 month ago

from change https://github.com/redhat-openshift-ecosystem/openshift-preflight/pull/1172:

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: acornett21, bcrochet, komish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/redhat-openshift-ecosystem/openshift-preflight/blob/main/OWNERS)~~ [acornett21,bcrochet,komish] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment