opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0

File Lineage Support Layered Media #424

Open WhisperingChaos opened 7 years ago

WhisperingChaos commented 7 years ago

I searched this repository's issues for the keyword "lineage" and found only one closed issue that referenced it. Lineage is a concept akin to a "family tree" that tracks the evolution of a component and its offspring. Since an image layer captures a component's state at a "moment in time", its position relative to other layers may reflect its position in that component's family tree. Is there an effort to represent this notion of lineage more directly in this image spec? It could assist forensics, which would benefit, for example, security or the ability to gauge a component's variation against its reliability.

Thanks!

wking commented 7 years ago

On Fri, Oct 28, 2016 at 10:13:12AM -0700, Rich Moyse wrote:

… is there an effort to more directly represent this notion of lineage into this image spec…

You don't need a structure for lineage to use an image, so there isn't a structured field for it at the moment. There was some discussion of using parent manifests in the child's ‘layers’ (e.g. [1]), but the consensus was that you could accomplish the same thing with less work by inlining the parent's layers directly [2]. So currently, folks who want to distribute this sort of information should use ‘history’ [3] or add their own annotation keys along the lines of [4].
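A minimal Python sketch of the approach described above, assuming placeholder byte strings in place of real layer blobs: the child inlines the parent's layers, appends a `history` entry to its config, and records the parent manifest's digest under an annotation key. The key `com.example.lineage.parent-manifest` and the helper `derive_child` are hypothetical, not defined by the spec.

```python
import hashlib
import json

MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"
CONFIG_TYPE = "application/vnd.oci.image.config.v1+json"
LAYER_TYPE = "application/vnd.oci.image.layer.v1.tar+gzip"
# Hypothetical annotation key: any reverse-domain name you control would do.
PARENT_KEY = "com.example.lineage.parent-manifest"


def canonical(obj) -> bytes:
    """Deterministic JSON so the same object always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def digest(data: bytes) -> str:
    """OCI-style digest string: "sha256:<hex>"."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def descriptor(media_type: str, data: bytes) -> dict:
    return {"mediaType": media_type, "digest": digest(data), "size": len(data)}


def derive_child(parent_manifest, parent_config, layer_desc, diff_id, created_by):
    """Inline the parent's layers into a child image and record where it came from.

    Lineage is carried twice: as an appended `history` entry in the config and
    as an annotation holding the parent manifest's digest. (In practice, hash
    the exact manifest bytes that were pushed, not a re-serialization.)
    """
    config = {
        "architecture": parent_config["architecture"],
        "os": parent_config["os"],
        "rootfs": {
            "type": "layers",
            "diff_ids": parent_config["rootfs"]["diff_ids"] + [diff_id],
        },
        "history": parent_config.get("history", []) + [{"created_by": created_by}],
    }
    manifest = {
        "schemaVersion": 2,
        "mediaType": MANIFEST_TYPE,
        "config": descriptor(CONFIG_TYPE, canonical(config)),
        "layers": parent_manifest["layers"] + [layer_desc],
        "annotations": {PARENT_KEY: digest(canonical(parent_manifest))},
    }
    return manifest, config


# Placeholder base image: blob contents are stand-ins, not real tarballs.
base_config = {
    "architecture": "amd64",
    "os": "linux",
    "rootfs": {"type": "layers", "diff_ids": [digest(b"base-rootfs")]},
    "history": [{"created_by": "base image build"}],
}
base_manifest = {
    "schemaVersion": 2,
    "mediaType": MANIFEST_TYPE,
    "config": descriptor(CONFIG_TYPE, canonical(base_config)),
    "layers": [descriptor(LAYER_TYPE, b"base-rootfs.tar.gz")],
}

child_manifest, child_config = derive_child(
    base_manifest,
    base_config,
    descriptor(LAYER_TYPE, b"app.tar.gz"),
    digest(b"app"),
    "COPY app /app",
)
```

Neither field is needed to run the child image; a tool that does not understand the annotation simply ignores it, which is why this fits without a spec change.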

If you want to leverage CAS, you could also define a new commit-like media type and have a commit-DAG pointing at manifests (or manifest lists, or whatever) as the payload. With type-map-based type-handling logic like #403, plugging that sort of third-party type into the OCI tooling should be fairly straightforward.
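The commit-DAG idea can be sketched in Python as follows. This assumes a hypothetical media type `application/vnd.example.lineage.commit.v1+json` (not part of the OCI spec) and a toy in-memory content-addressable store; each commit points at a manifest as its payload and at earlier commits as its parents, and walking the parents recovers an image's lineage.

```python
import hashlib
import json

# Hypothetical third-party media type; the OCI image spec does not define it.
COMMIT_TYPE = "application/vnd.example.lineage.commit.v1+json"
MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"


class ContentStore:
    """Toy content-addressable store: every blob is keyed by its sha256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = "sha256:" + hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def put_json(store: ContentStore, obj: dict) -> tuple[str, int]:
    """Serialize deterministically, store, and return (digest, size)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return store.put(data), len(data)


def commit(store, manifest_digest, manifest_size, parents, message) -> str:
    """Store a commit: its payload is a manifest, its parents are earlier commits."""
    obj = {
        "mediaType": COMMIT_TYPE,
        "payload": {
            "mediaType": MANIFEST_TYPE,
            "digest": manifest_digest,
            "size": manifest_size,
        },
        "parents": list(parents),  # digests of parent commits
        "message": message,
    }
    return put_json(store, obj)[0]


def lineage(store: ContentStore, head: str) -> list[str]:
    """Every commit reachable from `head`, each listed once, in depth-first order."""
    seen: set[str] = set()
    order: list[str] = []
    stack = [head]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        order.append(key)
        parents = json.loads(store.get(key))["parents"]
        stack.extend(reversed(parents))  # visit the first parent first
    return order


# Placeholder manifests; real ones would carry config and layer descriptors.
store = ContentStore()
base_m, base_size = put_json(store, {"schemaVersion": 2, "annotations": {"example": "base"}})
app_m, app_size = put_json(store, {"schemaVersion": 2, "annotations": {"example": "app"}})
c_base = commit(store, base_m, base_size, [], "base image")
c_app = commit(store, app_m, app_size, [c_base], "add application layer")
```

Because the commits live beside the manifests rather than inside them, the images themselves stay unchanged; only tools that know the commit type need to read it.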

WhisperingChaos commented 7 years ago

@wking

Thanks for your thoughtful answer and references! It might be interesting to use custom properties, mentioned by one of your provided references, to store lineage info.

stevvooe commented 7 years ago

@WhisperingChaos "Classically", history has been a component of container images. The implementations embedded lineage directly in the format. However, these features come at great cost in distribution, security and runtime (I can elaborate on these costs if you don't agree with the premise).

A much better system would maintain lineage externally to artifacts as they are built. Such a system would be much more secure without incurring the distribution and runtime costs.

The history field is maintained to provide the notion of lineage, until such metadata systems exist.

WhisperingChaos commented 7 years ago

@stevvooe

Yes, I agree: detailed lineage information should be available as a separate artifact from the resulting image. However, it would be helpful to encode a form of DNA (densely encoded and small) within an image to identify each artifact in it and the "mutations" between parents and offspring in runtime images.

As you note in your post, build-time history incurs a "great cost" to the runtime image and system. For this reason, many have built pipelines whose resulting runtime images no longer contain any build-time artifacts or history. Given this desire to eliminate build-time artifacts, there is probably a large number of images, both existing and yet to be built, whose lineage won't be easily traceable. Therefore, it would be beneficial to define a means to record lineage in a runtime image.

stevvooe commented 7 years ago

@WhisperingChaos We already have this DNA: the components of an image are content addressable. Links between components use these content addresses to maintain these relationships. Fields like ChainID, Parent, and DiffID provide this. To the casual observer, these may not look useful but, in fact, they provide the aspects of lineage that affect runtime.
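The OCI image config lists each layer's DiffID in `rootfs.diff_ids`, and the config spec defines a ChainID that chains those DiffIDs together. A minimal Python sketch, assuming placeholder byte strings in place of real uncompressed layer tarballs, of how these identifiers already let a tool tell whether one image is built on another; `shared_layers` and `is_ancestor` are hypothetical helper names.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def diff_id(uncompressed_layer: bytes) -> str:
    """DiffID: digest of the layer's *uncompressed* tar stream."""
    return sha256_digest(uncompressed_layer)


def chain_ids(diff_ids: list[str]) -> list[str]:
    """ChainID for each layer prefix, following the rule in the OCI config spec:

    ChainID(L0)          = DiffID(L0)
    ChainID(L0|...|Ln)   = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
    """
    out: list[str] = []
    for d in diff_ids:
        out.append(d if not out else sha256_digest((out[-1] + " " + d).encode()))
    return out


def shared_layers(a: list[str], b: list[str]) -> int:
    """How many bottom layers two images have in common, judged by ChainID."""
    count = 0
    for x, y in zip(chain_ids(a), chain_ids(b)):
        if x != y:
            break
        count += 1
    return count


def is_ancestor(ancestor: list[str], descendant: list[str]) -> bool:
    """True if `ancestor`'s layer stack is a prefix of `descendant`'s.

    One ChainID comparison suffices because each ChainID commits to every
    layer beneath it.
    """
    if not ancestor:
        return True
    if len(ancestor) > len(descendant):
        return False
    return chain_ids(ancestor)[-1] == chain_ids(descendant)[len(ancestor) - 1]


# Placeholder layer contents standing in for real tar streams.
base = [diff_id(b"os-rootfs"), diff_id(b"runtime")]
app = base + [diff_id(b"application")]
other = [diff_id(b"os-rootfs"), diff_id(b"different-runtime")]
```

Here `is_ancestor(base, app)` holds, while `other` shares only its bottom layer with `app`. This recovers layer-stack relationships only; it says nothing about how a given layer was produced, which is the finer-grained provenance being debated in this thread.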

However, as I stated above, encoding lineage isn't free, and different people have different ideas about the granularity of data required. Even here, where the tools for lineage already exist, you have concluded that the image specification has no built-in tools for lineage, when, in fact, it does.

Why burden the format when users will disagree on the level of granularity of lineage required? Why make all users pay this cost when only some users will need it? It seems much more prudent to leave this decision to packaging systems, as we have done in the past. The set of images can then be curated within this packaging system, providing a level of guarantee (and curation) fit for purpose.

That said, I'm not sure how constructive we can be speaking in the abstract. Let's focus on the following:

  1. What specific use cases for lineage are not possible with the current specification?
  2. What additions are you proposing to address the missing functionality?

vbatts commented 7 years ago

@stevvooe related to https://github.com/opencontainers/image-spec/issues/600 ?

stevvooe commented 7 years ago

@vbatts I did not intend for these to be related. #600 is just "append a string to keep us all sane".

I still maintain that we have the necessary structural information to recover lineage, even if there aren't explicit pointers. That said, the general problem of label propagation (and "lifting") could be used to address this issue. My general aversion to heading down that path is that labels may introduce hash instability, although that is less of a problem at the manifest/config level than it is for layers.
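To make the hash-instability point concrete, here is a minimal Python sketch contrasting a label added as a manifest annotation with a label baked into a layer's contents. The annotation key `com.example.lineage.parent` is hypothetical, the layer bytes are placeholders, and the manifest is trimmed (no config descriptor) to keep the comparison focused.

```python
import hashlib
import json

LAYER_TYPE = "application/vnd.oci.image.layer.v1.tar"  # uncompressed: digest == DiffID
PARENT_KEY = "com.example.lineage.parent"  # hypothetical annotation key


def sha(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Deterministic JSON so the same object always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def chain_ids(diff_ids: list[str]) -> list[str]:
    """ChainID of each layer prefix: each one folds in the ChainID below it."""
    out: list[str] = []
    for d in diff_ids:
        out.append(d if not out else sha((out[-1] + " " + d).encode()))
    return out


def with_annotation(manifest: dict, key: str, value: str) -> dict:
    """Copy of `manifest` with one extra annotation; layers are untouched."""
    return {**manifest, "annotations": {**manifest.get("annotations", {}), key: value}}


# Placeholder layer contents standing in for real uncompressed tar streams.
layers = [b"base-rootfs", b"app-files"]
diff_ids = [sha(b) for b in layers]
manifest = {
    "schemaVersion": 2,
    "layers": [
        {"mediaType": LAYER_TYPE, "digest": d, "size": len(b)}
        for d, b in zip(diff_ids, layers)
    ],
}

# Label at the manifest level: only the manifest's own digest changes.
labeled = with_annotation(manifest, PARENT_KEY, "sha256:" + "0" * 64)

# Label baked into the bottom layer: its DiffID changes, and so does every
# ChainID above it, even though the top layer's bytes are identical.
relabeled_diff_ids = [sha(layers[0] + b"\nlabel=example"), diff_ids[1]]
```

Layer caches and deduplication key on DiffIDs and ChainIDs, so the manifest-level label leaves them intact, whereas the layer-level label invalidates everything stacked on the changed layer.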