opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0
3.43k stars 633 forks source link

Any chance of changing the whiteout file approach? #24

Open timthelion opened 8 years ago

timthelion commented 8 years ago

It seems unfortunate that a new standard should use a method which unecesarilly limits the standard. With the .wh file approach, base images can no longer contain arbitrary data. This means, for example, that you cannot have an image with example OCI image-spec data stored in it. Is there any possibility of changing this to use, for example, a white out list instead. So that the files that made a layer would be:

VERSION
layer.tar
whiteouts
json

That would mean that truely arbitrary data could be stored in the images, which would be really nice :)

vbatts commented 8 years ago

To be clear, you are saying that the .wh approach is limited? And the whiteout file is preferred?

If so, I agree.

On Thu, Apr 14, 2016, 17:16 Timothy Hobbs notifications@github.com wrote:

It seems unfortunate that a new standard should use a method which unecesarilly limits the standard. With the .wh file approach, base images can no longer contain arbitrary data. This means, for example, that you cannot have an image with example OCI image-spec data stored in it. Is there any possibility of changing this to use, for example, a white out list instead. So that the files that made a layer would be:

VERSION layer.tar whiteouts json

That would mean that truely arbitrary data could be stored in the images, which would be really nice :)

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/opencontainers/image-spec/issues/24

timthelion commented 8 years ago

Creating a list of files to be whited out, and keeping that outside of layer.tar is better than putting .wh files in layer.tar. We wouldn't want a situation like this: http://git-annex.branchable.com/forum/Storing_git_repos_in_git-annex/

vbatts commented 8 years ago

I wholly agree and intend to see it done as a whiteout file list.

On Thu, Apr 14, 2016, 17:26 Timothy Hobbs notifications@github.com wrote:

Creating a list of files to be whited out, and keeping that outside of layer.tar is better than putting .wh files in layer.tar. We wouldn't want a situation like this: http://git-annex.branchable.com/forum/Storing_git_repos_in_git-annex/

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/opencontainers/image-spec/issues/24#issuecomment-210154542

philips commented 8 years ago

I think this is something we should think about for post v1.0.0. I whole heartedly agree but in the ideal case the initial v1 serialization spec is binary compatible with the existing docker serialization to save the sanity of all of the existing registry folks like acr, gcr.io, Quay, Hub, etc.

timthelion commented 8 years ago

Would it be possible to use the whiteout list file IFF a whiteout list file exists, and otherwise use the .wh approach?

philips commented 8 years ago

@timthelion Yes, that would be the right way of approaching it. It would be a schema bump which would be a version break. Happy to consider adding this feature to fix the issue but I do want to hold off until we get post v1.0.0 (in a couple of months).

timthelion commented 8 years ago

@philips so basically, you want to have the 1.0 release be supported by Docker/CoreOS/whatever, without actually haven't to change Docker/CoreOS/whatever's code? Basically, there will be no actual technical changes to the spec before 1.0?

cyphar commented 8 years ago

@timthelion No breaking technical changes.

wking commented 7 years ago

On Thu, Apr 14, 2016 at 02:26:16PM -0700, Timothy Hobbs wrote:

Creating a list of files to be whited out, and keeping that outside of layer.tar is better than putting .wh files in layer.tar.

If we don't mind picking up a non-standard tar entry, star (Schilling's tar) uses pax extension headers and defines SCHILY.filetype with (among other things) a "whiteout" value representing a BSD whiteout directory entry 1. And as far as I can tell, that's the same sort of whiteout we're interested in. So if we want a way to represent whiteouts without leaving tar or restricting the legal filename space, that's probably a good choice.

aecolley commented 7 years ago

Now that v1.0.0-rc1 is out, it's the last chance to consider this before v1.0.0 is final. After that, this will be such an incompatible change that it will have to wait for v2.0.0 (assuming SemVer).

wking commented 7 years ago

You could add a new whiteout approach in 1.1. You'd only need to go to 2.0 if you remove or make backward-incompatible changes to an existing approach.

aecolley commented 7 years ago

A backwards-compatible change means either that a 1.1 image must be processed correctly by a 1.0 extractor, or that a 1.0 image must be processed correctly by a 1.1 extractor; depending on your point of view. In the case of 1.1-image-on-1.0-extractor, the extractor will not know about any way to produce a file named .wh.foo, regardless of how 1.1 decides to represent it. In the case of 1.0-image-on-1.1-extractor, the image cannot contain a layer with any .wh. file, because the 1.0 spec states unambiguously that there is no representation which can produce such a file.

Either way, it seems to me that it's impossible to construct an image which extracts a .wh.foo file into the unpacked bundle, unless either (a) both image and extractor are version 1.1, which is not backwards-compatible by definition; or (b) the image produces different bundle contents in 1.0 extractors and 1.1 extractors; which is an incompatibility all on its own.

Perhaps there's something I'm missing. Perhaps these limitations are acceptable to the project members. Otherwise, it's something that should be addressed before v1.0.0, IMHO.

vbatts commented 7 years ago

Interesting. There is an option of approaching whiteouts like overlayfs does, by setting device to 0 on non-directories and and xattr for directory.

On Mon, Oct 17, 2016, 18:37 Adrian Colley notifications@github.com wrote:

A backwards-compatible change means either that a 1.1 image must be processed correctly by a 1.0 extractor, or that a 1.0 image must be processed correctly by a 1.1 extractor; depending on your point of view. In the case of 1.1-image-on-1.0-extractor, the extractor will not know about any way to produce a file named .wh.foo, regardless of how 1.1 decides to represent it. In the case of 1.0-image-on-1.1-extractor, the image cannot contain a layer with any .wh. file, because the 1.0 spec states unambiguously that there is no representation which can produce such a file.

Either way, it seems to me that it's impossible to construct an image which extracts a .wh.foo file into the unpacked bundle, unless either (a) both image and extractor are version 1.1, which is not backwards-compatible by definition; or (b) the image produces different bundle contents in 1.0 extractors and 1.1 extractors; which is an incompatibility all on its own.

Perhaps there's something I'm missing. Perhaps these limitations are acceptable to the project members. Otherwise, it's something that should be addressed before v1.0.0, IMHO.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/opencontainers/image-spec/issues/24#issuecomment-254353854, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEF6fCsS60XgtwhoZjJ5FVy8Go98VvXks5q0_cagaJpZM4IHyeg .

aecolley commented 7 years ago

Unfortunately, support for extended attributes varies widely and incompatibly among tar implementations. And that gets us into the tarpit of the #342 discussion (pun accidental, I swear). But it works if the chosen tar format supports it.

wking commented 7 years ago

On Mon, Oct 17, 2016 at 03:30:51PM -0700, Adrian Colley wrote:

A backwards-compatible change means either that a 1.1 image must be processed correctly by a 1.0 extractor, or that a 1.0 image must be processed correctly by a 1.1 extractor; depending on your point of view.

This should be made very clear in the spec, leaving it open to interpretation is going to make make SemVer largely useless. In runtime-spec, opencontainers/runtime-spec#523 made it clear that you may need a new runtime after bumping your config's minor version.

In the case of 1.1-image-on-1.0-extractor, the extractor will not know about any way to produce a file named .wh.foo, regardless of how 1.1 decides to represent it.

Agreed, which is why I want a 1.0 extractor to error out if you give it a 1.1 image.

In the case of 1.0-image-on-1.1-extractor, the image cannot contain a layer with any .wh. file, because the 1.0 spec states unambiguously that there is no representation which can produce such a file.

Well, you can have ‘.wh.foo’ entries. They're just interpreted as “remove foo” and not “please create a file (or directory, or …) at .wh.foo”. If 1.1 declares a new media type (e.g. application/vnd.oci.image.layer.v1.1.tar) supporting SCHILY.filetype whiteout 1 or device 0 2 or some other way to avoid the current path overloading, then 1.0 extractors will correctly die (with “I've never heard of application/vnd.oci.image.layer.v1.1.tar”) and 1.1 and later 1.x extractors will correctly unpack the layer (including any .wh.foo files it contains).

Either way, it seems to me that it's impossible to construct an image which extracts a .wh.foo file into the unpacked bundle, unless either (a) both image and extractor are version 1.1, which is not backwards-compatible by definition; or (b) the image produces different bundle contents in 1.0 extractors and 1.1 extractors; which is an incompatibility all on its own.

Backwards-compat means “the old stuff still works with the new tools”. So 1.0 images would still work with 1.1 tools. But if .wh.foo files are impossible in 1.0 (which seems like the current path), then yeah, no 1.0 images are going to have them. But if you want a .wh.foo file, and are willing to create a 1.1 image and use a 1.1+ extractor, that will work. You don't need to bump to 2.0 (and throw out all of your other 1.0 images).

aecolley commented 7 years ago

@wking OK, I see your point. 1.0 images can work on 1.x extractors so long as they don't attempt to create non-whiteout .wh. files, which is fine. I withdraw my position.

stevvooe commented 7 years ago

After spending some more intimate time with whiteout files (.wh.), the approach currently employed in this specification is really as good as any.

What good does changing this provide? I don't see how this allows layers to have "arbitrary data".

wking commented 7 years ago

On Wed, Oct 19, 2016 at 04:47:27PM -0700, Stephen Day wrote:

After spending some more intimate time with whiteout files (.wh.), the approach currently employed in this specification is really as good as any.

What good does changing this provide? I don't see how this allows layers to have "arbitrary data".

As @timthelion describes in the topic post 1, the current approach makes it impossible to distribute .wh.* files because the entry path is overloading “unpack me to here” and “delete the stuff there”. The alternatives discussed here:

a. An external ‘whiteouts’ 1. b. SCHILY.filetype set to whiteout 2. c. Device set to 0 3.

all either give us an non-overloaded place to store the “delete the stuff there” bit (a and b) or pick a location where the overloading is less restrictive (c).

stevvooe commented 7 years ago

@wking Is there a realistic use case for distributing .wh. files, other than packing up a container runtime into an image?

Please, for the love of god, stop doing this.

wking commented 7 years ago

On Wed, Oct 19, 2016 at 05:53:25PM -0700, Stephen Day wrote:

@wking Is there a realistic use case for distributing .wh. files, other than packing up a container runtime into an image?

I don't have a personal use case for it, but I would like to avoid complication when explaining what folks can put into layers. And from an implementation perspective both SCHILY.filetype set to whiteout and device set to 0 would be very easy to implement in code that already uses the .wh.* approach.

stevvooe commented 7 years ago

@wking I don't really get this: the proposed layout is not how images are laid out. Such a provision requires either having in-band whiteout or a container format, which complicates unpacking. A tar of a tar, while we do it in image layout, should be avoided.

If we use overlay style device, now do you implement devices with windows on NTFS? Sure, you can put them in the tar file, but what happens when they are unpacked? How do you encode these in other archive formats that don't have device support, like zip?

wking commented 7 years ago

On Fri, Oct 21, 2016 at 11:40:17AM -0700, Stephen Day wrote:

A tar of a tar, while we do it in image layout, should be avoided.

I agree, which is why I prefer the SCHILY.filetype whiteout approach or the device 0 approach. Both of those are in-band (like the current .wh.* approach), but SCHILY.filetype is not overloaded at all and device 0 is a less-restrictive overload.

If we use overlay style device…

I think you're talking about @vbatt's device 0 approach here.

… now do you implement devices with windows on NTFS? Sure, you can put them in the tar file, but what happens when they are unpacked?

If we land “device set to zero means whiteout” docs (for application/vnd.oci.image.layer.v1.tar or application/vnd.oci.image.layer.v1.1.tar), then an unpacker handling such a tarball will invoke the whiteout operation whenever it hits a device 0 entry. I don't see how the OS comes into it, since whiteouts are a cross-platform idea. Windows unpackers can still fail if they encounter a device 1 entry, etc.

The SCHILY.filetype-set-to-whiteout approach would also be cross-platform.

How do you encode these in other archive formats that don't have device support, like zip?

You don't, but that's not a big deal. We don't have an application/vnd.oci.image.layer.v1.zip format now, and I don't hear anyone calling for one. If there is a future need for zip-based layers, the authors of the zip-layer spec will need to figure out a scheme for marking whiteouts. Maybe they'll use .wh.*, and maybe they'll use something else, but I don't think the potential presence of a future zip-based format is a good reason to overload the path as a whiteout marker in tar.

timthelion commented 7 years ago

I don't know what the best way to change the spec is, but I personally think that it would include adding a directory to the tarbal with a whiteout list and any other information that we might want to add in the future, so as to make the spec extendable.

That is that the / directory of the layer should include a /opencontainer-data directory.

I have several use-cases in mind.

The first use-case is that of profesor Dr. Janette Lang in the year 2033. Dr. Lang is one of the smartest ecologists in the world, and is currently working on a simulation of wetland health to be run on the worlds fastest supercomputer. Dr. Lang is not even aware that the simulation will be deployed using Docker because dev-ops does the deploymen. Dev-ops doesn't really think to hard about their choice of Docker becasue Docker is completely transparent for Dr. Lang's usecase, as the simulation does not need devices or even network access.

As the simulation is extremely large and runs for a long time, the simulation is occassionally serialized to files with names that look like:

.wh.<wetland-name>.yyyy.MM.dd.hh.mm.ss

These serializations are associated with user editable meta-data files named <wetland-name>.yml.

It takes several weeks for the simlulation to complete. At some point during this time, dev-ops is running an upgrade. They have to migrate the containers, and so they shut everything down, run a docker commit, and then launch everything agian. No problem, Dr. Lang's team included restarting the simulation right in their automated test suit. But this time it doesn't work. The entire simulation starts over again at 0%. A rutine procedure turns into a night filled with long distance phone calls and dev-ops sweating their beards out trying to figure out why the hell things aren't working. No one has a clue what is going on or why everything didn't go as planned, and everyone is blaming everyone else, no one even thinks to blame Docker, just as they don't think to blame BTRFs or the kernal, or the physical storage system or any of the other fundamental system utilities that they are using.

Their final solution is to let the simulation run its full course all over again, with stern order's from Dr. Lang to the dev-ops team that they are "Not to fuck with anything." And a tired and anxoious dev-ops team spends their morning cursing over coffee about how these damned scientists can't learn to program properly and how they assume that the serialization file was never saved in the first place because its "Just not there".


The second use-case is that I want to build an image within an image.


The third use-case is that of Enrico Xavier who is a spanish student of computer science and a great fan of Esparanto, the language which was specifically designed to not have any exceptions with regard to verb conjugation. He hears his lecterur say that you can put any file into an image except one which starts with .wh. and Enrico Xavier correctly assumes that it's stupid to waste your time trying to memorize random exceptions and therefor decides never to use containers or images.


The forth usecase also comes from Dr. Lang's faculty and this time is quite nefarious. Dr. Lang allows students to submit two tarball's each, one containing a set of wetland simulation config files, and a second containing some raw data from real wetlands. She instructs Dev-ops that they should extract these two tarballs into the directory containing the simulator script: wetland-simulator.py. Dev-ops does so, ensuring that the proper flags to prevent tar-bombs are set. If the tarbals happened to be nefarious and attempted to over-write the simulator script file then the tar extraction command would simply error out. They enter the tar extraction commands into their Dockerfile with one RUN command per line. Thus new layers are created for each command. However, the first tarbal does NOT contian wetland-simulator.py. It contains .wh.wetland-simulator.py. The second tarbal does contain a file wetland-simulator.py, however, the students have created a modified version, which allows them to take controll of the super computer and mine bitcoins on it while the simulation is running. The tar extraction command never errors out, as no files are overwritten. No one is any the wiser, and the students are all the richer.

timthelion commented 7 years ago

PS: With the forth example, it is of course reasonable to ask "Why did Dr. Lang decide to put untrusted test data, and an executable into a single directory. That's just bad practice!" And it is also, of course, reasonable to ask "Why would any storage system restrict file names? That's just bad practice. We already went throug that **\ with FAT16 and have decided that restricting the content of file names is just a royal pain in the arse and distinctly archaic."

cyphar commented 7 years ago

While I think @timthelion is being a bit melodramatic with his examples, I agree with his general point.

Why are we treating a rejected filesystem's serialisation format (AUFS) as the "right way of doing things in a standard"? Standards survive for a very long time in general, and I don't think it's a great idea to enshrine legacy parts of Docker (the original image code was just based on how AUFS did things because AUFS was the only real union filesystem at the time) into this particular part of the standard. IMO even something as primitive as a simple file list stored in each layer with whiteout file paths would be a much better way of handling things. A better solution is to use metadata within the tar archive (such as the SCHILY.filetype or device=0 approaches) to tag a particular path as being whited-out -- thus making sure that the data about whiteouts is never separated from the layer data. We also get to avoid a bunch of ambiguity and double-parsing in the spec if we make white-outs a metadata tag for a path rather than a different kind of path.

There is no reason IMO to be artificially limiting what the filenames inside container images can be. As @timthelion said, we should have learned our lesson about restrictive filenames from the old days of filesystems. Let's not repeat history's mistakes.

jonboulle commented 7 years ago

I think we should definitely pursue this, but as previously discussed it's more likely for a 1.1+ horizon.

timthelion commented 7 years ago

I don't know what the best way to change the spec is, but I personally think that it would include adding a directory to the tarbal with a whiteout list and any other information that we might want to add in the future, so as to make the spec extendable.

That is that the / directory of the layer should include a /opencontainer-data directory.

I have several use-cases in mind.

The first use-case is that of profesor Dr. Janette Lang in the year

  1. Dr. Lang is one of the smartest ecologists in the world, and is currently working on a simulation of wetland health to be run on the worlds fastest supercomputer. Dr. Lang is not even aware that the simulation will be deployed using Docker, because dev-ops does the deployment and dev-ops doesn't think anything of the idea, becasue Docker is completely transparent for Dr. Lang's usecase, not needing devices or even network access.

As the simulation is extremely large and runs for a long time, the simulation is occassionally serialized to files with names that look like:

.wh.<wetland-name>.yyyy.MM.dd.hh.mm.ss

These serializations are associated with user editable meta-data files named <wetland-name>.yml>.

It takes several weeks for the simlulation to complete. Sometime during this time, dev-ops is running an upgrade. They have to migrate the containers, and so they shut everything down, run a docker commit, and then launch everything agian. No problem, Dr. Lang's team included restarting the simulation right in their automated test suit. But this time it doesn't work. The entire simulation starts over again at 0%. All of a sudden, you have a night filled with long distance phone calls and dev-ops seating their beards out trying to figure out why the hell things aren't working. No one has a clue what is going on or why everything didn't go as planned, and everyone is blaming everyone else, no one even thinks to blame Docker, just as they don't think to blame BTRFs or the kernal, or any of the other fundamental system utilities that they are using.

Their final solution is to let the simulation run its full course all over again, with stern order's from Dr. Lang to the dev-ops team that they are "Not to fuck with anything." And a tired and anxoious dev-ops team cursing over coffee about how these damned scientists can't learn to program properly and how they assume that the serialization file was never saved in the first place because it's "Just not there".


The second use-case is that I want to build an image within an image.


The third use-case is that of Enrico Xavier who is a spanish student of computer science and a great fan of Esparanto. The language which was specifically designed to not have any exceptions with regard to verb conjugation. He hears his lecterur say that you can put any file into an image except one which starts with .wh.. And Enrico Xavier correctly assumes that it's stupid to waste your time trying to memorize random exceptions and therefor decides never to use containers or images.

On 10/21/2016 08:40 PM, Stephen Day wrote:

@wking https://github.com/wking I don't really get this: the proposed layout is not how images are laid out. Such a provision requires either having in-band whiteout or a container format, which complicates unpacking. A tar of a tar, while we do it in image layout, should be avoided.

If we use overlay style device, now do you implement devices with windows on NTFS? Sure, you can put them in the tar file, but what happens when they are unpacked? How do you encode these in other archive formats that don't have device support, like zip?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opencontainers/image-spec/issues/24#issuecomment-255439497, or mute the thread https://github.com/notifications/unsubscribe-auth/ABU7-JWRbm15jVCDWDh7awRc2sNj76nqks5q2QcTgaJpZM4IHyeg.

wking commented 7 years ago

On Tue, Oct 25, 2016 at 11:12:15AM -0700, Timothy Hobbs wrote:

I don't know what the best way to change the spec is, but I personally think that it would include adding a directory to the tarbal with a whiteout list and any other information that we might want to add in the future, so as to make the spec extendable.

You can extend at any time by minting new media types. You don't have to add support for something like this now on the off-chance that we'll use it later.

That is that the / directory of the layer should include a /opencontainer-data directory.

This has the same path-restriction as the current .wh.* approach, although it limits the restriction to a single path. POSIX pax extended headers provide the same functionality without overloading the entry path. And SCHILY.filetype is one example of what you can do with those extended headers.

timthelion commented 7 years ago

Excelent. I of course don't want to restrict the possible paths at all, and did not know that a better method was possible. Somehow, I did not understand from your discussion that this SCHILY thing is a method of hiding files inside the TAR that remain outside of the file tree.

On 10/25/2016 08:20 PM, W. Trevor King wrote:

On Tue, Oct 25, 2016 at 11:12:15AM -0700, Timothy Hobbs wrote:

I don't know what the best way to change the spec is, but I
personally think that it would include adding a directory to the
tarbal with a whiteout list and any other information that we
might want to add in the future, so as to make the spec extendable.

You can extend at any time by minting new media types. You don't have to add support for something like this now on the off-chance that we'll use it later.

That is that the |/| directory of the layer should include a
|/opencontainer-data| directory.

This has the same path-restriction as the current |.wh.*| approach, although it limits the restriction to a single path. POSIX pax extended http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_02 headers http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_03 provide the same functionality without overloading the entry path. And |SCHILY.filetype| http://cdrtools.sourceforge.net/private/man/star/star.4.html is one example of what you can do with those extended headers.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opencontainers/image-spec/issues/24#issuecomment-256121321, or mute the thread https://github.com/notifications/unsubscribe-auth/ABU7-MQwXRKptr18oOlfACQltXiCI2vrks5q3khzgaJpZM4IHyeg.

stevvooe commented 7 years ago

@timthelion Besides the contrived examples, do you have an extant proof that someone has actually run into a naming collision with whiteout files? I think there are problems in nested container scenarios but that can be solved with filesystem passthrough.

Look, I am not saying that AUFS whiteouts are ideal, but I thought the goal here was to define a container standard based on working systems. Especially, one that people actually use.

If we always focus on the limitations, we'll never realize the benefits.

aecolley commented 7 years ago

"It's already implemented" is literally the only good thing to say about the .wh. scheme, but it's still a powerful argument. Bumping the imageLayoutVersion can be used to signal a better-thought out scheme in the future. Anyone who really needs .wh. files in a 1.0.0 image can rename them into place as the first action in the container's runtime :anguished:.

I'm really only an outsider throwing peanuts from the gallery, but if I were dictator, I'd make the .wh. prefix overrideable by a new field in the image config, because that's a quick-to-implement change. But being stuck with .wh. for 1.0 is OK as long as we're not going to be stuck with it for the long term.

wking commented 7 years ago

On Tue, Oct 25, 2016 at 05:37:37PM -0700, Adrian Colley wrote:

Bumping the imageLayoutVersion can be used to signal a better-thought out scheme in the future…

imageLayoutVersion has nothing to do with it. The thing we'd be bumping is the layer media type (e.g. by minting a new application/vnd.oci.image.layer.v1.1.tar 1). But your main point (that we aren't stuck with this long term unless we plan to support the v1.0 media types forever) stands.

vbatts commented 7 years ago

a couple points:

  1. I agree that pinning this standard to the rejected AUFS' approach is silly. The only redeeming thing is that "that's the way it's done in docker". The .wh. is a silly approach. The lack of ordering still bugs me.
  2. Having a whiteout list provided would be a decent alternative. Best to have it in the ordered list of "layers" or "references" to be applied. Because if it were just a file at the / root inside the tarball, than it would best be either the first entry or directly after an entry for /.
  3. There is an approach taken by overlayfs, which is upstreamed. Detailed https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt. For directories the set an xattr, of trusted.overlay.opaque=y which is somewhat tailored. But for all other file types they use a trick of setting the entry as a character device to 0/0 major/minor device, which hides it. Somewhat clever. Similar enough to the AUFS whiteout approach, with none of the silly .wh. name prefix-ing.
  4. While this talk of a SCHILY. approach is possible does not mean that it should be pursued. I personally am against the SCHILY. approach. It is a PAX header, but would require changes to virtually all tar implementations. Event the golang library would require some re-work to allow for setting arbitrary pax headers like a new SCHILY. type. Which would likely mean carrying/maintaining a fork of upstream stdlib. This is a none-starter.
wking commented 7 years ago

On Fri, Oct 28, 2016 at 06:05:57AM -0700, Vincent Batts wrote:

  1. There is an approach taken by overlayfs, which is upstreamed. Detailed https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt. For directories the set an xattr, of trusted.overlay.opaque=y which is somewhat tailored. But for all other file types they use a trick of setting the entry as a character device to 0/0 major/minor device, which hides it. Somewhat clever. Similar enough to the AUFS whiteout approach, with none of the silly .wh. name prefix-ing.
  2. While this talk of a SCHILY. approach is possible does not mean that it should be pursued. I personally am against the SCHILY. approach. It is a PAX header, but would require changes to virtually all tar implementations. Event the golang library would require some re-work to allow for setting arbitrary pax headers like a new SCHILY. type. Which would likely mean carrying/maintaining a fork of upstream stdlib. This is a none-starter.

I don't think we'd need to fork archive/tar to support SCHILY.filetype whiteout. GNU's tar is already using pax Extended Headers for xattr support:

$ zgrep TMPFS_XATTR /proc/config.gz CONFIG_TMPFS_XATTR=y $ mkdir a $ sudo mount -t tmpfs none a $ touch a/b $ md5sum a/b d41d8cd98f00b204e9800998ecf8427e a/b $ sudo setfattr -n trusted.md5sum -v d41d8cd98f00b204e9800998ecf8427e a/b $ tar --version tar (GNU tar) 1.28 … $ sudo tar -cf a.tar --xattrs a $ strings a.tar | grep SCHILY 64 SCHILY.xattr.trusted.md5sum=d41d8cd98f00b204e9800998ecf8427e

With SCHILY.filetype we'd just be putting a different payload in that typeflag x entry. I can mock up an example if that would help demonstrate the point (or turn up any implementation issues I'm missing ;).

wking commented 7 years ago

On Fri, Oct 28, 2016 at 08:20:06AM -0700, W. Trevor King wrote:

$ strings a.tar | grep SCHILY 64 SCHILY.xattr.trusted.md5sum=d41d8cd98f00b204e9800998ecf8427e

With SCHILY.filetype we'd just be putting a different payload in that typeflag x entry. I can mock up an example if that would help demonstrate the point (or turn up any implementation issues I'm missing ;).

Looking into this more today, it turns out that there is an issue with Go's stock archive/tar: Go currently consumes x typeflag entries internally and throws out any extended headers it doesn't recognize (golang/go#14472). I've suggested Go provide a way around that limitation, but until that happens Go will ignore any extended headers besides SCHILY.xattr.*. Overlayfs's trusted.overlay.opaque (which in the pax extended header would be SCHILY.xattr.trusted.overlay.opaque) avoids the current Go limitation.

I don't understand why overlayfs uses one approach (c0/0) for whiteout files and another (trusted.overlay.opaque) for opaque directories. It seems like they'd use trusted.overlay.whiteout or some such to follow the xattr pattern. Maybe there is a performance benefit to avoiding an xattr lookup for whiteouts, but opaque directories need to be directories so they can hold upper-level content, and you can't have a c0/0 directory ;). Unpacking a layer tarball is not something that happens as frequently as overlayfs file access, so if the reason for using c0/0 in overlayfs is performance, that reasoning may not apply here. On the other hand, c0/0 works for overlayfs, and major 0 is reserved (at least on Linux 1). POSIX doesn't have much to say about major/minor 2, with the most detail coming from the ustar spec. From ustar's typeflag docs 3:

3,4 Represent character special files and block special files respectively. In this case the devmajor and devminor fields shall contain information defining the device, the format of which is unspecified by this volume of POSIX.1-2008. Implementations may map the device specifications to their own local specification or may ignore the entry.

So I'm ok with the following ways for specifying “this is a whiteout”:

a. Follow overlayfs into overloading c0/0, which likely works (and matches an existing implementation) but has unclear namespacing and portability.

b. Follow overlayfs into coining a new SCHILY.xattr.trusted (SCHILY.xattr.trusted.oci.whiteout?). This has better namespacing than (a) but means we're coining a new pax extended header string (even though we're following an established “use a pax extended header” approach).

c. Wait until Go fixes their pax extended header processing and use the existing SCHILY.filetype whiteout.

cyphar commented 7 years ago

I was reading about the Tar archive format the other day, and I noticed that we could just use Typeflag for this. Just specify that 'W' indicates a whiteout path (A-Z is defined as fair-game for tar archives from what I understand). This also helps with ensuring that we don't have to make it clear that you have to add a directory as a "file" when whiting it out.

One thing I ran into with umoci is that I'm not entirely sure what I'm meant to do with parent directories and whiteouts. Is it required that I include all of the parent components of a path in a diff layer if I'm whiting out a file? Or can I also just include the entries which are specifically paths that have been removed / explicitly changed?

wking commented 7 years ago

On Sat, Nov 05, 2016 at 04:42:46AM -0700, Aleksa Sarai wrote:

I was reading about the Tar archive format the other day, and I noticed that we could just use Typeflag for this. Just specify that 'W' indicates a whiteout path…

This is the same approach as SCHILY.filetype whiteout, but without using a pax extended header. I'd rather wait for golang/go#14472 and use the pax extended header, to avoid accidentally misinterpreting some other tar extension that has decided to use a ‘W’ type for something else. Especially with #342 still in flight, leaving which tar unpackers MUST recognize very unclear.

One thing I ran into with umoci is that I'm not entirely sure what I'm meant to do with parent directories and whiteouts. Is it required that I include all of the parent components of a path in a diff layer if I'm whiting out a file? Or can I also just include the entries which are specifically paths that have been removed / explicitly changed?

Layers only need to contain the paths they're touching. If you want to whiteout ./a/b/c but don't need to touch ./a or ./a/b, your tar can have a single entry touching:

./a/b/c

This is covered by the “extracted like a regular tar archive” wording 1 and underlined (for the ./ case) in the text that landed via #408.

cyphar commented 7 years ago

I just ran into this issue with this commit: https://github.com/cyphar/umoci/commit/2a421289266d. Basically if you call lstat on AUFS with a whiteout path you get an EPERM. Which means that minor ordering issues like that commit fixes become permission issues and bugs.

wking commented 7 years ago

On Sat, Nov 05, 2016 at 07:52:54AM -0700, W. Trevor King wrote:

I'd rather wait for golang/go#14472 and use the pax extended header, to avoid accidentally misinterpreting some other tar extension that has decided to use a ‘W’ type for something else.

golang/go#14472 was fixed today, so once Go 1.10 is cut (in February) the SCHILY.filetype approach will be supportable using the Go stdlib.

Collecting a new list of potential approaches (because there have been some new ones since 1):

a. An external ‘whiteouts’ 2. b. SCHILY.filetype set to whiteout (like BSD) 3. c. Device set to 0 (like overlayfs for non-directories) [4,5]. d. SCHILY.xattr.trusted.overlay.opaque set to y (like overlayfs for directories) 5. e. Setting SCHILY.xattr.trusted.oci.whiteout (or OPENCONTAINER.whiteout, or whatever) to y in the same spirit as overlayfs but with an OCI-specific key 6. f. Typeflag set to W 7.

I'd prefer not to do a, since it complicates unpacking if we need more information than the tar content itself. I'm fine with any of the other options, although b, c, and e seem the least likely to result in collisions with third-party pax extensions applying semantics other than “this is a whiteout entry”.

cyphar commented 7 years ago

@wking I think that (b) or (e) are the two best options. (e) is the best from a "let's create a standard that you have to explicitly implement" perspective, but (b) is the best from a "let's do what other people do already" (I consider the BSDs to be a better source of inspiration in this area than Linux filesystem internals). As for (c) and (d), they feel like they'd either be inconsistent with overlayfs anyway (since overlayfs does both) and would likely be inconsistent with how a normal user of a tar parsing library would act. (a) feels clunky, and (f) feels like a misuse of Typeflag which is a very small namespace.

wking commented 7 years ago

On Sat, Aug 26, 2017 at 11:31:11PM +0000, Aleksa Sarai wrote:

@wking I think that (b) or (e) are the two best options. (e) is the best from a "let's create a standard that you have to explicitly implement" perspective, but (b) is the best from a "let's do what other people do already" (I consider the BSDs to be a better source of inspiration in this area that Linux filesystem internals).

Sounds good to me. Does someone want to pick one of those so we can file a PR for a new layer media type that uses the chosen whiteout approach?

stevvooe commented 7 years ago

What problem are we trying to solve here? There is already millions of container images using this approach. Tweaking this won't actually make the format better; it just means that complaint implementations now need to handle two variations, instead of one.

If we are going to make an incompatible format, it would be better to move away from a layered diff approach to something tree oriented (like https://github.com/containerd/continuity).

timthelion commented 7 years ago

This conversation is going around in circles. @stevvooe, you keep on coming back and saying basically the same things.

I am, however, now against any changes, we should never break a standard after 1.0. And 1.0 was a long time ago. Docker's 1.0 that is.

When I'm discussing this with you @stevvooe, I feel like Docker Inc. doesn't really want OCI to be an independent project. I feel like you want a facade of independence, a facade of industry consensus, behind which you can be the puppet masters. It feels to me like this is more a game of obstructionism than a real technical discussion, and I don't think there's any point in me or anyone else fighting that. Your reasons, that there were millions of containers using the standard, were true before the standard even existed, before the name OCI was ever dreamed up, and the only reason that's true is because this IS the Docker standard. Nothing more. You're making me feel that, in reality, this is Docker's project, and the rest of us should either accept that fact, or pack up and go home.

On 08/31/2017 03:22 AM, Stephen Day wrote:

What problem are we trying to solve here? There is already millions of container images using this approach. Tweaking this won't actually make the format better; it just means that complaint implementations now need to handle two variations, instead of one.

If we are going to make an incompatible format, it would be better to move away from a layered diff approach to something tree oriented (like https://github.com/containerd/continuity).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/opencontainers/image-spec/issues/24#issuecomment-326162489, or mute the thread https://github.com/notifications/unsubscribe-auth/ABU7-OEws7wr63vhhFru9hdqppyfHuZwks5sdgrNgaJpZM4IHyeg.

wking commented 7 years ago

On Thu, Aug 31, 2017 at 10:49:39AM -0700, Timothy Hobbs wrote:

I am, however, now against any changes, we should never break a standard after 1.0. And 1.0 was a long time ago. Docker's 1.0 that is.

I don't think anyone is arguing for removing application/vnd.oci.image.layer.v1.tar and friends 1 which will continue to use the current .wh. approach 2. This is about specifying a new format that doesn't suffer from the .wh. limitations [3,4].

And to @stevvooe's most recent point 5, I would be very happy with a Merkle tree that had (sub)file granularity. #577 is probably a more targetted issue for that. And however that happens, it's going to be a pretty large shift.

Using any of 6 (SCHILY.filetype set to ‘whiteout’ or SCHILY.xattr.trusted.oci.whiteout set to ‘y’ in an application/vnd.oci.image.layer.v2.tar family of media types) would be a much smaller shift that is much easier to support. If we expect

577 to be addressed in the near future, then there's probably no

point in a v2.tar family. But if we expect #577 to linger for years, then I think it's worth creating a v2.tar family to address the v1.tar limitations.

caniszczyk commented 7 years ago

@timthelion please keep things civil, we have a code of conduct that we expect our community to abide by: https://www.opencontainers.org/about/code-of-conduct

vbatts commented 5 years ago

This came up again recently at Container Camp UK in London. @cyphar and I were discussing what other options would be a migration path from the AUFS inherited approach. And as we also want to enable ourselves for a non-TAR future.