bronger opened this issue 9 years ago
You can add entries to a .dockerignore file in the root of the project; see https://docs.docker.com/reference/builder/#dockerignore-file
.dockerignore does not solve this issue. As I wrote, "the other part is subject to another COPY".
So you want to conditionally copy based on some other copy?
The context contains a lot of directories A1...A10 and a directory B. A1...A10 have one destination, B has another:
COPY A1 /some/where/A1/
COPY A2 /some/where/A2/
...
COPY A10 /some/where/A10/
COPY B /some/where/else/B/
And this is awkward.
What part of it is awkward? Listing them all individually?
COPY A* /some/where/
COPY B /some/where/else/
Does this work?
The names A1..A10, B were fake. Besides, COPY A* ... throws together the contents of the directories.
There are a couple of options, I admit, but I think that all of them are awkward. I mentioned three in my original posting. A fourth option is to rearrange my source code permanently so that A1..A10 are moved into a new directory A. I was hoping that this would not be necessary, because an additional nesting level is not something to wish for, and my current tools would then need to special-case my dockerised projects.
(BTW, #6094 (following symlinks) would help in this case. But apparently, this is no option either.)
@bronger if COPY behaved exactly like cp, would that solve your use-case?
I'm not sure I 100% understand. Maybe @duglin can have a look.
@bronger I think @cpuguy83 asked the right question: how would you solve this if you were using 'cp'? I looked and didn't notice any kind of exclude option on 'cp', so I'm not sure how you would solve this outside of a 'docker build' either.
With cp behaviour, I could ameliorate the situation by saying
COPY ["A1", ... "A10", "/some/where/"]
It's still a mild maintenance problem because I would have to think of that line if I added an "A11" directory. But that would be acceptable.
Besides, cp does not need excludes, because copying everything and removing the unwanted parts has almost no performance impact beyond the copying itself. With docker's COPY, it means the cache is wrongly invalidated every time B changes, and the images get bigger.
@bronger you can do:
COPY a b c d /some/where
just like you were suggesting.
As for doing a RUN rm ... after the COPY ..., yes, you'll have one extra layer, but you should still be able to use the cache. If you see a cache miss due to it, let me know; I don't think you should.
But COPY a b c d /some/where/ copies the contents of the directories a b c d together, instead of creating the directories /some/where/{a,b,c,d}. It works like rsync with a slash appended to the src directory. Therefore, the four instructions
COPY a /some/where/a/
COPY b /some/where/b/
COPY c /some/where/c/
COPY d /some/where/d/
are needed.
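To make the difference concrete, here is a toy Python model (not Docker's code) of the rule that a directory source in COPY contributes only its contents, next to cp -r-style copying by name. The build context is a plain dict of relative paths, and `copy_contents`/`copy_by_name` are hypothetical names; in this model a later source simply overwrites an earlier file with the same name.

```python
from typing import Dict, List

Tree = Dict[str, str]  # relative path in the build context -> file content


def copy_contents(tree: Tree, sources: List[str], dest: str) -> Tree:
    """Toy model of COPY with directory sources: only each directory's
    *contents* land in dest, so several sources are merged together."""
    out: Tree = {}
    for src in sources:
        prefix = src.rstrip("/") + "/"
        for path, data in tree.items():
            if path.startswith(prefix):
                # The source directory's own name is dropped; a later source
                # overwrites an earlier file with the same relative name.
                out[dest.rstrip("/") + "/" + path[len(prefix):]] = data
    return out


def copy_by_name(tree: Tree, sources: List[str], dest: str) -> Tree:
    """Toy model of `cp -r a b dest/`: each source directory keeps its name."""
    out: Tree = {}
    for src in sources:
        prefix = src.rstrip("/") + "/"
        base = src.rstrip("/").rsplit("/", 1)[-1]
        for path, data in tree.items():
            if path.startswith(prefix):
                out[dest.rstrip("/") + "/" + base + "/" + path[len(prefix):]] = data
    return out


context = {"a/main.py": "A", "a/README": "a-readme", "b/README": "b-readme"}
print(copy_contents(context, ["a", "b"], "/some/where/"))
# {'/some/where/main.py': 'A', '/some/where/README': 'b-readme'}
print(copy_by_name(context, ["a", "b"], "/some/where/"))
# {'/some/where/a/main.py': 'A', '/some/where/a/README': 'a-readme', '/some/where/b/README': 'b-readme'}
```

The merged result loses both the directory names and one of the two READMEs, which is why each directory needs its own COPY line.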
As for the cache ... if I say
COPY . /some/where/
RUN rm -Rf /some/where/e
then the cache is not used if e changes, although e does not effectively end up in the result.
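Why the cache misses can be sketched with a deliberately simplified model: assume a COPY step's cache key is a hash over the paths and contents of every file it copies (a rough stand-in for Docker's content-based cache checksum). A later RUN rm cannot change the key of the COPY step, whereas skipping e before hashing, which is what an exclude option would do, keeps the key stable. `cache_key` and its `exclude` parameter are hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def cache_key(files: Dict[str, bytes], exclude: Iterable[str] = ()) -> str:
    """Toy cache key for a COPY step: a hash over every copied file's path and
    content. Top-level entries named in `exclude` are skipped before hashing."""
    skipped = set(exclude)
    h = hashlib.sha256()
    for path in sorted(files):  # sorted, so dict order doesn't matter
        if path.split("/", 1)[0] in skipped:
            continue
        # Hash path and content separately so the two can't run together.
        h.update(hashlib.sha256(path.encode()).digest())
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


v1 = {"src/app.py": b"print(1)", "e/notes.txt": b"v1"}
v2 = {"src/app.py": b"print(1)", "e/notes.txt": b"v2"}  # only e changed

# COPY . /some/where/ followed by RUN rm -Rf /some/where/e: e is still hashed.
print(cache_key(v1) == cache_key(v2))                # False -> cache miss
# An exclude applied *before* hashing keeps the key stable.
print(cache_key(v1, ["e"]) == cache_key(v2, ["e"]))  # True -> cache hit
```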
@bronger yep, sadly you're correct. I guess we could add a --exclude zzz type of flag, but per https://github.com/docker/docker/blob/master/ROADMAP.md#22-dockerfile-syntax it may not get a lot of traction right now.
Fair enough. Then I will use a COPY+rm for the time being and add a FixMe comment. Thank you for your time!
Just to :+1: this issue. I regularly regret that COPY doesn't mirror rsync's trailing slash semantics. It means you can't COPY multiple directories in a single statement, leading to layer proliferation.
I regularly encounter a case where I want to copy many directories except for one (which will be copied later, because I want it to have different layer-invalidation effects), so --exclude would be useful, as well.
Also, from man rsync:
A trailing slash on the source changes this behavior to avoid creating
an additional directory level at the destination. You can think of a
trailing / on a source as meaning "copy the contents of this directory"
as opposed to "copy the directory by name", but in both cases the
attributes of the containing directory are transferred to the containing
directory on the destination. In other words, each of the following
commands copies the files in the same way, including their setting
of the attributes of /dest/foo:
rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo
I guess it can't be changed now without breaking a lot of Dockerfiles in the wild.
As a concrete example, let's say I have a directory looking like this:
/vendor
/part1
/part2
/part3
/...
/partN
I want something that looks like:
COPY /vendor /docker/vendor
RUN /docker/vendor/build
# desired: copy directories part1-N to /docker/part{1..N}/
COPY /part1 /part2 ... /partN /docker/
RUN /docker/build1-N.sh
So that changes to part1-N don't invalidate the build of /vendor (since /vendor is rarely updated compared to part1-N).
I have previously worked around this by putting part1-N in their own directory, so:
/vendor
/src/part1-N
But I have also encountered this problem in projects that I am not at liberty to rearrange quite so easily.
@praller good example, we're facing the exact same issue. The main problem is that Go's filepath.Match doesn't allow much creativity compared to regular expressions (i.e. there are no negated patterns).
I just came up with a somewhat crack-brained workaround for this. COPY can't exclude directories, but ADD can expand tgz.
It's one extra build step:
tar --exclude='./deferred_copy' -czf all_but_deferred.tgz .
docker build ...
Then in your Dockerfile:
ADD ./all_but_deferred.tgz /application_dir/
.. stuff in the rarely changing layers ..
ADD . /application_dir/
.. stuff in the often changing layers
That gives the full syntax of tar for including/excluding/whatever without gobs of wasted layers trying to include/exclude.
@jason-kane This is a nice trick, thanks for sharing. One small point: it looks like you can't add the z (gzip) flag to tar, because it changes the sha256 checksum value, which invalidates the Docker cache. Otherwise this approach works great for me.
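The same archive trick can be sketched in Python, using `tarfile`'s `filter` hook in place of GNU tar's --exclude (`tar_excluding` and the directory names are hypothetical). It also shows one concrete way compression can break a byte-level checksum: the gzip header carries an MTIME field, so compressing identical bytes with different timestamps yields a different sha256. That this is the only cause of the cache misses reported above is an assumption.

```python
import gzip
import hashlib
import io
import os
import tarfile
import tempfile


def tar_excluding(root: str, excluded: str) -> bytes:
    """Uncompressed tar of `root`, skipping the top-level entry `excluded`
    (same idea as `tar --exclude='./deferred_copy' -cf ... .`)."""
    def skip(info: tarfile.TarInfo):
        parts = info.name.split("/")  # member names look like "./deferred_copy/x"
        # Returning None drops the member (and, for a directory, its contents).
        return None if len(parts) > 1 and parts[1] == excluded else info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(root, arcname=".", filter=skip)
    return buf.getvalue()


def gzip_sha256(data: bytes, mtime: int) -> str:
    """sha256 of gzip-compressed data; gzip stores `mtime` in its header."""
    return hashlib.sha256(gzip.compress(data, mtime=mtime)).hexdigest()


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "deferred_copy"))
    with open(os.path.join(root, "deferred_copy", "big.bin"), "wb") as f:
        f.write(b"rarely needed, often changing")
    with open(os.path.join(root, "app.py"), "w") as f:
        f.write("print('hi')\n")
    archive = tar_excluding(root, "deferred_copy")

print(tarfile.open(fileobj=io.BytesIO(archive)).getnames())
# Same bytes, different gzip timestamp -> different checksum.
print(gzip_sha256(archive, mtime=1) == gzip_sha256(archive, mtime=2))  # False
```

Note that tar itself also records file modification times, so even the uncompressed archive only stays byte-identical while the included files are untouched.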
+1 for this issue, I think it could be supported in the same way a lot of glob libraries support it:
Here's a proposal to copy everything except node_modules
COPY . /app -node_modules/
I came across the same problem as well, and it's kind of painful for me: my Java webapp is about 900MB, but almost 80% of that rarely changes. The application is at an early stage and the folder structure is fairly stable, so I don't mind adding 6-7 COPY layers to be able to use the cache, but it will surely hurt in the long term as more and more files and directories are added.
I have the same problem, although with docker cp: I want to copy all files from a folder except for one.
Exact same issue here. I want to copy a git repo and exclude the .git directory.
@oaxlin you could use the .dockerignore file for that.
@antoineco are you sure that will work? It's been a while since I tried, but I'm pretty sure .dockerignore didn't work with docker cp, at least at the time.
@kkozmic-seek absolutely sure :) But the docker cp CLI subcommand you mentioned is different from the COPY statement found in the Dockerfile, which is the scope of this issue.
docker cp indeed has nothing to do with the Dockerfile and .dockerignore, but on the other hand it's not used for building images.
Would really like this as well: to speed up the build I could copy some folders in earlier parts of the build, and then the cache would help me out.
I'm not sure I understand what the use case is but wouldn't just touching the files to exclude before COPY solve the problem?
RUN touch /app/node_modules
COPY . /app
RUN rm /app/node_modules
AFAIK COPY doesn't overwrite files, which is why I think this might work.
Oops, never mind that, looks like COPY actually overwrites files. I'm now a bit puzzled by https://nodejs.org/en/docs/guides/nodejs-docker-webapp/, which runs npm install and then does a COPY . /usr/src/app. I guess it assumes that node_modules is listed in .dockerignore? On the other hand, a COPY_NO_OVERWRITE (better name needed) command could be one way to ignore files during the copy (you'd have to create empty files/dirs for the stuff you want to ignore).
FWIW, I find this very ugly.
I found another hack solution:
Example project structure:
app/
config/
script/
spec/
static/
...
We want:
Hack solution:
ADD ./static /home/app
ADD ["./[^s^a]*", "./s[^t]*", "/home/app/"]
ADD ./app /home/app
The second ADD is equivalent to: copy everything except the entries starting with "./st" and "./a". Any ideas for improvements?
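To check exactly what the second ADD picks up, here is a Python sketch with the two patterns hand-translated into regular expressions. It assumes Docker matches them with Go's filepath.Match (as mentioned earlier in the thread), where `[^...]` is a negated class matching exactly one character and `*` matches any run of non-separator characters. Python's own fnmatch is deliberately avoided, since it treats `[^...]` as a literal caret rather than a negation. The extra names (assets, stubs, s, lib) are made up to probe the edge cases.

```python
import re
from typing import List

# Regex equivalents of the ADD patterns (without the leading "./").
NOT_S_OR_A = re.compile(r"[^s^a][^/]*")  # "[^s^a]*": first char is not s, ^ or a
S_NOT_T = re.compile(r"s[^t][^/]*")      # "s[^t]*": "s", then any char except t


def picked_up(names: List[str]) -> List[str]:
    """Top-level entries that the second ADD would copy."""
    return sorted(n for n in names if NOT_S_OR_A.fullmatch(n) or S_NOT_T.fullmatch(n))


print(picked_up(["app", "config", "script", "spec", "static"]))
# ['config', 'script', 'spec']
print(picked_up(["assets", "stubs", "s", "lib"]))
# ['lib']
```

So the hack works for this layout, but any future entry starting with "a" or "st" (or a one-letter "s") would silently be skipped by all three ADD lines.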
What is the status of this issue?
What about having a .dockerignore file in the same fashion as .gitignore?
@mirestrepo See the first two follow-ups to this issue.
Currently this is a mega perf nerf for C# / dotnet development.
What I want:
Now it seems this is not (easily) possible because I cannot copy everything except.
So either the DLLs are copied twice, which increases the Docker image size, or everything is copied in one layer. The latter is a mega nerf because external DLLs are copied every time instead of being cached.
@adresdvila thanks for the solution, I was able to split it up into:
COPY ["[^M][^y]*","/app/"]
COPY ./My* /app/
Although this still leaves the problem that .json files are copied by the first command.
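The split above relies on `[^M][^y]*`, which under the same filepath.Match reading means "first character is not M AND second character is not y". That is narrower than "does not start with My", so some files match neither COPY. A quick Python check with hand-translated regexes (the DLL names are invented examples):

```python
import re
from typing import Optional

FIRST = re.compile(r"[^M][^y][^/]*")  # COPY ["[^M][^y]*","/app/"]
SECOND = re.compile(r"My[^/]*")       # COPY ./My* /app/


def copied_by(name: str) -> Optional[str]:
    """Which of the two COPY instructions would pick up a top-level file."""
    if FIRST.fullmatch(name):
        return "first"
    if SECOND.fullmatch(name):
        return "second"
    return None  # copied by neither instruction


for f in ["Newtonsoft.Json.dll", "MyApp.dll", "System.Memory.dll",
          "Microsoft.Extensions.Logging.dll", "appsettings.json"]:
    print(f, "->", copied_by(f))
```

Here System.Memory.dll (second letter y) and Microsoft.Extensions.Logging.dll (first letter M) end up in no layer at all, while appsettings.json lands in the first layer, as noted above.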
Just chiming in to say thanks to @antoineco my problem is solved. I no longer copy the .git directory into my docker images.
This dramatically improved the image size, and makes my image much more friendly to the docker caching system.
I have the same problem. I have a big file (a 7 GB bin file) that I want to copy before the rest of the files, so that a change elsewhere in the context does not repeat that copy, which takes a lot of time. Are there any new workarounds?
The issue with the COPY-and-prune approach is that the layer before pruning still contains all the data.
COPY . --exclude=a --exclude=b would be extremely useful. What do you think, @cpuguy83?
@Nowaker I like it. Seems in line with tar and rsync anyway.
I guess this should support the same format as dockerignore?
@tonistiigi @dnephin
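If COPY --exclude reused the .dockerignore format, its matching could look roughly like this Python model. It follows the documented .dockerignore rules that the last matching line wins and that a leading `!` re-includes, and treats a match on a parent directory as covering everything below it. It is a simplification: no `**`, and Python's fnmatch writes bracket negation as `[!...]` rather than Go's `[^...]`. `is_excluded` is a hypothetical name; the thread does not define the semantics.

```python
import fnmatch
from typing import List


def _match(pattern: str, path: str) -> bool:
    """Component-wise glob: like Go's filepath.Match, '*' never crosses '/'."""
    pat_parts, path_parts = pattern.split("/"), path.split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatch.fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts)
    )


def is_excluded(path: str, patterns: List[str]) -> bool:
    """Hypothetical COPY --exclude check using dockerignore-style rules:
    the last matching pattern wins, "!" re-includes, and a pattern that
    matches a parent directory also covers everything below it."""
    parts = path.split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    excluded = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = raw[1:] if negate else raw
        if any(_match(pattern, p) for p in prefixes):
            excluded = not negate
    return excluded


rules = ["node_modules", ".git", "*.md", "!README.md"]
for p in ["node_modules/lodash/index.js", "CHANGELOG.md", "README.md", "docs/guide.md"]:
    print(p, is_excluded(p, rules))
```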
This case would be handled by #32507 I think.
@cpuguy83 Yeah. Most notably, in line with COPY --chown=uid:gid
@dnephin RUN --mount sounds like a totally different use case, centered around generating something based on data we don't need after the output has been generated (e.g. compiling with Go, generating HTML from Markdown files, etc.). RUN --mount is dope and I'd definitely use it in the project I'm currently working on (generating API docs using Sphinx).
COPY somedir --exclude=excludeddir1 --exclude=excludeddir2 is centered around copying data that has to end up in the image but is spread across multiple COPY statements, not just one. The goal is to avoid an explicit COPY first second third ... eleventh destination/ when a project has a lot of directories in its root and that list is subject to change and growth.
In my case, I want to first copy most of the files, except the non-essential ones, to make sure the cache is used if the source files didn't change. Then compile/generate, and use the cache if the copied files didn't change (yay). At the very end, copy the files I excluded previously, which might have changed since the previous build but whose changes don't affect the compile/generate step. Obviously, I have a ton of files and directories in . that I want to COPY first, and only a couple that I want to COPY somewhere at the end.
The idea is that RUN --mount is able to solve a lot of problems. COPY --exclude solves only a single problem.
I'd rather add something that solves a lot of problems than add a bunch of syntax to solve individual problems. You would use RUN --mount... rsync --exclude ... (or some script that copies individual things) and it would be the equivalent of COPY --exclude.
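The "script that copies individual things" could be as small as this Python sketch, standing in for rsync --exclude via `shutil.copytree`'s `ignore` hook. `copy_except` and the directory names are hypothetical, and it assumes a Python interpreter is available in the build image. Only entries directly under the source root are skipped, matching the `COPY somedir --exclude=excludeddir1` proposal.

```python
import os
import shutil
import tempfile
from typing import Set


def copy_except(src: str, dst: str, excluded: Set[str]) -> None:
    """Copy the tree at src into dst, skipping the named entries directly under
    src (deeper entries with the same name are still copied)."""
    root = os.path.abspath(src)

    def ignore(directory: str, names: list) -> list:
        # copytree calls this once per directory; only filter at the top level.
        if os.path.abspath(directory) == root:
            return [n for n in names if n in excluded]
        return []

    shutil.copytree(src, dst, ignore=ignore, dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    ctx, out = os.path.join(tmp, "ctx"), os.path.join(tmp, "out")
    os.makedirs(os.path.join(ctx, "excludeddir1"))
    os.makedirs(os.path.join(ctx, "lib", "excludeddir1"))
    open(os.path.join(ctx, "lib", "excludeddir1", "keep.txt"), "w").close()
    copy_except(ctx, out, {"excludeddir1"})
    print(sorted(os.listdir(out)))  # ['lib']
```

`dirs_exist_ok=True` (Python 3.8+) lets the script copy into a destination that an earlier step already created.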
@dnephin Oh, I didn't think of RUN --mount rsync! Excellent!
That's excellent indeed. However you won't be able to leverage caching efficiently @Nowaker, because the cache will be invalidated if anything changes in the mounted directory, not only what you want to rsync.
If you use the output of that rsync as an input for something else and no files actually changed in there, the cache will pick up again. If you are really up for it, something like https://gist.github.com/tonistiigi/38ead7a4ed60565996d207a7d589d9c4#file-gistfile1-txt-L130-L140 already lets you do this today. The only difference with RUN --mount (or LLB in BuildKit) is that you don't have to actually copy files between stages but can access them directly, so it is much faster.
I need to COPY a part of a context directory to the container (the other part is subject to another COPY). Unfortunately, the current possibilities for this are suboptimal: