Open matthewfeickert opened 3 years ago
Hm. Seems like --numeric-owner
is not sufficient here
$ docker run --rm -ti --entrypoint /bin/bash pyhf/pyhf:latest
# pyhf contrib download --compress https://doi.org/10.17182/hepdata.90607.v3/r3 /tmp/1Lbb-likelihoods.tar.gz
# tar tvfz /tmp/1Lbb-likelihoods.tar.gz
-rw-r--r-- krzysiek/krzysiek 4436904 2020-05-07 01:58 BkgOnly.json
-rw-r--r-- krzysiek/krzysiek 57332112 2020-05-31 14:07 patchset.json
-rw-r--r-- krzysiek/krzysiek 1378 2020-05-30 12:19 README.md
# mkdir -p /tmp/1Lbb-likelihoods && tar -xzvf /tmp/1Lbb-likelihoods.tar.gz -C /tmp/1Lbb-likelihoods --numeric-owner
BkgOnly.json
patchset.json
README.md
# ls -lR /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods:
total 60332
-rw-r--r-- 1 1000 1000 4436904 May 7 2020 BkgOnly.json
-rw-r--r-- 1 1000 1000 1378 May 30 2020 README.md
-rw-r--r-- 1 1000 1000 57332112 May 31 2020 patchset.json
# rm -rf /tmp/1Lbb-likelihoods
# mkdir -p /tmp/1Lbb-likelihoods && tar -xzvf /tmp/1Lbb-likelihoods.tar.gz -C /tmp/1Lbb-likelihoods
BkgOnly.json
patchset.json
README.md
# ls -lR /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods:
total 60332
-rw-r--r-- 1 1000 1000 4436904 May 7 2020 BkgOnly.json
-rw-r--r-- 1 1000 1000 1378 May 30 2020 README.md
-rw-r--r-- 1 1000 1000 57332112 May 31 2020 patchset.json
It seems that extracting the permissions is quite a bit more difficult. Is this actually strictly necessary? Or can we make the user responsible for taking care of this themselves?
One would need an equivalent of tar --no-same-owner
CLI option, which should do the job, as the tarball would be extracted under the current user identity.
One would need an equivalent of
tar --no-same-owner
CLI option, which should do the job, as the tarball would be extracted under the current user identity.
This is what I thought as well, but was surprised when even this failed in the container
$ docker run --rm -ti --entrypoint /bin/bash pyhf/pyhf:latest
root@381dcacf4b4b:/# pyhf contrib download --compress https://doi.org/10.17182/hepdata.90607.v3/r3 /tmp/1Lbb-likelihoods.tar.gz
root@381dcacf4b4b:/# tar tvfz /tmp/1Lbb-likelihoods.tar.gz
-rw-r--r-- krzysiek/krzysiek 4436904 2020-05-07 01:58 BkgOnly.json
-rw-r--r-- krzysiek/krzysiek 57332112 2020-05-31 14:07 patchset.json
-rw-r--r-- krzysiek/krzysiek 1378 2020-05-30 12:19 README.md
root@381dcacf4b4b:/# mkdir -p /tmp/1Lbb-likelihoods && tar -xzvf /tmp/1Lbb-likelihoods.tar.gz -C /tmp/1Lbb-likelihoods --no-same-owner
BkgOnly.json
patchset.json
README.md
root@381dcacf4b4b:/# ls -lR /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods:
total 60332
-rw-r--r-- 1 root root 4436904 May 7 2020 BkgOnly.json
-rw-r--r-- 1 root root 1378 May 30 2020 README.md
-rw-r--r-- 1 root root 57332112 May 31 2020 patchset.json
and locally
$ pyhf contrib download --compress https://doi.org/10.17182/hepdata.90607.v3/r3 /tmp/1Lbb-likelihoods.tar.gz
$ mkdir -p /tmp/1Lbb-likelihoods && tar -xzvf /tmp/1Lbb-likelihoods.tar.gz -C /tmp/1Lbb-likelihoods --no-same-owner
BkgOnly.json
patchset.json
README.md
$ ls -lR /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods:
total 60332
-rw-r--r-- 1 feickert feickert 4436904 May 6 2020 BkgOnly.json
-rw-r--r-- 1 feickert feickert 57332112 May 31 2020 patchset.json
-rw-r--r-- 1 feickert feickert 1378 May 30 2020 README.md
@tiborsimko do you have an example of this working on other images that I can replicate for testing?
doesn't that show the desired behavior?
in pyhf/pyhf:latest
everythin ends being beinig owned by root
while locally things end up benig owned by feickert
- each time by the person running the ocmmand
doesn't that show the desired behavior?
in
pyhf/pyhf:latest
everything ends being beinig owned byroot
while locally things end up benig owned byfeickert
- each time by the person running the command
Ah, right, sorry wasn't looking at that closely enough. So the only important part of the original user complaint was that things were being unpacked as 1000
and not root
?
I think in that case it was uid 500 but filels were unpacked as 1000
(likely b/c krzysiek
was 1000)
@lukasheinrich I'll follow up with you offline in case I'm missing the obvious, but this seems like something that pyhf
can't solve though, right? Now that I re-read Tibor's comment on Mattermost
I'm not sure about that, since UID and GID in the tarball might be precisely
1000
/1000
... So if you want to use current user withUID=500
, say, then that option wouldn't help. One would have to use an equivalent of--no-same-owner
or else introduce owner mapping to transform original UIDs to wanted UIDs.
It seems that this should be already addresses by pyhf
's behavior
$ cat /tmp/Dockerfile
FROM pyhf/pyhf:latest
RUN groupadd -g 500 testuser && \
useradd -m -u 500 -g 500 -s /bin/bash testuser
USER testuser
$ docker build . -f /tmp/Dockerfile -t test-example:debug
$ docker run --rm -ti --entrypoint /bin/bash test-example:debug
testuser@e52ff4d164db:/$ id
uid=500(testuser) gid=500(testuser) groups=500(testuser)
testuser@e52ff4d164db:/$ pyhf contrib download https://doi.org/10.17182/hepdata.90607.v3/r3 /tmp/1Lbb-likelihoods
testuser@e52ff4d164db:/$ ls -lR /tmp/1Lbb-likelihoods/
/tmp/1Lbb-likelihoods/:
total 60332
-rw-r--r-- 1 testuser testuser 4436904 May 7 2020 BkgOnly.json
-rw-r--r-- 1 testuser testuser 1378 May 30 2020 README.md
-rw-r--r-- 1 testuser testuser 57332112 May 31 2020 patchset.json
Am I missing something?
Description
@lukasheinrich and @tiborsimko noticed today that
pyhf.contrib.download
forces user identities when it unpacks downloaded tarballsWhat is actually happening is
tarfile.extractall()
inhttps://github.com/scikit-hep/pyhf/blob/66c814a11c4bfc39414163b826804be01158766f/src/pyhf/contrib/utils.py#L61-L64
is keeping the original file ownership information. @tiborsimko has a nice demo of this by just
wget
-ing the files manuallyAs @lukasheinrich has pointed out, the docs for
indicate that
So we probably just need
numeric_owner=True
by default and make it configurable with a kwarg.Expected Behavior
From @tiborsimko's report, it sounds like the desired behavior is to preserve the ownership of the files when unpacking them so that
would show
Actual Behavior
When unpacking the current user gains ownership for the directory
Checklist
git fetch
to get the most up to date version ofmaster