Closed vojeroen closed 3 years ago
Hi @vojeroen,
I'm not fully sure I understand all of this yet: it sounds like it is about permissions at heart. Before I go on, let me note that:
If there is a problem left to solve, please help me understand what the missing bits are.
Best, Sebastian
PS: Using a single Git clone plus remembering the previous HEAD plus (simplified) inspecting git diff --stat <OLD_HEAD_SHA1> HEAD would also work for a gentoo-tree-diff implementation, but it was way simpler to implement with two directories, and that approach leaves the door open to work with non-Git sources like emerge-webrsync and tarball downloads.
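The single-clone variant from the PS could be sketched roughly like this. It is only an illustration and not how gentoo-tree-diff is implemented; the function name tree_diff_stat and keeping the previous HEAD in a shell variable are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of binary-gentoo): update a Git clone and
# summarize which files changed since the commit checked out before the update.
tree_diff_stat() {
    repo=$1
    # Remember the commit that is checked out right now.
    old_head=$(git -C "$repo" rev-parse HEAD) || return 1
    # Fast-forward the clone from its configured upstream.
    git -C "$repo" pull --quiet --ff-only || return 1
    # Show the per-file summary between the remembered commit and the new HEAD.
    git -C "$repo" diff --stat "$old_head" HEAD
}
```

Calling tree_diff_stat /path/to/clone would then print the per-file change summary for that update.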
Indeed, the problem is mostly about permissions, but it was caused by the use of gentoo-tree-sync. Let me elaborate briefly: I have a setup with a "portage" user that builds the packages. So the portage user would, among other things, execute gentoo-tree-sync portdir to get the portage tree. However, because this runs in a container, permissions on the files inside portdir are set to 644, and ownership is set to root. This means that the portage user cannot do an rm -rf portdir anymore. That is mainly annoying and would need some workaround by the user to handle these permissions (getting sudo rights or running the rm inside a container too).
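A quick way to confirm this kind of ownership problem might look like the sketch below. The helper name first_foreign_entry is made up for illustration; it only reports the first entry not owned by the current user and does not change anything.

```shell
#!/bin/sh
# Hypothetical diagnostic: print the first entry below a directory that is
# not owned by the current user, a hint that "rm -rf" may fail there.
first_foreign_entry() {
    # "! -user" selects entries whose owner differs from the given user ID.
    find "$1" ! -user "$(id -u)" | head -n 1
}
```

If first_foreign_entry portdir prints a path, at least part of the tree belongs to someone else (root, in the setup described above).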
Anyway, if the command is dropped, this problem does not exist anymore.
One side note though. You are very familiar with portage, so it is obvious to you how to handle all this. If someone is less familiar with the inner workings of portage, and basically just uses emerge --sync to get the portage tree, it is less obvious how to get the portage tree on a non-Gentoo system (or just a separate copy on a Gentoo system). I know Gentoo is all about managing your system yourself, but still I believe it would be nice to include some pointers in the documentation on how this can be done, or point to the relevant Gentoo documentation on how to sync with portage (especially from a non-Gentoo system, where plain rsync or git will be needed).
@vojeroen I'm still recovering from my second vaccine two days ago: That's why I'm only starting to reply again now…
Good to hear that it was indeed about permissions at the core.
I can definitely add a few words to the readme on that topic. I'm aware of 5 methods of obtaining a portage tree with metadata cache (sometimes called md5 cache):
emerge --sync (uses rsync) in a container
emerge-webrsync (uses HTTP) in a container
I'm using the last option (e) as of today, because it's quick to update and has no lag behind its source that I would notice. What do you think about option ~(3)~ (e)? Article Portage Security could be of interest.
I noticed the other day that "git pull" would also be an option besides rsync, even without going to the internet, if portdir-old is a local clone of portdir-new.
Wasn't option 3 (emerge-webrsync) exactly what you did with gentoo-tree-sync? So I'm not sure what you mean now.
My bad, typo, "3" was supposed to be "e". I moved from (c) to (e), yes. What do you think about option (e) for your own use?
I'll have a look, but I have to say I am inclined to make a simple up-to-date container image with gentoo-build --tag-docker-image XXX --update @world that does an emerge --sync. In fact, similar to what you did with gentoo-tree-sync; I liked the approach. Primarily, this approach deals with all the security/signature stuff mentioned on the Portage Security page, so I'm always sure I only get official ebuilds.
@vojeroen that will get back the permission problems you mentioned above. Is that acceptable? Have you checked if emerge --sync is as up to date as Git? I can restore gentoo-tree-sync and also add a switch for plain emerge --sync rather than emerge-webrsync, but we should be sure that it solves an actual problem and fits your operational requirements, e.g. with regard to file permissions.
Solving the permissions issue I had is quite easy if the container moves the new portdir to the old portdir prior to sync'ing. Something like this is very simple and does the trick for me:
Dockerfile to create the image gentoo-sync:
FROM binary-gentoo:latest
CMD ["bash", "-c", "set -x && rsync -Aaxv /var/db/repos/gentoo-old/ /var/db/repos/gentoo/ && emerge --sync"]
It assumes that the image binary-gentoo contains both rsync and emerge, of course. But that can be created with gentoo-build --tag-docker-image.
And then run it:
docker run --rm -v /path/to/old/portdir:/var/db/repos/gentoo-old -v /path/to/new/portdir:/var/db/repos/gentoo gentoo-sync
I'll leave it up to you to decide if you want to restore gentoo-tree-sync. The solution I outlined here is almost as simple as a git pull, so I'm quite happy with it.
What I see above is this three-step template:
copy <old> <new>
sync <new>
diff <old> <new>
I think the rsync call above is copying in the wrong direction: no one ever writes to <old> in this approach. Flipping the copy call around seems to make things work. Does that make sense?
My local approach looks like this:
sync <new>
diff <old> <new>
copy <new> <old>
Once we start looping these steps, this is the same as yours (with the copy direction flipped as mentioned above).
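One round of the sync/diff/copy order could be sketched as below. This is an assumption-laden illustration: tree_round is a made-up name, the sync command is supplied by the caller, diff -rq stands in for gentoo-tree-diff, and rm plus cp stands in for whatever copy mechanism is preferred (rsync would work just as well).

```shell
#!/bin/sh
# Hypothetical driver for one round of:
#   sync <new>, diff <old> <new>, copy <new> <old>
# Usage: tree_round <old> <new> <sync-command> [sync-args...]
tree_round() {
    old=$1; new=$2; shift 2
    # Step 1: sync <new> with the caller-supplied command.
    "$@" "$new" || return 1
    # Step 2: diff <old> <new>. diff exits with 1 when the trees differ,
    # which is expected here, so its exit status is deliberately ignored.
    diff -rq "$old" "$new"
    # Step 3: copy <new> to <old> so the next round compares against it.
    rm -rf "$old" && cp -Rp "$new" "$old"
}
```

Note that the copy goes from <new> to <old>, so <old> always holds the state from before the latest sync.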
I think it's fair to say that technically binary-gentoo tries to do two things:
With regard to rsync and copying files around, that's something that a non-Gentoo host OS does offer and also something where integration can be self-built in a few minutes, as you demonstrated above. So to me that means that gentoo-tree-sync should not copy things (because little value) but provide means of syncing portdir that the host OS cannot offer (i.e. good value).
So the key question I think is whether you as a user would rather:
(a) gentoo-tree-sync and copy files some other way (e.g. sudo rsync or the way you describe above)
(b) emerge --sync and copying files your own way.
If (a), I'd restore gentoo-tree-sync, and if (b), we'd wait for the next person that would rather not feed portdir through Git.
Does that sound like a plan? Which do you prefer?
I thought about this from different angles more. Maybe you're right, maybe gentoo-tree-sync [--backup-to <old>] <new> is simple and useful. Let me demo a PR on that.
> I think the rsync call above is copying in the wrong direction: no one ever writes to <old> in this approach. Flipping the copy call around seems to make things work. Does that make sense?
You are entirely correct, that was of course a mistake.
> With regard to rsync and copying files around, that's something that a non-Gentoo host OS does offer and also something where integration can be self-built in a few minutes, as you demonstrated above. So to me that means that gentoo-tree-sync should not copy things (because little value) but provide means of syncing portdir that the host OS cannot offer (i.e. good value).
You already mentioned this in #31, but a benefit of using emerge --sync or emerge-webrsync is that you get additional security features out of the box (if I understood the portage manual correctly), which you don't have with a plain rsync or git. So for non-Gentoo platforms, that is definitely a plus.
> I thought about this from different angles more. Maybe you're right, maybe gentoo-tree-sync [--backup-to <old>] <new> is simple and useful. Let me demo a PR on that.
As mentioned in the PR, I'll be happy to use gentoo-tree-sync if you decide to include it again.
Feel free to close this issue if you have no further remarks on it.
Requirement
In order for gentoo-tree-diff to work, you need an old and a new portdir. The obvious way to handle this is to do something like this:
The problem with this approach is that the first command (rm -rf portdir-old) will fail because the permissions of the files in that directory are set in such a way that the user running this script cannot delete them. In fact, the content of portdir is owned by root (as set by the container that is executed by the gentoo-tree-sync command).
As I don't want the "portage user" on my machine to have sudo rights, there is no way for that user to delete the old portdir. It would be interesting if binary-gentoo offered a nice way to handle this, because keeping an old and a new portdir is needed for it to work.
Proposal for solution
The maintenance of the old portdir could be taken care of by gentoo-tree-sync. Something like gentoo-tree-sync --old-portdir portdir-old portdir, which would then do the following:
--old-portdir
portdir to the directory specified by --old-portdir
portdir (existing functionality)
If you have other ideas, let me know. If you agree that this should be solved by binary-gentoo, and once a solution is agreed upon, I can see if I can implement this.