hartwork / binary-gentoo

:cow: Collection of simple CLI tools to help build Gentoo packages on a non-Gentoo Linux host
https://pypi.org/project/binary-gentoo/
GNU Affero General Public License v3.0

[gentoo-tree-sync] Idea: Also handle the "old portdir" sync #24

Closed: vojeroen closed this issue 3 years ago

vojeroen commented 3 years ago

Requirement: In order for gentoo-tree-diff to work, you need an old and a new portdir. The obvious way to handle this is to do something like this:

rm -rf portdir-old
mv portdir portdir-old
mkdir portdir
gentoo-tree-sync portdir
gentoo-tree-diff portdir-old portdir

The problem with this approach is that the first command (rm -rf portdir-old) will fail, because the permissions of the files in that directory are set such that the user running this script cannot delete them. In fact, the content of portdir is owned by root (as set by the container that the gentoo-tree-sync command runs).

As I don't want the "portage user" on my machine to have sudo rights, there is no way for that user to delete the old portdir. It would be nice if binary-gentoo offered a clean way to handle this, because gentoo-tree-diff needs both an old and a new portdir to work.
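
One way to notice this situation up front is to look for entries the current user does not own before attempting the rotation. The following is a minimal sketch, assuming POSIX sh and find; the names foreign_entries and can_rotate are made up for illustration and are not part of binary-gentoo. Ownership is only a proxy for "rm -rf will fail", so this is a heuristic rather than a guarantee:

```shell
# Hypothetical helper (not part of binary-gentoo): list entries below a
# directory that are not owned by the current user.
foreign_entries() (
    dir="$1"
    # Nothing to report if the directory does not exist yet.
    [ -e "$dir" ] || exit 0
    find "$dir" ! -user "$(id -u)"
)

# Hypothetical helper: succeed only if no foreign-owned entry was found,
# i.e. "rm -rf" run by the current user has a chance of working.
can_rotate() (
    [ -z "$(foreign_entries "$1" | head -n 1)" ]
)
```

With this, something like `can_rotate portdir-old && rm -rf portdir-old` would skip the deletion instead of failing halfway through.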

Proposal for solution: The maintenance of the old portdir could be taken care of by gentoo-tree-sync. Something like gentoo-tree-sync --old-portdir portdir-old portdir, which would then do the following:

If you have other ideas, let me know. If you agree that this should be solved by binary-gentoo, then once we have agreed on a solution, I can see if I can implement it.

hartwork commented 3 years ago

Hi @vojeroen,

I'm not fully sure I understand all of this yet: It sounds like it is about permissions at heart. Before I go on, let me note that:

If there is a problem left to solve, please help me understand what the missing bits are.

Best, Sebastian

hartwork commented 3 years ago

PS: Using a single Git clone, remembering the previous HEAD, and (simplified) inspecting git diff --stat <OLD_HEAD_SHA1> HEAD would also work for a gentoo-tree-diff implementation, but it was way simpler to implement with two directories, and that approach leaves the door open to working with non-Git sources like emerge-webrsync and tarball downloads.
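
The single-clone idea could be sketched roughly as follows. This is not how gentoo-tree-diff is implemented (it compares two directories, as said above); it only illustrates the "remember the previous HEAD" variant. It assumes portdir is a Git clone using the usual category/package/*.ebuild layout, and the function names changed_packages and sync_and_diff are hypothetical:

```shell
# Hypothetical helper: reduce a list of changed file paths (stdin) to
# unique "category/package" names, looking only at ebuild files.
changed_packages() {
    grep -E '^[^/]+/[^/]+/[^/]+\.ebuild$' | cut -d/ -f1,2 | sort -u
}

# Hypothetical wrapper: remember the old HEAD, update the clone, then
# list the packages whose ebuilds changed in between.
sync_and_diff() (
    portdir="$1"
    old_head="$(git -C "$portdir" rev-parse HEAD)" || exit 1
    git -C "$portdir" pull --ff-only --quiet || exit 1
    git -C "$portdir" diff --name-only "$old_head" HEAD | changed_packages
)
```

Here git diff --name-only is used instead of --stat because it prints bare paths that are easier to post-process.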

vojeroen commented 3 years ago

Indeed, the problem is mostly about permissions, but it was caused by the use of gentoo-tree-sync. Let me elaborate briefly: I have a setup with a "portage" user that builds the packages. So the portage user would, among other things, execute gentoo-tree-sync portdir to get the portage tree. However, because this runs in a container, permissions on the files inside portdir are set to 644, and ownership is set to root. This means that the portage user can no longer do an rm -rf portdir. That is mainly annoying and would need some workaround by the user to handle these permissions (getting sudo rights or running the rm inside a container too).
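
The "run the rm inside a container too" workaround could look roughly like this. It is a sketch under assumptions: the user may already run Docker (gentoo-tree-sync needs that anyway), the busybox image is acceptable, and both the function name container_rm and the DRY_RUN variable are made up for illustration:

```shell
# Hypothetical helper: delete a root-owned directory by running rm as
# root inside a throwaway container that bind-mounts its parent.
container_rm() (
    # Resolve to an absolute path; fail if the directory does not exist.
    target="$(cd "$1" && pwd)" || exit 1
    parent="$(dirname "$target")"
    name="$(basename "$target")"
    # With DRY_RUN set to a non-empty value, only print the command.
    ${DRY_RUN:+echo} docker run --rm -v "$parent:/work" busybox rm -rf "/work/$name"
)
```

Running DRY_RUN=1 container_rm portdir-old first shows the exact docker command without deleting anything.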

Anyway, if the command is dropped, this problem does not exist anymore.

One side note though. You are very familiar with portage, so it is obvious to you how to handle all of this. For someone who is less familiar with the inner workings of portage, and basically just uses emerge --sync to get the portage tree, it is less obvious how to get the portage tree on a non-Gentoo system (or just a separate copy on a Gentoo system). I know Gentoo is all about managing your system yourself, but I still believe it would be nice to include some pointers in the documentation on how this can be done, or to point to the relevant Gentoo documentation on how to sync with portage (especially from a non-Gentoo system, where plain rsync or git will be needed).

hartwork commented 3 years ago

@vojeroen I'm still recovering from my second vaccine two days ago: That's why I'm only starting to reply again now…

Good to hear that it was indeed about permissions at the core.

I can definitely add a few words to the readme on that topic. I'm aware of 5 methods of obtaining a portage tree with metadata cache (sometimes called md5 cache):

I'm using the last option (e) as of today, because it's quick to update and has no lag behind its source that I would notice. What do you think about option ~(3)~ (e)? The article "Portage Security" could be of interest.

I noticed the other day that "git pull" would also be an option besides rsync, even without going to the internet if portdir-old is a local clone of portdir-new.
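
A rough sketch of that local "git pull" idea, assuming both directories are Git repositories and that the backup runs before portdir-new itself is updated; the function name backup_via_git is hypothetical:

```shell
# Hypothetical helper: bring <old> up to the current state of <new>
# without network access, by treating <old> as a local clone of <new>.
backup_via_git() (
    old="$1"
    new="$2"
    if [ -d "$old/.git" ]; then
        # <old> already is a clone of <new>: fast-forward it.
        git -C "$old" pull --ff-only --quiet
    else
        # First run: create <old> as a local clone of <new>.
        git clone --quiet "$new" "$old"
    fi
)
```

Calling backup_via_git portdir-old portdir-new right before updating portdir-new would leave portdir-old at the previous state.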

vojeroen commented 3 years ago

Wasn't option 3 (emerge-webrsync) exactly what you did with gentoo-tree-sync? So I'm not sure what you mean now.

hartwork commented 3 years ago

My bad, typo, "3" was supposed to be "e". I moved from (c) to (e), yes. What do you think about option (e) for your own use?

vojeroen commented 3 years ago

I'll have a look, but I have to say I am inclined to make a simple up-to-date container image with gentoo-build --tag-docker-image XXX --update @world that does an emerge --sync. In fact, similar to what you did with gentoo-tree-sync; I liked the approach. Primarily, this approach deals with all the security/signature stuff mentioned on the Portage Security page, so I'm always sure I only get official ebuilds.

hartwork commented 3 years ago

@vojeroen that will get back the permission problems you mentioned above — is that acceptable? Have you checked if emerge --sync is as up to date as Git? I can restore gentoo-tree-sync and also add a switch for plain emerge --sync rather than emerge-webrsync but we should be sure that it solves an actual problem and fits your operational requirements, e.g. with regard to file permissions.

vojeroen commented 3 years ago

Solving the permissions issue I had is quite easy if the container moves the new portdir to the old portdir prior to sync'ing. Something like this is very simple and does the trick for me:

Dockerfile to create the image gentoo-sync:

FROM binary-gentoo:latest
CMD ["bash", "-c", "set -x && rsync -Aaxv /var/db/repos/gentoo-old/ /var/db/repos/gentoo/ && emerge --sync"]

It assumes that the image binary-gentoo contains both rsync and emerge, of course. But that can be created with gentoo-build --tag-docker-image. And then run it:

docker run --rm -v /path/to/old/portdir:/var/db/repos/gentoo-old -v /path/to/new/portdir:/var/db/repos/gentoo gentoo-sync

I'll leave it up to you to decide if you want to restore gentoo-tree-sync. The solution I outlined here is almost as simple as a git pull so I'm quite happy with it.

hartwork commented 3 years ago

What I see above is this three-step template:

I think the rsync call above is copying in the wrong direction: no one ever writes to <old> in this approach. Flipping the copy call around seems to make things work. Does that make sense?
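
With the copy direction flipped, the "back up, then sync" order could be sketched like this. It assumes rsync is available wherever this runs; the function name backup_then_sync is hypothetical, plain -a is used instead of the -Aaxv from the command above, and --delete is an addition that is not in the original command (it makes <old> an exact mirror of <new>):

```shell
# Hypothetical helper: copy the current tree to the backup location
# first (new -> old), then run the given sync command on the new tree.
backup_then_sync() (
    old="$1"
    new="$2"
    shift 2
    # Trailing slashes make rsync copy the directory contents.
    rsync -a --delete "$new"/ "$old"/ || exit 1
    # Whatever remains on the command line is the sync command.
    "$@"
)
```

Inside the container, this would be called along the lines of backup_then_sync /var/db/repos/gentoo-old /var/db/repos/gentoo emerge --sync.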

My local approach looks like this:

Once we start looping these steps, this is the same as yours (with the copy direction flipped as mentioned above).


I think it's fair to say that technically binary-gentoo tries to do two things:

With regard to rsync and copying files around, that's something that a non-Gentoo host OS does offer and also something where integration can be self-built in a few minutes, as you demonstrated above. So to me that means that gentoo-tree-sync should not copy things (because little value) but provide means of syncing portdir that the host OS cannot offer (i.e. good value).

So the key question I think is whether you as a user would rather:

If (a), I'd restore gentoo-tree-sync, and if (b), we'd wait for the next person who would rather not feed portdir through Git. Does that sound like a plan? Which do you prefer?

hartwork commented 3 years ago

I thought about this some more from different angles. Maybe you're right, maybe gentoo-tree-sync [--backup-to <old>] <new> is simple and useful. Let me demo a PR on that.

vojeroen commented 3 years ago

I think the rsync call above is copying in the wrong direction: no one ever writes to <old> in this approach. Flipping the copy call around seems to make things work. Does that make sense?

You are entirely correct, that was of course a mistake.

With regard to rsync and copying files around, that's something that a non-Gentoo host OS does offer and also something where integration can be self-built in a few minutes, as you demonstrated above. So to me that means that gentoo-tree-sync should not copy things (because little value) but provide means of syncing portdir that the host OS cannot offer (i.e. good value).

You already mentioned this in #31, but a benefit of using emerge --sync or emerge-webrsync is that you get additional security features out of the box (if I understood the portage manual correctly), which you don't have with a plain rsync or git. So for non-Gentoo platforms, that is definitely a plus.

I thought about this from different angles more. Maybe you're right, maybe gentoo-tree-sync [--backup-to <old>] <new> is simple and useful. Let me demo a PR on that.

As mentioned in the PR, I'll be happy to use gentoo-tree-sync if you decide to include it again.

Feel free to close this issue if you have no further remarks on it.