pkgcore / pkgdev

collection of tools for Gentoo development
https://pkgcore.github.io/pkgdev/
BSD 3-Clause "New" or "Revised" License
30 stars 13 forks source link

Pkgdev relies on a pkgcore bug for pkgdev_commit #172

Open ferringb opened 10 months ago

ferringb commented 10 months ago

See https://github.com/pkgcore/pkgdev/blob/992da6c9f5a9e943aebd894cac2935f1993f3d0f/src/pkgdev/scripts/pkgdev_commit.py#L258 . That's actually an internal API in pkgcore despite the name.

Roughly, for 'mutable' repositories in pkgcore, this is the backchannel from a repository operation operation that notifies the repo instance to change it's cache. It's a bad design, but it's what I shat out back then. This is intended just for repositories that are mutable- ebuild.repository.UnconfigureTree is immutable. That pathway should not be used. If in doubt, note that https://github.com/ferringb/pkgcore/actions/runs/7677241057/job/20925732868 demonstrates that failure. The commit that triggered it is a simple repository cache optimization that is explicitly in line w/ how the cache internals are supposed to work. I can walk folks through this explanation, but I'd suggest you just take my word since I wrote that and the backchannel (again, which was a hack).

So, how do we resolve this.

I'm unclear on exactly how HistoricalRepo is used; in looking at it, it looks like this is a repository instance that you're mutating under it's feet and then trying to extract info out of (presumably parsed metadata). If so, I see 2 options:

  1. Implement the equivalent of the portage 'volatile repository'; basically a repo that has all caching disabled. This will likely be implemented for test reasons, but that version is just package level- a proper volatile is [repo_config, packages, profiles] and whatever else I'm forgetting/
  2. Add a requirement that repositories implement a clear_caches() method. In your case- mutating the repo at the FS layer- it's better to drop all caches (repo_config, profiles. packages) and force a re-init of the objects. This is safer/saner and should be easier to do.

I'll add a clear_caches() one way or another- pkgcore needs that to properly handle syncing w/out jumping through hoops. This is my strong suggestion.

Longer term, I did intend the add_package type api's to allow commiting an ebuild back into VCS, including the cache. I have no clue how far I got in that, but the API was intended to allow for this, and I suspect that would be useful for pkgdev. Any requirements needed you can enumerate would be appreciated.

In the meantime, please do not use that set of APIs; it's working by chance, not by design or architectural constraints.

ferringb commented 10 months ago

Some additional context here: the add/notify pathways were solely intended to be used by a repo operation that would be responsible for manipulating the backing store, including the transaction to stage and then land the content. For VDB this is obvious in flow- pkg_preinst, mutate the vdb (more specifically the domain that the VDB tracks), then pkg_postinst.

That design was architected to allow for remote manipulation of an ebuild repository that isn't stored in FS layout that y'all all deal in; think about a repository locked to a specific sha1 in a GH repo and you've got the right idea. If you trace the rest of pkgcore, I abstracted things so access to stuff like eclass or distfiles or ebuilds allowed pkgcore to step in and drop content into place (inherit implementation is a good example- it can stream the eclass to EBD). From what I can see of EAPI8 it looks like what I was building towards is closer, but again, how HistoricalRepo is using things isn't inline with what the architecture actually requires.

ferringb commented 10 months ago

Final comment here- how pkgdev uses notify_add_package would require EbuildRepository to rewrite profiles/categories to ensure that the packages category is in there. That's likely beyond what you wanted this to do, which is partially how I found this misuse ;)