snok / container-retention-policy

GitHub action for pruning old GHCR container image versions.
MIT License
178 stars 29 forks source link

Output doesn't give valid images deleted #78

Closed pkordes closed 2 months ago

pkordes commented 4 months ago

I am just starting to use this and deleting old branch builds. I am using filter-tags: br-*. It correctly found and deleted two images. But when it outputs the name of the image deleted, it has the correct image name and instead of the actual tag, it has the github ID?! So for instance, my image was named:

poetry_base:br-somebranch

It was deleted and gave me:

Deleted old image: poetry_base:197206811

I didn't understand what this was until I went to the tab for deleted images and clicked on the name of that image which brought me to this url:

https://github.com/my-org/my-project/pkgs/container/poetry_base/197206811

So it is showing some github ID as the tag which is not very useful for making sure the right thing was done?!

sondrelg commented 4 months ago

Ah, that makes sense. We need to use the image version id to delete an image version, and we've mistakenly used the id instead of the tag value when printing this output. I'll fix this when I get a chance πŸ‘ Thanks for the report!

rcdailey commented 3 months ago

Any fix for this yet? I'm trying to evaluate whether or not the deleted images are correct during a dry run and I can't validate the results without seeing proper tag names.

sondrelg commented 3 months ago

Working on a larger release that will fix this. Think it will be ready anywhere from in a few days to in a few weeks πŸ‘ In the meantime, I'd be happy to accept a contribution.

sondrelg commented 3 months ago

Hi @pkordes and @rcdailey, just wanted to give you a quick update:

I am close to finishing the v3-develop branch, and intend to release a new major version within a week or two. The branch fixes your issue, and the new output looks like this:

2024-05-28T13:30:21.335032Z  INFO container_retention_policy: Selected 2 tagged and 16 untagged package versions for deletion
2024-05-28T13:30:21.659586Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935583
2024-05-28T13:30:21.683804Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935417
2024-05-28T13:30:21.711312Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-2-3 package_version_id=221935339
2024-05-28T13:30:21.711361Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-2-2 package_version_id=221935339
2024-05-28T13:30:21.711365Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-2-1 package_version_id=221935339
2024-05-28T13:30:21.711369Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-2 package_version_id=221935339
2024-05-28T13:30:21.713884Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221927611
2024-05-28T13:30:21.714856Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935763
2024-05-28T13:30:21.714956Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935334
2024-05-28T13:30:21.717372Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935412
2024-05-28T13:30:21.719187Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221918492
2024-05-28T13:30:21.724613Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221919161
2024-05-28T13:30:21.725512Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935194
2024-05-28T13:30:21.725591Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221927608
2024-05-28T13:30:21.726282Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935772
2024-05-28T13:30:21.727003Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3-3 package_version_id=221935421
2024-05-28T13:30:21.727014Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3-2 package_version_id=221935421
2024-05-28T13:30:21.727018Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3-1 package_version_id=221935421
2024-05-28T13:30:21.727022Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3 package_version_id=221935421
2024-05-28T13:30:21.727889Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935336
2024-05-28T13:30:21.728281Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935595
2024-05-28T13:30:21.733786Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221935188
2024-05-28T13:30:21.740847Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221919147
2024-05-28T13:30:21.752764Z  INFO container_retention_policy::client: Deleted container-retention-policy:<untagged> package_version_id=221927603

Does this seem good to you?

rcdailey commented 3 months ago

@sondrelg This is fantastic; great work. I would say for my needs, that's an excellent improvement!

sondrelg commented 3 months ago

For anyone else reading, a bit of context that might be worth mentioning is that when I run:

docker build -t foo . --push
docker build -t bar . --push
docker build -t baz . --push

and query the packages API for package versions, you'll get back:

[
  {
    "id": 214880827,
    "name": "sha256:e8530d7d4c44954276715032c027882a2569318bb7f79c5a4fce6c80c0c1018e",
    "created_at": "2024-05-11T12:42:55Z",
    "metadata": {
      "package_type": "container",
      "container": {
        "tags": [
          "foo",
          "bar"
          "baz"
        ]
      }
    }
  }
]

Where all the tags are attached to a single package version. In the output we therefore have to choose between outputting something like:

2024-05-28T13:30:21.727003Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3-3 package_version_id=221935421
2024-05-28T13:30:21.727014Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3-2 package_version_id=221935421
2024-05-28T13:30:21.727018Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3-1 package_version_id=221935421
2024-05-28T13:30:21.727022Z  INFO container_retention_policy::client: Deleted container-retention-policy:test-3 package_version_id=221935421

or

2024-05-28T13:30:21.727003Z  INFO container_retention_policy::client: Deleted package version 221935421 package_name=container-retention-policy tags=[test-3-3, test-3-2, test-3-1, test-3]

the latter would be more concise, but I'm not sure what's more intuitive. Happy to get feedback on what would be most useful πŸ™‚

rcdailey commented 3 months ago

I was originally hesitant to bring up this topic here, since it's important but unrelated to the issue of how the dry run logs appear -- however you brought up a choice about how to render the output, which might be influenced by this other point, so I'll go ahead and touch on it.

There's another issue here (I don't remember the number off hand) that covers the issue of valid uploaded tags being "destroyed" by the removal of images/layers. I think all of this has to do with buildx based on the comments there. In any case, I think this may come in as a decision point for your two options above because it might make sense to render the tags together when they all point to the same image because you'd also need to be aware of this to some extent (I assume) to determine which layers/images to remove without destroying other tags when buildx / multiple architectures are involved. Or put another way, that uploaded manifest file (which I'm mostly unfamiliar with) might be involved.

I don't know if that makes much sense but I thought I'd throw it out there in case it sparks a thought.

Again great work and I appreciate the turnaround on this.

sondrelg commented 3 months ago

The problem with multi-platform images actually has to do with the fact that several package versions comprise the parts of one multi-platform image manifest and platform versions of the image. I've implemented a working fix for this issue too, in the new version.

If you have time, perhaps you'd be willing to read over the section on multi-platform images in the new readme and see if it seems sensible and/or clear?

I think it would be hard, with the current implementation, to e.g., group related multi-platform package versions together in the dry-run output for now, but that might change in the future. There are quite a few issues related to expanding the packages API to contain metadata that would help with this hehe

sondrelg commented 2 months ago

The output of the dry-run and non-dry-run delete calls have been revised in the latest release πŸ™‡ Sorry for the delay!

The inputs have changed a bit in the latest release, so a migration guide for v3 is included in the release post πŸ‘ If you run into any issues, please share them in the issue opened for tracking the v3 release ☺️

rcdailey commented 2 months ago

@sondrelg,

I finally had the opportunity to upgrade my workflow to use the v3 release of your action. I have observed a lot of fantastic changes in this release, so I wanted to start by thanking you for your hard work and for bringing these updates to us.

I reviewed the README section regarding multi-platform images. In my opinion, this section could benefit from further elaboration as it didn't fully explain the docker manifest command. After some research, I discovered that the ghcr.io/package1 in your example informs the docker manifest inspect command to list the SHAs only for the latest tag. In my case, I needed it to list the SHAs for all release tags.

Here's the PowerShell script I devised to accomplish this:

$tags = skopeo --override-os linux list-tags docker://ghcr.io/recyclarr/recyclarr |
    convertfrom-json | select-object -expandproperty Tags | where-object { $_ -notmatch "dev" }

$shas = @()
foreach ($tag in $tags) {
    # "Obtain SHAs for tag $tag"
    $manifests = (docker manifest inspect ghcr.io/recyclarr/recyclarr:$tag | ConvertFrom-Json).manifests
    $manifests | ForEach-Object { $shas += $_.digest }
}

Write-Host $($shas | Select-Object -Unique)

My tags follow semantic versioning, so I have tags such as 6.0.2 and 2.0.0, as well as shorthand, mutable tags like 6.0 and 6. It was crucial for me to ensure that all these tags retained their multi-platform images without being destroyed.

The downside is that this script takes around 3+ minutes to execute, as each docker manifest inspect command takes a few seconds to complete.

I then use the output of this PowerShell script and provide it to the Docker retention action in essentially the same way as illustrated in your README examples:

- name: Fetch multi-platform package version SHAs
  id: multi-arch-digests
  run: |
    shas="$(pwsh ci/Get-DockerTagSHAs.ps1)"
    echo "shas=$shas" >> $GITHUB_OUTPUT

- name: Delete old pre-release images
  uses: snok/container-retention-policy@v3.0.0
  with:
    image-names: recyclarr
    account: recyclarr
    image-tags: '*-dev.*'
    token: ${{ secrets.GITHUB_TOKEN }}
    cut-off: 1w
    keep-n-most-recent: 1
    skip-shas: ${{ steps.multi-arch-digests.outputs.shas }}
    dry-run: true # Temporary

Here's the corresponding workflow run:

https://github.com/recyclarr/recyclarr/actions/runs/9829194567/job/27133962931

At first glance, it appears to work correctly, as I do not see any of my release tags in the dry run output. However, I still lack confidence in the results. The stakes are high β€” one mistake could destroy previously published releases.

I'm wondering if there's any way to increase confidence in the results. For instance, I see several untagged images in the dry run output β€” how can I be sure that these are not associated with any tag manifests? How do I ascertain that they are genuinely orphaned and safe to remove?

I believe it is crucial to have a reliable method to identify orphaned multi-platform images. Ideally, there should be no release-based tag manifests referencing them.

For further context, for every commit I push to master in my repository, my workflows build and publish those commits as prerelease images using the mutable edge tag. There are quite a few of these, so I imagine that every time the edge tag gets replaced, the previous multi-platform images become orphaned but remain available in the registry. These are the images I aim to identify and clean up. However, multi-platform images associated with genuine tags still appear as <untagged> in the output!

Apologies for the lengthy post, but I wanted to provide comprehensive context for my use case and get your advice on how to proceed. Hopefully, we can make improvements to the dry run output to reduce the ambiguity of <untagged>. If it's not possible, a quick glance over the workflow run output I previously linked to confirm its safety would be greatly appreciated.

Thank you again for your support!

sondrelg commented 1 month ago

I definitely understand your hesitation to jump into this. I would be hesitant too.

The root of the problem seems to be the need for a "proper way" to identify which images are linked as multi-platform images in the Github container registry. We haven't solved that in v3; instead we've opted for a composable API where users can solve that problem themselves - because I'm not sure how best to do it, without relying on the docker CLI.

It does seem possible to fetch manifests using this:

curl -H "X-GitHub-Api-Version: 2022-11-28" \
     -H "Accept: application/vnd.oci.image.index.v1+json" \
     -H "Authorization: Bearer $(curl -s 'https://ghcr.io/token?service=ghcr.io&scope=repository:container-retention-policy:pull' -u '...' | jq -r '.token')" \
     https://ghcr.io/v2/snok%2Fcontainer-retention-policy/manifests/v3.0.0

So maybe we could do this for each tag in the action itself. I'll take another look at it when I get a chance πŸ‘