hauler-dev / hauler

Airgap Container Swiss Army Knife
https://hauler.dev
Apache License 2.0

[feature] Allow hauler to keep upstream registry information #232

Open clausa opened 5 months ago

clausa commented 5 months ago

Is this RFE related to an Existing Problem? If so, please describe:

When adding images, information about origin or upstream registry (fqdn:port) is removed.

$ cat alertmanager-manifest.yaml
---
apiVersion: content.hauler.cattle.io/v1alpha1
kind: Images
metadata:
  annotations:
    hauler.dev/platform: linux/amd64
  name: alertmanager-images
spec:
  images:
    - name: quay.io/prometheus/alertmanager:v0.27.0

$ hauler store sync -f alertmanager-manifest.yaml
2024-04-18 15:29:35 INF syncing [content.hauler.cattle.io/v1alpha1, Kind=Images] to store
2024-04-18 15:29:35 INF adding 'image' [quay.io/prometheus/alertmanager:v0.27.0] to the store
2024-04-18 15:29:42 INF successfully added 'image' [quay.io/prometheus/alertmanager:v0.27.0]

$ hauler store info
+---------------------------------+-------+-------------+----------+---------+
| REFERENCE                       | TYPE  | PLATFORM    | # LAYERS | SIZE    |
+---------------------------------+-------+-------------+----------+---------+
| prometheus/alertmanager:v0.27.0 | image | linux/amd64 |        7 | 32.4 MB |
+---------------------------------+-------+-------------+----------+---------+
|                                                          TOTAL   | 32.4 MB |
+---------------------------------+-------+-------------+----------+---------+

This could eventually lead to clashes on the receiving (air-gapped) registry if two different images that share the same name and tag, but come from different upstream registries, are copied over.
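To make the clash concrete, here is a minimal sketch. `strip_registry` is a hypothetical helper that only mimics the behaviour visible in the `hauler store info` output above (it is not hauler's actual code), and `registry.example.com` is a made-up second upstream.

```python
def strip_registry(ref: str) -> str:
    """Drop a leading registry host from an image reference (hypothetical helper)."""
    first, sep, rest = ref.partition("/")
    # Assumption: the first component is a registry host if it contains
    # a dot or a port separator, or is literally "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return rest
    return ref


refs = [
    "quay.io/prometheus/alertmanager:v0.27.0",
    "registry.example.com/prometheus/alertmanager:v0.27.0",  # made-up upstream
]

# Group the upstream references by the name they would have in the store.
stored = {}
for ref in refs:
    stored.setdefault(strip_registry(ref), []).append(ref)

# Any stored name backed by more than one upstream reference is a clash.
collisions = {name: srcs for name, srcs in stored.items() if len(srcs) > 1}
print(collisions)
```

Both upstream references end up under the same stored name, so whichever is copied last would win on the receiving registry.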

Describe Proposed Solution(s):

$ sed -i 's}prometheus/alertmanager}quay.io/prometheus/alertmanager}' store/index.json

$ hauler store info
+-----------------------------------------+-------+-------------+----------+---------+
| REFERENCE                               | TYPE  | PLATFORM    | # LAYERS | SIZE    |
+-----------------------------------------+-------+-------------+----------+---------+
| quay.io/prometheus/alertmanager:v0.27.0 | image | linux/amd64 |        7 | 32.4 MB |
+-----------------------------------------+-------+-------------+----------+---------+
|                                                                  TOTAL   | 32.4 MB |
+-----------------------------------------+-------+-------------+----------+---------+

Keeping the upstream info would also make it somewhat simpler to patch deployment manifests on the air-gapped side, as one would just have to prepend the local registry:

quay.io/prometheus/alertmanager:v0.27.0 -> localregistry:5000/quay.io/prometheus/alertmanager:v0.27.0

instead of:

quay.io/prometheus/alertmanager:v0.27.0 -> localregistry:5000/prometheus/alertmanager:v0.27.0

(where you have to remove the registry part of the image reference, before prepending the local registry)
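The difference between the two patching styles can be sketched as follows. The function names are hypothetical, and the second one assumes every reference it receives is fully qualified (that is, it starts with a registry host).

```python
LOCAL = "localregistry:5000"


def with_upstream_kept(ref: str) -> str:
    """Patch a reference when the store keeps the upstream registry: just prepend."""
    return f"{LOCAL}/{ref}"


def with_upstream_stripped(ref: str) -> str:
    """Patch a reference when the store drops the upstream registry.

    Assumption: `ref` is fully qualified, so everything before the first
    slash is the registry host and must be removed first.
    """
    _host, _sep, path = ref.partition("/")
    return f"{LOCAL}/{path}"


ref = "quay.io/prometheus/alertmanager:v0.27.0"
print(with_upstream_kept(ref))
print(with_upstream_stripped(ref))
```

The first variant is a pure string prefix, which is easy to do with a single `sed` over a manifest; the second needs to know where the registry part ends.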

Describe Possible Alternatives:

Additional Context:

ngearhart commented 4 months ago

I mentioned in #241 that a potential alternative solution to this could be a manifest rewrite rule:

apiVersion: content.hauler.cattle.io/v1alpha1
kind: Images
metadata:
  name: test
spec:
  images:
    - name: registry1.dso.mil/ironbank/big-bang/argocd:v2.9.4
      rewrite: registry-name/ironbank/big-bang/argocd:v2.9.4

for example.
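As a rough sketch of how such a rule might be resolved: the `rewrite` key is only the field proposed above, not a confirmed part of the hauler manifest schema, and falling back to the original name when no rewrite is given is an assumption made purely for illustration.

```python
# The manifest from the example above, already parsed into a dict.
manifest = {
    "apiVersion": "content.hauler.cattle.io/v1alpha1",
    "kind": "Images",
    "metadata": {"name": "test"},
    "spec": {
        "images": [
            {
                "name": "registry1.dso.mil/ironbank/big-bang/argocd:v2.9.4",
                "rewrite": "registry-name/ironbank/big-bang/argocd:v2.9.4",
            },
            # Hypothetical second entry without a rewrite rule.
            {"name": "quay.io/prometheus/alertmanager:v0.27.0"},
        ]
    },
}


def resolve_targets(manifest: dict) -> dict:
    """Map each source reference to the reference it would be stored under."""
    return {
        image["name"]: image.get("rewrite", image["name"])
        for image in manifest["spec"]["images"]
    }


for source, target in resolve_targets(manifest).items():
    print(f"{source} -> {target}")
```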

zackbradys commented 3 months ago

Hey @clausa, thanks for submitting this issue. I do understand the use case of persisting the upstream registry information, but I would disagree a bit on the importance of persisting it, since most use cases would want a seamless experience between the internet-connected and the disconnected or airgapped environments, and only need to update the registry FQDN/IP/PORT.

I think implementing an option to persist registry information might be a good non-defaulted solution, but I would rather prioritize a rewrite function short term and allow the user to call it out themselves, like @ngearhart suggested.

@amartin120 any thoughts on this?

dweomer commented 3 months ago

@zackbradys wrote (emphasis mine):

I think implementing an option to persist registry information might be a good non-defaulted solution, but I would rather prioritize a rewrite function short term and allow the user to call it out themselves, like @ngearhart suggested.

/endorsed

Moreover, we shouldn't really consider patching OCI manifests that pass through Hauler without somehow communicating, up front and loudly every time, that the resulting content addresses (aka sha256 content digests) of the manifests themselves will change (tags will point to different digests) and become unique per source registry, which would be very unexpected for the vast majority of use cases. If we are talking about patching hauler manifests, however, we already have the ability to be explicit about where the content will be or was pulled from via the fully-qualified image refs therein, provided we also include the manifest as a file in the haul.
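A small illustration of why any mutation changes the content address. This is a simplification: real registries hash the exact manifest bytes as stored, whereas the sketch hashes a compact JSON serialization, and the annotation key `example.invalid/source-registry` is made up.

```python
import hashlib
import json


def digest(manifest: dict) -> str:
    """Return a sha256 digest over a compact, key-sorted JSON encoding."""
    data = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(data).hexdigest()


# A stripped-down stand-in for an OCI image manifest.
original = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "layers": [],
}

# The same manifest with a (hypothetical) source-registry annotation added.
annotated = {
    **original,
    "annotations": {"example.invalid/source-registry": "quay.io"},
}

print(digest(original))
print(digest(annotated))
print("digests differ:", digest(original) != digest(annotated))
```

Anything that pins the image by digest upstream would no longer resolve to the annotated copy, which is the surprise being warned about here.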

I am not comfortable mutating content as a side effect. We should consider a new hauler store subcommand that is either a peer of hauler store copy, i.e. hauler store copy-with-source-annotations, or possibly a whole new subcommand such as hauler store annotate ..., or a flag for hauler store copy that, when passed, incurs an interactive "are you sure, y or n [default n]" input gate (mitigated via another flag, for automation purposes, such as --force or --yes or possibly something less general).

clausa commented 3 months ago

I think the rewrite feature will allow us to do what's needed - for now.

a1994sc commented 1 month ago

Bumping because this is a feature I prefer when laying out my air-gapped repos, e.g. quay.io/example -> quay-io/example
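That layout could be sketched like this. `flatten_registry` is a hypothetical helper, and it assumes the reference is fully qualified and that the registry host carries no port.

```python
def flatten_registry(ref: str) -> str:
    """Turn the registry host into a plain path segment by replacing dots."""
    host, sep, path = ref.partition("/")
    # Assumption: `host` really is a registry hostname without a port.
    return host.replace(".", "-") + sep + path


print(flatten_registry("quay.io/example"))
```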