pulp / pulp_container

Pulp Container Registry
https://docs.pulpproject.org/pulp_container/
GNU General Public License v2.0
23 stars 44 forks source link

Flatpak index Registry URL should be configurable #1382

Closed stbergmann closed 1 year ago

stbergmann commented 1 year ago

Following up on https://github.com/pulp/pulp_container/commit/eec513314ad9d55843f225802ea9fb45adad16a3 "Support for Flatpak index endpoints":

https://github.com/flatpak/flatpak-oci-specs/blob/main/registry-index.md#responses mandates that the json index files contains a

Registry: URL to the registry where the images are found. This is interpreted with the base URL being the URL of the document (that is /index/[static/dynamic].)

For a standalone Pulp instance, the currently used value of django.conf.settings.CONTENT_ORIGIN (in pulp_container/app/registry_api.FlatpakIndexDynamicView.get) fits well, as (per pulp_container/app/urls) the relevant v2 API is available there.

However, a Foreman/Katello setup will have that v2 API rewired to a pulpcore_registry sub-path (see https://github.com/theforeman/puppet-pulpcore/blob/master/manifests/plugin/container.pp), so that the value no longer fits.

One solution that would happen to work for both standalone Pulp and for Foreman/Katello would be to use a hardcoded relative URL string value ".." instead of django.conf.settings.CONTENT_ORIGIN.

A more general solution might be make that value configurable through some new FLATPAK_REGISTRY_URL added to pulp_container/app/settings.py. (One minor issue with that would be that it could not default to django.conf.settings.CONTENT_ORIGIN, i.e., adding a line

FLATPAK_REGISTRY_URL = CONTENT_ORIGIN

would not work, IIUC. A workaround might be to make it default to

FLATPAK_REGISTRY_URL = None

there and then programatically replace that None default with the value of django.conf.settings.CONTENT_ORIGIN in pulp_container/app/registry_api.FlatpakIndexDynamicView.get.)

@ipanova @lubosmj @ianballou Thoughts?

ipanova commented 1 year ago

can we take approach as we have with registry_path field?

stbergmann commented 1 year ago

can we take approach as we have with registry_path field?

do you have a pointer what exactly you mean with that?

ipanova commented 1 year ago

https://github.com/pulp/pulp_container/blob/main/pulp_container/app/serializers.py#L285

stbergmann commented 1 year ago

Lacking context, I'm not sure I understand what that registry_path is, how it is intended to be used, and how it would help here. From what I understand, it is a per-distribution attribute specifying a per-distribution URL. Which might or might not be intended to be override-able. But I fail to understand how something like that would solve the problem I outline above.

ipanova commented 1 year ago

registry_path is the field exposed on the distribution which is composed out of registry url + repo name, so the end user can take that path and perform 'podman pull' against it

$ pulp container distribution create --name ina2 --base-path test/ina
Started background task /pulp/api/v3/tasks/018a6580-3b00-7778-9de3-3d4933b2661c/
Done.
{
  "pulp_created": "2023-09-05T13:20:17.256676Z",
  "pulp_labels": {},
  "repository": null,
  "pulp_href": "/pulp/api/v3/distributions/container/container/018a6580-3b67-7d47-a96c-01828ad437ca/",
  "name": "ina2",
  "hidden": false,
  "base_path": "test/ina",
  "content_guard": "/pulp/api/v3/contentguards/core/content_redirect/018a6580-39e8-736a-aec1-dc64e58826d7/",
  "repository_version": null,
  "registry_path": "pulp3-source-fedora36.puffy.example.com/test/ina",
  "namespace": "/pulp/api/v3/pulp_container/namespaces/018a6580-39fd-75b5-bf49-0321a169afaa/",
  "private": false,
  "description": null
}

so you can use get_host instead of using the content-origin from the settings.

lubosmj commented 1 year ago

@stbergmann, maybe, you can make the following change to see if something has changed? I think this is what @ipanova is trying to suggest. To take the same approach as in RegistryPathField.to_representation.

--- a/pulp_container/app/registry_api.py
+++ b/pulp_container/app/registry_api.py
@@ -619,7 +619,7 @@ class FlatpakIndexDynamicView(APIView):
             if images:
                 results.append({"Name": distribution.base_path, "Images": images})

-        return Response(data={"Registry": settings.CONTENT_ORIGIN, "Results": results})
+        return Response(data={"Registry": request.get_host(), "Results": results})
stbergmann commented 1 year ago

That wouldn't help, as request.get_host() is just the host name, and doesn't provide the pulpcore_registry sub-path to which the Foreman/Katello environment rewires the v2 API.

To maybe make my need more clear: For the Foreman/Katello environment, the JSON that gets returned would need to contain

"Registry": "https://centos8-katello-devel-stable.example.com/pulpcore_registry"

instead of the

"Registry": "https://centos8-katello-devel-stable.example.com"

it contains now.

ipanova commented 1 year ago

how the katello re-writes addresses the current registry endpoints? this flatpak is just another endpoint added

ipanova commented 1 year ago

@ekohl got any insights?

ekohl commented 1 year ago

However, a Foreman/Katello setup will have that v2 API rewired to a pulpcore_registry sub-path (see https://github.com/theforeman/puppet-pulpcore/blob/master/manifests/plugin/container.pp), so that the value no longer fits.

This is correct. I don't have a full Katello box at hand, but from our puppet-pulpcore CI (https://github.com/theforeman/puppet-pulpcore/blob/master/spec/acceptance/plugins_spec.rb) we do this in Apache:

# ************************************
# Vhost template in module puppetlabs-apache
# Managed by Puppet
# ************************************
#
<VirtualHost *:443>
  ServerName centos8-64.example.com

  ## Vhost docroot
  DocumentRoot "/var/lib/pulp/pulpcore_static"

  ## Directories, there should at least be a declaration for /var/lib/pulp/pulpcore_static

  <Directory "/var/lib/pulp/pulpcore_static">
    Options -Indexes -FollowSymLinks
    AllowOverride None
    Require all granted
  </Directory>

  <Location "/pulp/content">
    Require all granted
      ## Request Header rules
            RequestHeader unset X-CLIENT-CERT
            RequestHeader set X-CLIENT-CERT "%{SSL_CLIENT_CERT}s" env=SSL_CLIENT_CERT
          ProxyPass unix:///run/pulpcore-content.sock|http://pulpcore-content/pulp/content disablereuse=on timeout=600
    ProxyPassReverse unix:///run/pulpcore-content.sock|http://pulpcore-content/pulp/content
        </Location>

  <Location "/pulp/api/v3">
    Require all granted
      ## Request Header rules
            RequestHeader unset REMOTE_USER
            RequestHeader set REMOTE_USER "%{SSL_CLIENT_S_DN_CN}s" env=SSL_CLIENT_S_DN_CN
          ProxyPass unix:///run/pulpcore-api.sock|http://pulpcore-api/pulp/api/v3 timeout=600
    ProxyPassReverse unix:///run/pulpcore-api.sock|http://pulpcore-api/pulp/api/v3
        </Location>

  ## Logging
  ErrorLog "/var/log/httpd/pulpcore-https_error_ssl.log"
  ServerSignature Off
  CustomLog "/var/log/httpd/pulpcore-https_access_ssl.log" combined

  ProxyPass /pulp_ansible/galaxy/ unix:///run/pulpcore-api.sock|http://pulpcore-api/pulp_ansible/galaxy/
  ProxyPassReverse /pulp_ansible/galaxy/ unix:///run/pulpcore-api.sock|http://pulpcore-api/pulp_ansible/galaxy/

  <Location "/pulpcore_registry/v2/">
    RequestHeader unset REMOTE_USER
    RequestHeader set REMOTE_USER "%{SSL_CLIENT_S_DN_CN}s" env=SSL_CLIENT_S_DN_CN
    ProxyPass unix:///run/pulpcore-api.sock|http://pulpcore-api/v2/
    ProxyPassReverse unix:///run/pulpcore-api.sock|http://pulpcore-api/v2/
  </Location>

  ProxyPass /pulp/container/ unix:///run/pulpcore-content.sock|http://pulpcore-content/pulp/container/
  ProxyPassReverse /pulp/container/ unix:///run/pulpcore-content.sock|http://pulpcore-content/pulp/container/

  ## Proxy rules
  ProxyRequests Off
  ProxyPreserveHost Off
  ProxyPass /assets/ unix:///run/pulpcore-api.sock|http://pulpcore-api/assets/
  ProxyPassReverse /assets/ unix:///run/pulpcore-api.sock|http://pulpcore-api/assets/

  ## SSL directives
  SSLEngine on
  SSLCertificateFile      "/etc/pulpcore-certs/ca-cert.pem"
  SSLCertificateKeyFile   "/etc/pulpcore-certs/ca-key.pem"
  SSLVerifyClient         optional
  SSLCACertificateFile    "/etc/pulpcore-certs/ca-cert.pem"
  SSLOptions +StdEnvVars +ExportCertData
</VirtualHost>

The relevant bit:

  <Location "/pulpcore_registry/v2/">
    RequestHeader unset REMOTE_USER
    RequestHeader set REMOTE_USER "%{SSL_CLIENT_S_DN_CN}s" env=SSL_CLIENT_S_DN_CN
    ProxyPass unix:///run/pulpcore-api.sock|http://pulpcore-api/v2/
    ProxyPassReverse unix:///run/pulpcore-api.sock|http://pulpcore-api/v2/
  </Location>

So /pulpcore_registry/v2/ is passed to /v2/ on the API app.

I don't recall exactly why we remapped this, but I suspect we had a conflict with something else. It's a big problem we've had: we provide Foreman, Pulp and RHSM all on the same vhost. I'm not sure what v2 conflicts (conflicted?) with.

ianballou commented 1 year ago

So /pulpcore_registry/v2/ is passed to /v2/ on the API app.

I don't recall exactly why we remapped this, but I suspect we had a conflict with something else. It's a big problem we've had: we provide Foreman, Pulp and RHSM all on the same vhost. I'm not sure what v2 conflicts (conflicted?) with.

Is this not why we remapped it? We need to have the Pulpcore registry hosted at one endpoint, and the Katello registry at another endpoint. I don't think both could live at /v2/

ekohl commented 1 year ago

Is this not why we remapped it? We need to have the Pulpcore registry hosted at one endpoint, and the Katello registry at another endpoint. I don't think both could live at /v2/

Oh yes, I think that's it. https://github.com/Katello/katello/blob/master/config/routes/api/registry.rb appears to be that. And we have https://github.com/Katello/smart_proxy_container_gateway on the content proxies (AKA capsules), which are configure via https://github.com/theforeman/puppet-foreman_proxy_content/blob/master/manifests/container.pp

So I wouldn't expect end users to access /pulpcore_registry. We still mount /v2 to a container registry. There are just many layers in between that make it hard to follow.

stbergmann commented 1 year ago

So I wouldn't expect end users to access /pulpcore_registry. We still mount /v2 to a container registry. There are just many layers in between that make it hard to follow.

I'm not sure what to make of the above.

At least for my centos8-katello-devel-stable.example.com test setup, where I'm providing the sample flatpaks "cheese" and "f38/flatpak-runtime" from https://registry.fedoraproject.org/: curl https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog returns the expected {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime"]}, while curl https://centos8-katello-devel-stable.example.com/v2/_catalog returns the unhelpful empty {"repositories":[]}.

And trying to install that sample "cheese" flatpak via

flatpak remote-add test oci+https://centos8-katello-devel-stable.example.com/pulpcore_registry/
flatpak install test org.gnome.Cheese

(where the given oci+https://centos8-katello-devel-stable.example.com/pulpcore_registry/ is the place at which the flatpak executable expects to find the flatpak index, see https://github.com/flatpak/flatpak-oci-specs/blob/main/registry-index.md) fails when https://centos8-katello-devel-stable.example.com/pulpcore_registry/index/static?label:org.flatpak.ref:exists=1 currently returns "Repository": "https://centos8-katello-devel-stable.example.com" (at which place the flatpak executable expects to find the relevant v2 API), but would work if it returned "Repository": "https://centos8-katello-devel-stable.example.com/pulpcore_registry" instead.

ekohl commented 1 year ago

@stbergmann it may be a problem in the devel box setup. https://github.com/theforeman/puppet-katello_devel is not entirely the same as a real production setup. However, based on https://github.com/pulp/pulp_container/issues/1382#issuecomment-1708580532 I think we can summarize it as this sequence:

Client: GET https://centos8-katello-devel-stable.example.com/v2/_catalog Apache: GET http://localhost:3000/v2/_catalog Puma (running Foreman + Katello): Handle code, includes GET https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog Apache: GET unix:///run/pulpcore-api.sock|https://pulpcore-api/v2/_catalog gunicorn (running Pulp + pulp_container): Handle code

On a production box Puma runs on a unix socket, but it's otherwise the same.

The sequence is slightly different for a content proxy: there you don't have Puma but instead smart-proxy running smart_proxy_container_gateway. That also ends up connecting to Apache + gunicorn, so conceptually similar.

At least for my centos8-katello-devel-stable.example.com test setup, where I'm providing the sample flatpaks "cheese" and "f38/flatpak-runtime" from https://registry.fedoraproject.org/: curl https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog returns the expected {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime"]}, while curl https://centos8-katello-devel-stable.example.com/v2/_catalog returns the unhelpful empty {"repositories":[]}.

With the knowledge of above I'd say it's somewhere in the Katello code. I'd focus my debugging efforts there, because it's probably filtering out something. If it should filter out things based on permissions then that may be a security vulnerability, not a feature.

stbergmann commented 1 year ago

At least for my centos8-katello-devel-stable.example.com test setup, where I'm providing the sample flatpaks "cheese" and "f38/flatpak-runtime" from https://registry.fedoraproject.org/: curl https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog returns the expected {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime"]}, while curl https://centos8-katello-devel-stable.example.com/v2/_catalog returns the unhelpful empty {"repositories":[]}.

With the knowledge of above I'd say it's somewhere in the Katello code. I'd focus my debugging efforts there, because it's probably filtering out something. If it should filter out things based on permissions then that may be a security vulnerability, not a feature.

Turns out there is a difference between unauthorized curl https://centos8-katello-devel-stable.example.com/v2/_catalog returning empty {"repositories":[]} vs. authorized curl -u admin:changeme https://centos8-katello-devel-stable.example.com/v2/_catalog returning the expected {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime"]}. (And the reason why unauthorized curl https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog returned the expected {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime"]} for me is presumably because I had previously copied http://centos8-katello-devel-stable.example.com/pub/katello-server-ca.crt to /etc/pki/ca-trust/source/anchors/katello-server-ca.crt on my client box.)

I'm not sure how integration of Flatpak support in Pulp fits into this Foreman/Katello setup. The flatpak executable does unauthorized access against the v2 API URL that is advertised in a flatpak index "Repository" response. (Even if we would advertise a http://admin:changeme@centos8-katello-devel-stable.example.com URL there, the flatpak executable would need to be modified to obey that password embedded in the URL instead of throwing it away as it currently does. Not to mention that such a setup would be a security disaster anyway.)

stbergmann commented 1 year ago

The flatpak executable does unauthorized access against the v2 API URL that is advertised in a flatpak index "Repository" response

Please hold on. Maybe this is a misunderstanding on my part and this whole issue will turn out to actually be a non-issue. I'll report back here if I have a better picture. Meanwhile, sorry for the noise and thanks for all your input.

ekohl commented 1 year ago

(And the reason why unauthorized curl https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog returned the expected {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime"]} for me is presumably because I had previously copied http://centos8-katello-devel-stable.example.com/pub/katello-server-ca.crt to /etc/pki/ca-trust/source/anchors/katello-server-ca.crt on my client box.)

This shouldn't matter. because that's just the public certificate, nothing private like credentials. At best it can help with verifying the connection is secure. It may still indicate a security problem.

stbergmann commented 1 year ago

(And the reason why unauthorized curl https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog returned the expected {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime"]} for me is presumably because I had previously copied http://centos8-katello-devel-stable.example.com/pub/katello-server-ca.crt to /etc/pki/ca-trust/source/anchors/katello-server-ca.crt on my client box.)

This shouldn't matter. because that's just the public certificate, nothing private like credentials. At best it can help with verifying the connection is secure. It may still indicate a security problem.

No idea if this intentional or not, or indicative of some security problem. I notice that even if I add a non-flatpak container (via Repo Discovery -> Quay), that also ends up in the response to an unauthorized curl https://centos8-katello-devel-stable.example.com/pulpcore_registry/v2/_catalog (which now returns {"repositories":["default_organization-test1-cheese","default_organization-test1-f38_flatpak-runtime","default_organization-test1-quay_busybox"]}). I'm not going to follow up on that as part of this issue here.

stbergmann commented 1 year ago

The flatpak executable does unauthorized access against the v2 API URL that is advertised in a flatpak index "Repository" response

Please hold on. Maybe this is a misunderstanding on my part and this whole issue will turn out to actually be a non-issue. I'll report back here if I have a better picture. Meanwhile, sorry for the noise and thanks for all your input.

Turned out the secret wisdom is to pass --authenticator-name=org.flatpak.Authenticator.Oci on the flatpak remote-add command line. (Which is equivalent to adding AuthenticatorName=org.flatpak.Authenticator.Oci to a *.flatpakrepo file, as is e.g. used by https://flatpaks.redhat.io/rhel.flatpakrepo). With that in place, flatpak install from that remote will query for a username and password on stdin.

ipanova commented 1 year ago

@stbergmann based on your last comment - the client can ask for credentials, does it make sense to add auth to the index endpoint? this way we'd be able to expose not only public repos. Maybe this is out of scope but at least worth opening a separate future RFE?

stbergmann commented 1 year ago

@stbergmann based on your last comment - the client can ask for credentials, does it make sense to add auth to the index endpoint? this way we'd be able to expose not only public repos. Maybe this is out of scope but at least worth opening a separate future RFE?

I have no idea whether that auth support in flatpak only covers access to the v2 API, or also access to the index. (And if it doesn't cover the latter it would apparently not make sense to hide the latter behind some auth mechanism.) @owtaylor might know?