paketo-buildpacks / libpak

An opinionated extension to the libcnb Cloud Native Buildpack Library
Apache License 2.0
15 stars 17 forks source link

Allow for overriding default dependency source #315

Closed bitgully closed 5 months ago

bitgully commented 6 months ago

Summary

Using the introduced environment variable BP_DEPENDENCY_SOURCE_OVERRIDE we can override the default source (scheme://host:port) of dependencies. Excluding those explicitly defined in bindings.

Use Cases

In (air-gapped) corporate networks, it is common to work with mirror registries. Dependencies cannot be pulled directly from the public internet but must run through the private mirror for security and performance/resilience reasons. In such cases, we would have to manually define several individual dependency-mapping bindings for each dependency version. It is more practical to set the mirror's address once for all dependencies.

Sample

Setting variable BP_DEPENDENCY_SOURCE_OVERRIDE=https://mirror.acme.com would lead to this URI replacement: Before: https://github.com/bell-sw/Liberica/releases/download/11.0.8+10/bellsoft-jre11.0.8+10-linux-amd64.tar.gz After: https://mirror.acme.com/bell-sw/Liberica/releases/download/11.0.8+10/bellsoft-jre11.0.8+10-linux-amd64.tar.gz

Checklist

dmikusa commented 6 months ago

This is something that we've resisted doing in the past. Dependency mappings would be the suggested way to do this instead, or build your own buildpack images and then you can point the dependencies to wherever you want.

Security and provenance have been the primary motivators for not doing this previously. What you've proposed is a little different though, only being able to modify the scheme://host:port. Off the top of my head, I'd like this more if it were just allowing you to change the host:port, because everything should always be HTTPS. Would that still work for you?

I appreciate the PR. I need to think about this some more and see what the other maintainers think as well.

Thanks!

bitgully commented 6 months ago

Sure, HTTPS only would work for me too. Overriding the scheme was just meant for backwards compatibility or in case anyone has a use case without TLS (maybe for performance reasons). Let me know your preference when you are ready, and I will adapt it accordingly.

dmikusa commented 6 months ago

Excellent. Yes, please update to require HTTPS.

Other thoughts:

  1. I hate naming things cause it's hard and I don't like being picky about names cause it's subject but I think we should discuss the env variable name a bit. The term "source" was a little confusing to me. My preference would be to try to name it using the same terminology that we're using now. We have "dependency mappings" to remap a particular dependency, so maybe we have a "dependency host mapping"? or a "dependency domain mapping"? or a "dependency server mapping"? To indicate you're remapping an entire host/domain/server. What do others think about this?

    I'd be OK using the "mirror" terminology as well. I know that buildpacks use this and it's a pretty standard concept.

  2. I'm curious about using an env variable versus a binding, like with a dependency mapping? Did you have a specific reason for env variable? Just thinking through things in my head. If we used a binding instead of an env variable, then the process would be more consistent with the existing dependency mapping process. It would also facilitate using embedded HTTP basic credentials, which is something that would be nice to support since you wouldn't want to put those in an env variable.

  3. HTTP basic credentials. I feel like this is something people will want/need. If you don't want to include this now, I'm fine with that, but it's something we should think through so that we can support it in the future.

  4. I think we should somehow indicate a change like this has happened to the user. Definitely at build time. Possibly in the SBOM information or possibly as a label on the generated container? 🤔 but I would prefer in the SBOM. I just want to make sure that it's clear how the container was created and that provenance is clear as well.

loewenstein commented 6 months ago

I don't know if the sbom needs any update. We are still sticking to the sha that the buildpacks have configured, no? So, I cannot come up with any potential tampering that could have happened.

I'd definitely put something in the build log though.

dmikusa commented 6 months ago

My thought in regards to the SBOM is that we've now got different potential sources for the files. You can't tamper with them, because of the SHA256 hash, but it would still be helpful to know exactly where the files were sourced from. It's probably something we should do with dependency mapping, but I don't believe we do at the moment.

I'm also OK with spinning that out into another issue, but just so that we're thinking it through and doing things in a way where it is possible.

Oh, the other thing I was thinking about. We need to support file:// as well as https://. A big driver of dependency mapping is to switch from external to local URLs (typically added via a volume mount). We should support that here, because it'll make using local files a lot simpler. If that's a lot more work, again, I'm OK spinning that out into a separate issue.

@bitgully A lot of this is me thinking out loud. When it comes to additional features/functionality, implement what you're comfortable implementing. We'll move the rest into other issues.

bitgully commented 6 months ago

@loewenstein: A hint, that the original URI was overridden, is printed to the build log here.

@dmikusa:

  1. True, naming is hard. I also thought of things like: "dependency registry", "dependency mirror", "dependency mapping proxy", "dependency mapping location". But I'm happy with any other version really.
  2. The reason I chose an env variable was that I mostly work in Kubernetes environments, where the env variable can be injected into the build container from a Kubernetes Secret and therefore would only exist inside the container. RBAC rules define who can view/edit such Secrets. I agree, when not running on a container orchestration platform, using an env variable might not be feasible. But having embedded HTTP credentials saved inside a binding file seemed less than ideal too to me.
  3. The HTTP basic credentials would in general be supported in this change here.
  4. Regarding the SBOM: I simply considered that mirror more like an external cache and didn't see it as an alternative location since artefacts would only be proxied through this. However, one could also use this override mechanism to simply point to any other server that actually isn't set up as a mirror/proxy. As the same applies to any dependency mapping, it would make sense to move this into a separate issue though.
bitgully commented 5 months ago

I have decided to use "BP_DEPENDENCY_MIRROR" as a name now and limited the support for the https:// and file:// schemes only (no more http:// allowed). I'm planning to add the option of setting a mirror using a binding instead of an env variable at a later stage as well. But I'm tied up in a few other projects at the moment. But I have introduced an additional feature "BP_DEPENDENCY_MIRROR_PRESERVE_HOST" too. This can be necessary for use cases, where it isn't sufficient to just replace the hostname but also preserve the original hostname in the path on the mirror. Depending on the mirror's setup, this can be vital to ensure uniqueness of the (mirrored/proxied) sources.

bitgully commented 5 months ago

I have reworked this now. The changes are:

dmikusa commented 5 months ago

Thank you! Sorry for the delay. I will get back to this today or tomorrow.

dmikusa commented 5 months ago

I think this came out really awesome. Thanks so much for sticking through the review and incorporating feedback.

@anthonydahanne Are you 👍 also? If so, I'll merge.

loewenstein commented 5 months ago

Late to the game... I am currently trying to understand how one could for example mirror the SapMachine as used in the sap-machine buildpack and there doesn't seem to be any documentation of this feature to have been considered in the PR and review.

So

  1. Given the original dependency is https://github.com/SAP/SapMachine/releases/download/sapmachine-11.0.22/sapmachine-jdk-11.0.22_linux-x64_bin.tar.gz would have to take the full path into the mirror, i.e. /SAP/SapMachine/releases/download/sapmachine-11.0.22/sapmachine-jdk-11.0.22_linux-x64_bin.tar.gz, right?
  2. Should we consider to document this new feature at least in the repository's repo and maybe also on the paketo.io documentation?
dmikusa commented 5 months ago

Moving this to the RFC.