Closed bitgully closed 5 months ago
This is something that we've resisted doing in the past. Dependency mappings would be the suggested way to do this instead, or build your own buildpack images and then you can point the dependencies to wherever you want.
Security and provenance have been the primary motivators for not doing this previously. What you've proposed is a little different though, only being able to modify the scheme://host:port
. Off the top of my head, I'd like this more if it were just allowing you to change the host:port
, because everything should always be HTTPS. Would that still work for you?
I appreciate the PR. I need to think about this some more and see what the other maintainers think as well.
Thanks!
Sure, HTTPS only would work for me too. Overriding the scheme was just meant for backwards compatibility or in case anyone has a use case without TLS (maybe for performance reasons). Let me know your preference when you are ready, and I will adapt it accordingly.
Excellent. Yes, please update to require HTTPS.
Other thoughts:
I hate naming things cause it's hard and I don't like being picky about names cause it's subject but I think we should discuss the env variable name a bit. The term "source" was a little confusing to me. My preference would be to try to name it using the same terminology that we're using now. We have "dependency mappings" to remap a particular dependency, so maybe we have a "dependency host mapping"? or a "dependency domain mapping"? or a "dependency server mapping"? To indicate you're remapping an entire host/domain/server. What do others think about this?
I'd be OK using the "mirror" terminology as well. I know that buildpacks use this and it's a pretty standard concept.
I'm curious about using an env variable versus a binding, like with a dependency mapping? Did you have a specific reason for env variable? Just thinking through things in my head. If we used a binding instead of an env variable, then the process would be more consistent with the existing dependency mapping process. It would also facilitate using embedded HTTP basic credentials, which is something that would be nice to support since you wouldn't want to put those in an env variable.
HTTP basic credentials. I feel like this is something people will want/need. If you don't want to include this now, I'm fine with that, but it's something we should think through so that we can support it in the future.
I think we should somehow indicate a change like this has happened to the user. Definitely at build time. Possibly in the SBOM information or possibly as a label on the generated container? 🤔 but I would prefer in the SBOM. I just want to make sure that it's clear how the container was created and that provenance is clear as well.
I don't know if the sbom needs any update. We are still sticking to the sha that the buildpacks have configured, no? So, I cannot come up with any potential tampering that could have happened.
I'd definitely put something in the build log though.
My thought in regards to the SBOM is that we've now got different potential sources for the files. You can't tamper with them, because of the SHA256 hash, but it would still be helpful to know exactly where the files were sourced from. It's probably something we should do with dependency mapping, but I don't believe we do at the moment.
I'm also OK with spinning that out into another issue, but just so that we're thinking it through and doing things in a way where it is possible.
Oh, the other thing I was thinking about. We need to support file://
as well as https://
. A big driver of dependency mapping is to switch from external to local URLs (typically added via a volume mount). We should support that here, because it'll make using local files a lot simpler. If that's a lot more work, again, I'm OK spinning that out into a separate issue.
@bitgully A lot of this is me thinking out loud. When it comes to additional features/functionality, implement what you're comfortable implementing. We'll move the rest into other issues.
@loewenstein: A hint, that the original URI was overridden, is printed to the build log here.
@dmikusa:
I have decided to use "BP_DEPENDENCY_MIRROR" as a name now and limited the support for the https:// and file:// schemes only (no more http:// allowed). I'm planning to add the option of setting a mirror using a binding instead of an env variable at a later stage as well. But I'm tied up in a few other projects at the moment. But I have introduced an additional feature "BP_DEPENDENCY_MIRROR_PRESERVE_HOST" too. This can be necessary for use cases, where it isn't sufficient to just replace the hostname but also preserve the original hostname in the path on the mirror. Depending on the mirror's setup, this can be vital to ensure uniqueness of the (mirrored/proxied) sources.
I have reworked this now. The changes are:
Thank you! Sorry for the delay. I will get back to this today or tomorrow.
I think this came out really awesome. Thanks so much for sticking through the review and incorporating feedback.
@anthonydahanne Are you 👍 also? If so, I'll merge.
Late to the game... I am currently trying to understand how one could for example mirror the SapMachine as used in the sap-machine buildpack and there doesn't seem to be any documentation of this feature to have been considered in the PR and review.
So
https://github.com/SAP/SapMachine/releases/download/sapmachine-11.0.22/sapmachine-jdk-11.0.22_linux-x64_bin.tar.gz
would have to take the full path into the mirror, i.e. /SAP/SapMachine/releases/download/sapmachine-11.0.22/sapmachine-jdk-11.0.22_linux-x64_bin.tar.gz
, right?Moving this to the RFC.
Summary
Using the introduced environment variable BP_DEPENDENCY_SOURCE_OVERRIDE we can override the default source (scheme://host:port) of dependencies. Excluding those explicitly defined in bindings.
Use Cases
In (air-gapped) corporate networks, it is common to work with mirror registries. Dependencies cannot be pulled directly from the public internet but must run through the private mirror for security and performance/resilience reasons. In such cases, we would have to manually define several individual dependency-mapping bindings for each dependency version. It is more practical to set the mirror's address once for all dependencies.
Sample
Setting variable BP_DEPENDENCY_SOURCE_OVERRIDE=https://mirror.acme.com would lead to this URI replacement: Before: https://github.com/bell-sw/Liberica/releases/download/11.0.8+10/bellsoft-jre11.0.8+10-linux-amd64.tar.gz After: https://mirror.acme.com/bell-sw/Liberica/releases/download/11.0.8+10/bellsoft-jre11.0.8+10-linux-amd64.tar.gz
Checklist