open-telemetry / community

OpenTelemetry community content
https://opentelemetry.io
Apache License 2.0

Download URLs for opentelemetry artifacts #1993

Open svrnm opened 9 months ago

svrnm commented 9 months ago

While many language SDKs are installed via their respective package managers, we have a set of projects that produce artifacts that end users download via GitHub. Some of them are:

Right now those artifacts are served via GitHub, and end users need to pull them from URLs like:

https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.95.0/ocb_0.95.0_linux_amd64

Those URLs have two issues:

As proposed by @austinlparker and discussed in https://github.com/open-telemetry/opentelemetry.io/issues/4079, we would like to give scarf.sh a try, which can turn the URL above into something like:

https://get.opentelemetry.io/ocb_0.95.0_linux_amd64

I am raising this community issue because, to do this, I would need support from different SIGs:

I can and will create issues in SIGs repositories as needed.


Notes:

jpkrohling commented 4 months ago

I was finally able to get to this. All in all, I'm happy with Scarf, but there's one thing I would recommend before adopting it: prepare for a plan B. In case Scarf goes down for longer periods of time, we should be ready to switch to this plan B. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if temporarily. This could be something for SIG Tooling to work on.
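To make the "few lines of Go" estimate concrete, here is a minimal sketch of such a fallback redirect service. It assumes Go 1.22+ (for path wildcards in `http.ServeMux`) and reuses the short-URL shape `/<version>/<os>/<arch>/ocb` and the GitHub release URL pattern that appear in this thread; the helper names (`targetURL`, `newMux`), the validation rules, and the `:8080` port are hypothetical choices, not an agreed design.

```go
// Hypothetical plan B redirect service for get.opentelemetry.io.
// Assumptions: Go 1.22+, listen address ":8080", and the validation rules below.
package main

import (
	"fmt"
	"log"
	"net/http"
	"regexp"
)

// Accept only plain semantic versions and simple os/arch tokens, so arbitrary
// path input is never pasted into the redirect target.
var (
	versionRe  = regexp.MustCompile(`^[0-9]+\.[0-9]+\.[0-9]+$`)
	platformRe = regexp.MustCompile(`^[a-z0-9]+$`)
)

// targetURL maps the short-URL parts to the GitHub release asset URL,
// mirroring the Scarf redirect for ocb discussed in this thread.
func targetURL(version, goos, goarch string) (string, bool) {
	if !versionRe.MatchString(version) || !platformRe.MatchString(goos) || !platformRe.MatchString(goarch) {
		return "", false
	}
	// %% yields a literal %, so the tag keeps its encoded slashes (cmd%2Fbuilder%2Fv...).
	return fmt.Sprintf(
		"https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%%2Fbuilder%%2Fv%s/ocb_%s_%s_%s",
		version, version, goos, goarch,
	), true
}

func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Each {name} wildcard matches exactly one path segment; a GET pattern also matches HEAD.
	mux.HandleFunc("GET /{version}/{os}/{arch}/ocb", func(w http.ResponseWriter, r *http.Request) {
		target, ok := targetURL(r.PathValue("version"), r.PathValue("os"), r.PathValue("arch"))
		if !ok {
			http.NotFound(w, r)
			return
		}
		// 302 Found, the same status Scarf returned for the ocb URL in this thread.
		http.Redirect(w, r, target, http.StatusFound)
	})
	return mux
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", newMux()))
}
```

Requesting `/0.105.0/linux/amd64/ocb` from this sketch should yield a 302 to `https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.105.0/ocb_0.105.0_linux_amd64`. Artifacts whose file names do not follow `ocb_<version>_<os>_<arch>` (for example, ones with a platform-specific file extension) are not handled.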

Here are some notes for reference:

I believe our configuration on scarf.sh has changed so that the correct URL to download the latest ocb would be:

https://get.opentelemetry.io/0.105.0/linux/amd64/ocb

And it resulted in the following redirect:

```
< HTTP/2 302
< date: Thu, 18 Jul 2024 11:52:47 GMT
< location: https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv0.105.0/ocb_0.105.0_linux_amd64
< strict-transport-security: max-age=15724800; includeSubDomains
```
jpkrohling commented 4 months ago

And a personal request: if we decide to use it for container images as well, can we use "cr" as the subdomain instead of "docker"? Docker is one specific technology (and company), while "cr" stands for "container registry", an abbreviation used elsewhere as well.

austinlparker commented 4 months ago

Yeah, we could make it whatever. download.opentelemetry.io? packages.opentelemetry.io?

jpkrohling commented 4 months ago

I like get.opentelemetry.io for the files, and cr.opentelemetry.io (or containers.opentelemetry.io) for containers, as we might have other packages in the future (npm, for instance).

svrnm commented 4 months ago

> before adopting it: prepare for a plan B. In case Scarf goes down for longer periods of time, we should be ready to switch to this plan B. In the worst case, the proxy itself can be implemented in a few lines of Go, but we need to be able to run this proxy somewhere, even if temporarily.

I thought about that potential plan B for a bit, and here is a proposal (I would like @chalin to also take a look): we use the website (specifically Netlify) and write redirects into netlify.toml, e.g.:

```toml
[[redirects]]
from = "https://get.opentelemetry.io/:version/:os/:arch/ocb"
to = "https://github.com/open-telemetry/opentelemetry-collector/releases/download/cmd%2Fbuilder%2Fv:version/ocb_:version_:os_:arch"
```

This provides functionality very similar to Scarf's (minus the analytics).

chalin commented 4 months ago

A few thoughts:

If y'all agree, then we could incrementally implement this Netlify-based redirects approach, without a need for a fallback plan B. WDYT?

svrnm commented 4 months ago

@chalin, good point! I think one reason for having scarf.sh is exactly the analytics part. For me, though, the short URLs are the main reason to have a solution.

chalin commented 4 months ago

So you want to switch from GA4 to Scarf.sh for analytics? (If so, maybe we can move that discussion to another thread?) Does anyone have enough experience with the use of Scarf.sh for the purpose of analytics? (I'll ask internally.)

svrnm commented 4 months ago

No, this is not about switching from GA4 to scarf.sh. In this particular use case GA4 would not track anything, since these download URLs do not serve any HTML, so no JavaScript gets executed either.

We could probably use Netlify logs or something similar as an alternative, but if download analytics are important to us, scarf.sh (since it is LF/CNCF "approved") is the easiest thing to do.

svrnm commented 3 months ago

Following up on this: Netlify has analytics capabilities based on server-side logs, which would probably provide similar functionality if we go with the redirect option: https://docs.netlify.com/monitor-sites/site-analytics/
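If the redirects were served by a self-hosted fallback instead of Scarf or Netlify, download analytics would have to come from the server side as well. A minimal sketch of what that could look like in Go, assuming an in-memory counter keyed by request path plus one log line per request; `downloadCounter` and `countDownloads` are hypothetical names, and a real deployment would export these numbers (for example, as metrics) rather than keep them in process memory.

```go
// Hypothetical server-side download counting for a self-hosted redirect service.
package main

import (
	"log"
	"net/http"
	"sync"
)

// downloadCounter is a concurrency-safe, in-memory counter keyed by request path.
// Keys come from client-supplied paths, so a real setup should restrict them to
// known artifacts to keep memory bounded.
type downloadCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newDownloadCounter() *downloadCounter {
	return &downloadCounter{counts: make(map[string]int)}
}

func (c *downloadCounter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key]++
}

func (c *downloadCounter) Count(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[key]
}

// countDownloads records and logs each request before delegating, so the wrapped
// handler (e.g. the redirect handler) behaves exactly as before.
func (c *downloadCounter) countDownloads(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c.Inc(r.URL.Path)
		log.Printf("download path=%s ua=%q", r.URL.Path, r.UserAgent())
		next.ServeHTTP(w, r)
	})
}

func main() {
	counter := newDownloadCounter()
	mux := http.NewServeMux()
	// Placeholder handler; in practice this would be the redirect handler.
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	log.Fatal(http.ListenAndServe(":8080", counter.countDownloads(mux)))
}
```

Note that this counts every request, including ones that end in a 404; filtering on the response status would need a wrapping `ResponseWriter`.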

jpkrohling commented 3 months ago

Note that Scarf would also proxy the container images. During my review, I saw that they don't do a simple redirect of the container images, but rather, have a proper proxy in place especially to handle the authentication. That's the reason I suggested a Go application serving as proxy. For the cases where scarf issues a redirect, plain redirects at netlify would certainly work.
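Since some container runtimes mishandle redirects during registry authentication, a Go fallback for images would need to forward requests rather than redirect them. A minimal reverse-proxy sketch, assuming Go 1.20+ (`httputil.ReverseProxy.Rewrite` and `ProxyRequest.SetURL`) and `ghcr.io` as the upstream registry, both assumptions rather than anything decided here; TLS termination, which container clients typically expect for non-local registries, is left out. The helper name `newRegistryProxy` is hypothetical.

```go
// Hypothetical container-registry proxy: forward requests to an upstream
// registry instead of redirecting to it.
package main

import (
	"errors"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newRegistryProxy forwards every request, including /v2/ API calls and the
// Authorization header, to the upstream registry.
func newRegistryProxy(upstream string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	if target.Scheme == "" || target.Host == "" {
		return nil, errors.New("upstream must be an absolute URL")
	}
	return &httputil.ReverseProxy{
		Rewrite: func(pr *httputil.ProxyRequest) {
			// SetURL routes to the upstream and rewrites the outbound Host header to match it.
			pr.SetURL(target)
			pr.SetXForwarded()
		},
	}, nil
}

func main() {
	proxy, err := newRegistryProxy("https://ghcr.io") // assumed upstream
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```

`ReverseProxy` does not follow upstream redirects itself, so if a registry answers blob requests with a redirect to separate storage, that redirect is passed through to the client, which may limit how much traffic the proxy carries. Whether that holds, and whether every client completes the token exchange correctly through the proxy, would need to be verified per registry and per client.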

svrnm commented 3 months ago

Note that Netlify can do redirects as well; I used them for the go.opentelemetry.io prototype:

https://docs.netlify.com/routing/redirects/

I was not aware that a proxy is needed for Docker images (I assume there is a reason why they do that). This, of course, raises the question of required capacity. I could imagine this quickly reaching several hundred GBs?

jpkrohling commented 3 months ago

> This of course raises the question about required capacity

They have a page explaining that, but it's related to how auth works for Docker's registry.

> When a user requests a Docker container image through Scarf, Scarf simply issues a redirect response, pointing to whichever hosting provider you've configured for your container. Certain container runtimes do not handle redirects appropriately during authentication (which is required even for anonymous pulls), and, in those cases, Scarf will proxy the request to the host instead of redirecting.

https://docs.scarf.sh/gateway/#how-it-works

svrnm commented 3 months ago

This is the one compelling reason for Scarf: they figured that part out and probably also make sure it works with registries across the board, whereas this would be our own responsibility if we go with Hugo + Netlify.