Open stenh0use opened 10 months ago
We will be adding more bootstrappers as part of the work to integrate with k3s, which means that in theory it should be possible. Are you planning on using the KV store in Consul to share the public key?
Depending on how your container platform is designed it might be more interesting for you to import Spegel as a library in the same way that k3s will?
I only briefly looked at the kubernetes bootstrapping code so I may be being naive here. I was thinking a similar leader election process would work with Consul as the KV backend for locking and choosing the initial leader. I'm using Nomad and Consul, so was looking to run Spegel as a system job on Nomad to handle caching and sharing existing docker images across nodes.
I did see some mention of Spegel in k3s the other day, but didn't dive into the implementation details. Given that you say it will be embedded as a library it probably wouldn't be right for me unless Hashicorp were to accept it into their project.
> We will be adding more bootstrappers as part of the work to integrate with k3s
What bootstrapping methods are you planning for the k3s integration?
I am back from the holidays now so should be a bit faster to respond.
I think that adding support for Nomad would be great to expand the user base. It has been a while since I have used Nomad so had a look at the different container drivers out there.
Before we dive into looking at bootstrappers we need to verify that one of these drivers will work with Spegel. The main issue is that Spegel relies on CRI for the mirror configuration to work. Check how Containerd implements its CRI server.
The Containerd driver does not implement any support for CRI mirror configuration.
It looks like this is also the case with the podman driver.
@stenh0use have I missed some driver that you are using? I think we need to prove that Spegel will work on your Nomad setup before looking more at how to bootstrap Spegel.
Yeah, sane thought process there. I'm using the built-in docker driver. The driver interfaces with `dockerd`, and my assumption that it would work was based on `dockerd` using `containerd` as the runtime. This assumption seems to be wrong, as I have since found that `dockerd` is only using the `containerd` runtime and is not using the image store.
So after looking through `dockerd` today I'm not sure how confident I am that Spegel will just work, although it looks like v24 implemented experimental support for enabling `containerd` as the image store.
https://github.com/moby/moby/issues/38043
There are still 20 outstanding issues attached to this issue for "fix remaining failing tests with the containerd image store" so hopefully it's not too far away from graduating from experimental to supported.
Oh, there is a third driver. How did I miss the built-in driver?
I had a look at how docker does registry mirroring, and it is limited. Configuring the Docker daemon is simple enough, and just requires a restart of the daemon. The problem is that this will first of all mirror all image pulls, meaning it will not be possible to exclude registries. Second of all Docker does not include any reference to the original registry in its requests, which makes resolving tags impossible.
https://docs.docker.com/docker-hub/mirror/#configure-the-docker-daemon
I am a bit stuck right now. We need to figure out how to enable tag resolving for Docker. Spegel would work on Nomad with the Docker driver if we figure that out.
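To make the tag-resolution problem concrete: containerd adds the original registry as an `ns` query parameter when it sends a pull request to a mirror, and a mirror can use that to tell which registry a tag belongs to. The following is only a sketch of that idea. The function name, mirror host, and port are hypothetical, and it is not Spegel's actual request handling.

```python
from urllib.parse import urlsplit, parse_qs


def original_registry(request_url):
    """Return the upstream registry named in the `ns` query parameter, or None."""
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("ns")
    return values[0] if values else None


# Request shaped like one containerd sends to a mirror (host and port are made up).
with_ns = "http://127.0.0.1:5000/v2/library/redis/manifests/7?ns=docker.io"
# The same request without any reference to the original registry.
without_ns = "http://127.0.0.1:5000/v2/library/redis/manifests/7"

print(original_registry(with_ns))     # docker.io
print(original_registry(without_ns))  # None
```

Without the parameter, `library/redis:7` could belong to any registry, so the mirror has nothing to resolve the tag against.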
> There are still 20 outstanding issues attached to this issue for "fix remaining failing tests with the containerd image store" so hopefully it's not too far away from graduating from experimental to supported.
The only remaining issues are for the (somewhat deprecated) classic builder, and these issues are the cache not working, but the build works. I guess what I'm saying is, give this a try, tell us if something breaks :)
Here's how to enable the containerd image store feature https://docs.docker.com/storage/containerd/
@rumpl thanks for the input. I am unsure whether using the Containerd image store would solve this problem. Spegel relies on Containerd's CRI implementation to support registry mirroring. Using another snapshotter would not solve this problem, as the image would still be pulled without the CRI API.
Check how Containerd implements its CRI server. https://github.com/containerd/containerd/blob/c98cb4af223348b78fc3b8c09762bc79983670b0/pkg/cri/server/images/image_pull.go#L132-L135
I looked at the docker source code and it looks like it is using a different ImageService when `containerd-snapshotter` is set. The image pull resolver is defined in the same way as in the containerd source code you linked to.
https://github.com/moby/moby/blob/9cebefa7175c849a0fb89be9a2c0c23755afb3e2/daemon/daemon.go#L1089-L1097 https://github.com/moby/moby/blob/9cebefa7175c849a0fb89be9a2c0c23755afb3e2/daemon/containerd/image_pull.go#L70 https://github.com/moby/moby/blob/9cebefa7175c849a0fb89be9a2c0c23755afb3e2/daemon/containerd/resolver.go#L28-L32
Although I'm not entirely sure if this solves the problem?
Good news, after a lot of tinkering and going through code I think I have figured it out. Using the Containerd snapshotter together with configuring the mirror in `/etc/docker/daemon.json` results in an HTTP request identical to the one received when pulling with Containerd. The only downside with using Docker is that it is not possible to limit mirroring to specific registries, but that is more on Docker than it is on Spegel.
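As a rough illustration of that configuration, the sketch below builds a `daemon.json` body that enables the containerd image store and sets one registry mirror. The `features.containerd-snapshotter` and `registry-mirrors` keys come from the Docker documentation linked in this thread. The mirror address and the helper name are assumptions, so substitute wherever the Spegel registry actually listens.

```python
import json


def daemon_config(mirror_url):
    """Build a daemon.json body enabling the containerd image store and one mirror."""
    return {
        "features": {"containerd-snapshotter": True},
        "registry-mirrors": [mirror_url],
    }


# The mirror address is an assumption, not a value taken from this thread.
config = daemon_config("http://127.0.0.1:5000")
print(json.dumps(config, indent=2))
```

The printed JSON would go into `/etc/docker/daemon.json`, followed by a restart of the Docker daemon.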
I think we should be able to move forward with this feature. The next step is to determine the best method of running Spegel in Nomad. The simplest should be to run it in a Docker container.
Great news and thanks for tinkering! I think I'm ok with the downside that it's not possible to limit the mirroring of specific registries so long as it can mirror gcp gcr/ar registries.
The best way to run it, I would think, is in a Docker container as a system job, which is similar to a DaemonSet.
I need to set up a test Nomad cluster to see how networking works, among other things. After that I should be able to figure out what bootstrapping should look like.
Can I help you somehow? I was thinking either host or bridge network would work with a static port as a system job, similar to how you've done it in kubernetes. The metrics port can be dynamic and registered as a service in consul for prometheus service discovery.
https://developer.hashicorp.com/nomad/docs/job-specification/network#mode https://developer.hashicorp.com/nomad/docs/schedulers#system
I have a WIP for hashistack in docker: https://github.com/stenh0use/hind
I have locally updated the docker-ce version and was able to get `containerd-snapshotter` working for `docker pull`, but `docker run` was having problems mounting volumes in my dind setup (I need to figure that out). I can clean that up today and push it if that is helpful?
root@9ecd8301374e:/# ctr --namespace moby images ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
docker.io/library/hello-world:latest application/vnd.oci.image.index.v1+json sha256:ac69084025c660510933cca701f615283cdbb3aa0963188770b54c31c8962493 12.7 KiB linux/386,linux/amd64,linux/arm/v5,linux/arm/v7,linux/arm64/v8,linux/mips64le,linux/ppc64le,linux/riscv64,linux/s390x,unknown/unknown,windows/amd64 -
docker.io/library/redis:7 application/vnd.oci.image.index.v1+json sha256:a7cee7c8178ff9b5297cb109e6240f5072cdaaafd775ce6b586c3c704b06458e 49.0 MiB linux/386,linux/amd64,linux/arm/v5,linux/arm/v7,linux/arm64/v8,linux/mips64le,linux/ppc64le,linux/s390x,unknown/unknown
root@9ecd8301374e:/# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest ac69084025c6 43 hours ago 24.4kB
redis 7 a7cee7c8178f 43 hours ago 204MB
In its current state, if you run `make build` and then `make up` on my project you should have yourself a test cluster in docker.
I'll update the topic of this issue as we are talking about specifically docker and nomad.
Edit: I got the snapshotter working in the dind setup linked above, and I just merged the change into main.
If I can help don’t hesitate to ping me, I can either help or delegate internally :)
@stenh0use a lot has changed in Nomad since the last time I touched it, much of it for the better. I was wondering whether we even need Consul to make bootstrapping work. Could we not instead use the `nomadService` template function together with a static rendezvous hash? That would mean that the same IPs would be returned for all of the instances of Spegel. If I understand things correctly, the environment variable should update when the template value updates. Is this statement correct?
Then as you stated using a static port for the registry should be fine for the mirror to work.
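The idea of a static rendezvous hash can be sketched as follows: every instance scores each candidate address against the same fixed key and picks the highest score, so all instances agree on the same bootstrap peer without coordinating. This is a minimal illustration of rendezvous (highest random weight) hashing in general, not Nomad's implementation. The key and the addresses are hypothetical.

```python
import hashlib


def rendezvous_pick(key, nodes):
    """Pick the node with the highest hash score for the given key."""
    def score(node):
        digest = hashlib.sha256(f"{key}|{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


# Hypothetical Spegel instance addresses, seen in a different order on each node.
seen_by_a = ["192.168.32.4:5001", "192.168.32.5:5001", "192.168.32.6:5001"]
seen_by_b = list(reversed(seen_by_a))

# Both nodes arrive at the same bootstrap peer regardless of list order.
assert rendezvous_pick("spegel-bootstrap", seen_by_a) == rendezvous_pick("spegel-bootstrap", seen_by_b)
```

A useful property is that the pick only changes when the selected node itself disappears, or when a newly added node happens to score higher.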
I was thinking the same thing over the weekend. I do not think we should involve Consul, if we need Consul kv type functionality Nomad implemented this a few releases ago.
https://developer.hashicorp.com/nomad/api-docs/variables/variables https://developer.hashicorp.com/nomad/api-docs/variables/locks
Regarding `nomadService`: Nomad can inject information about the deployment into the config templates as variables. I wasn't sure that would work, as I thought the kubernetes bootstrap code was doing a leader election using distributed locks via leaderelection.LeaderElectionConfig.
If Spegel only needs an initial list of IPs to create the cluster and it handles all of the leader election itself then we might not need to complicate a nomad deployment leader election.
https://developer.hashicorp.com/nomad/docs/job-specification/template#change_mode
Otherwise I was looking at something like this:
https://github.com/razorpay/metro/blob/5eb8881adbf5da6d387d1f4659916c83028dfb06/pkg/leaderelection/candidate.go#L56 https://engineering.razorpay.com/leader-election-using-consul-and-golang-73580fb14463
Edit: to answer the question about template value updates, you can set what happens when the template changes. You can set it to `noop`, `restart`, `signal`, or `script`. With the `signal` option in particular, you can configure which signal to send to the process.
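For the `signal` option, the receiving process has to handle the signal itself. The sketch below shows how a generic Unix process might react to SIGHUP by flagging that its peer list should be re-read. Whether Spegel handles such a signal is not established in this thread, so treat the handler and the flag name as hypothetical.

```python
import os
import signal

reload_requested = False


def on_sighup(signum, frame):
    """Record that the rendered template changed and peers should be re-read."""
    global reload_requested
    reload_requested = True


signal.signal(signal.SIGHUP, on_sighup)

# Simulate the signal that would be delivered after a template re-render.
os.kill(os.getpid(), signal.SIGHUP)
```

A real process would check the flag in its main loop and reload the bootstrap address instead of restarting.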
Leader election is not actually needed. The reason it is used in Kubernetes is to make sure all nodes bootstrap with the same instance. We should be able to do the same without it using the identify protocol to distribute public keys.
I tried running Hind on Linux and I get some build issues. I will have to look at why it will not build for x86, or I will just find an alternative method of running a local multi-node Nomad cluster.
OK, good to know about the leader election. We can definitely pass in the same node address on startup. I'm wondering how bootstrapping would work when a new node joins the cluster or a node fails. Can it then join the cluster based on any other node address? Given the statelessness, I guess if we get into a split cluster situation we can always stop and restart the job.
That is annoying about hind, what is the error you are getting? I will spin up a linux box and look into fixing it; a friend said the same to me today. I have only tested hind on my laptop, which is an x86 Macbook using colima 0.6.x. It also requires the docker host to be using cgroupv2.
I have a working Nomad cluster running with Vagrant now, and managed to get Spegel running without a bootstrap. My plan is to create a draft PR with the instructions and then you can have a look at it and give feedback. Would that work for you?
Thank you so much @phillebaba! Plan sounds great with the draft PR, let me know once you have that and I'll take a look.
Is there any documentation or a rough guide of how you set this up to help with dind? My use case is that I have pods that spin up a container with a dind sidecar to allow docker commands to execute in the main container. Any time it has to pull an image the sidecar is new, so it pulls it directly from the web, even though the image may already be on the kubernetes node itself or on another node. Is this something that Spegel can help with based on the above improvements?
@phillebaba thanks for the updates here. Apologies, life has got in the way and I'm yet to test the new changes. I made a rough nomad job file to get this working a while back based off the helm chart, but need more time to incorporate the changes.
@RoryDoherty you might be better off creating a new issue. Your architecture and where the image is meant to live would need to be understood in order to answer that question.
So I tested this out on Nomad, I wasn't able to get it working using bridge networking, but I was able to get it working using host networking.
This is due to container IP address being advertised from the bootstrap server for the router to connect to. As everything is running on private addresses the peer routers can't be reached. This wouldn't be so much of a problem with overlay networking like calico and cilium. Alternatively, an option to configure an "advertised" address as well as the listen address might work? For now I think host networking should get the job done.
When using bridge network
docker exec -it hind.nomad.client.01 curl http://192.168.32.4:30738/id
/ip4/172.17.0.2/tcp/5001/p2p/12D3KooWNnh9pmRkPdYHTpDCKEQgczy2EodxSMQ6ystwJuG1eDPb
When using host network
docker exec -it hind.nomad.client.01 curl http://192.168.32.4:22143/id
/ip4/192.168.32.4/tcp/5001/p2p/12D3KooWPGnsCBBAsMNkW9Nr1idF7irR7sDtktMyv22hyR27PZao
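The difference between the two outputs above can be checked mechanically: extract the IPv4 address from the advertised multiaddr and compare it with the address that was used to reach the node. This is a small sketch using only the values shown above. The helper name is hypothetical, and the parsing handles only the simple `/ip4/<addr>/tcp/...` shape.

```python
import ipaddress


def advertised_ip(multiaddr):
    """Extract the IPv4 address from a multiaddr such as /ip4/1.2.3.4/tcp/5001/p2p/<id>."""
    parts = multiaddr.strip("/").split("/")
    return ipaddress.ip_address(parts[parts.index("ip4") + 1])


queried_host = ipaddress.ip_address("192.168.32.4")
bridge = "/ip4/172.17.0.2/tcp/5001/p2p/12D3KooWNnh9pmRkPdYHTpDCKEQgczy2EodxSMQ6ystwJuG1eDPb"
host = "/ip4/192.168.32.4/tcp/5001/p2p/12D3KooWPGnsCBBAsMNkW9Nr1idF7irR7sDtktMyv22hyR27PZao"

print(advertised_ip(bridge) == queried_host)  # False
print(advertised_ip(host) == queried_host)    # True
```

With bridge networking the advertised address is the container's private address, which is why peers on other hosts cannot reach it.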
I need to clean up my wip, but will post back here once I have a good reference. I mostly copied the helm chart but I'm still a little unsure as to how the "service" address would work in Nomad, and also what significance the "local" address has/should be configured.
For a load-balanced "service" address, consul DNS would work well, but unfortunately you can't register a nomad job as both a consul service and a nomad service. So to do the nomadService rendezvous hashing for the bootstrap node selection, nomad service discovery has to be used.
Update here:
I created a repo nomad-spegel with my work. It includes 3 options for leader election (nomadService with rendezvous hashing, nomad kv locking, and consul kv locking) and options to use nomad or consul as a service discovery backend.
After doing a lot of testing I found `nomadService` rendezvous hashing flakier than using nomad/consul kv locking. However, I think I managed to get it to a stable state after adding multiple services/ports to the same service name in nomad. Originally I had each interface as a separate service (`spegel-<service>`), but it caused allocations to drop in and out, which meant the bootstrap addr in the template kept flapping, signaling the registry to restart frequently.
The kv/locking with the consul/nomad binary works well, but perhaps it might be nice to integrate the consul/nomad kv functionality as an alternative bootstrapper at a later stage. For now what I have seems to work well.
I do have some follow up questions / issues that I wasn't able to figure out.
Once the cluster is established, does the cluster maintain leadership via gossip, or is the bootstrap/id something that can only be updated on startup? E.g. if the cluster already exists, can a peer bootstrap with any member of the cluster? I ask because I am restarting all registries and forcing a new bootstrap process every time the leadership changes. The benefit of this is at least that the cluster will never have split rings in the event that nodes bootstrapped with different sets of hosts.
"--local-addr=192.168.112.4:25565"
"--local-addr=:25565"
doesn't listen, or at least in my testing I couldn't get it to listen. As commented above, host networking or an overlay network needs to be used due to the way the bootstrap/router ip is advertised.
Not really a question, just a statement about docker support: it looks like docker doesn't support mirroring registries other than `docker.io`. I need to dig further into that, but I think I recall reading about that a while ago. I was able to confirm the issue by using nerdctl vs docker cli. The nomad logs showed the same error as the docker cli.
@rumpl do you know if there are any plans to fix this issue https://github.com/moby/moby/issues/18818 as part of the containerd snapshotter work? I was able to pull non-dockerhub images via spegel using nerdctl but not with the docker daemon/cli.
After reading a bunch of PRs/issues on the moby page, it doesn't look like the mirror issue ever progressed/doesn't look like the feature is on the cards given the age of the above issue (unless @rumpl can provide any insights or updates there?).
The good news, though, is that it's fairly straightforward to update the dockerd code to support private registry mirrors. I compiled a custom dockerd binary tonight with a change to support `ghcr.io` and was able to pull non-dockerhub images through spegel. I'm not super thrilled about having to patch every release; hopefully either moby can deliver on the feature or the containerd plugin for nomad gets new maintainership.
https://github.com/Roblox/nomad-driver-containerd/issues/167
Hey, I really love the simple implementation of this service, I am looking for something to back GCR / AR registries without the operational overhead of running redis and postgres and I think Spegel is exactly what I am looking for!
I'd like to extend this to non kubernetes docker clusters, would you be open to adding functionality so that Spegel can be bootstrapped without kubernetes? I had a quick look over the source code and could only see the need for kubernetes in the bootstrapping section. If I do the leg work would you be interested in working with me to integrate Consul based bootstrapping into Spegel?