spegel-org / spegel

Stateless cluster local OCI registry mirror.
MIT License
1.08k stars 55 forks source link

Support for Bottlerocket #47

Open phillebaba opened 1 year ago

phillebaba commented 1 year ago

Currently Spegel does not work with Bottlerocket when run with EKS.

https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md

This needs to be explored further to see if it is possible to support.

phillebaba commented 1 year ago

After having a quick look at the Containerd configuration generated by Bottlerocket it seems like it does not currently support Spegel.

https://github.com/bottlerocket-os/bottlerocket/blob/7b67f9118151e67ac1e0972ef9d48eb3487c4ce9/packages/containerd/containerd-config-toml_k8s_containerd_sock#L47-L52

The only mirror configuration supported currently is the old syntax. There is an issue to track implementing the new syntax. It needs to be closed before Bottlerocket can be supported. bottlerocket-os/bottlerocket/issues/1963

mballoni commented 1 year ago

I was about to test Spegel when I found this issue. Is bottlerocket team aware of this kind of limitation?

Thank you

phillebaba commented 1 year ago

@mballoni I don't know if there is any activity on the issue as it has been around since last year. I do not have time right now myself to contribute to the project so it depends on if anyone else prioritizes getting this done.

empath-nirvana commented 4 months ago

Is it possible to create a flag to disable the automatic node configuration, along with some docs for how to configure containerd on the node to make it work?

I could very easily tweak the user data on my bottlerocket nodes to have the right settings to work with spegel, I think -- the main issue is that spegel will crash because it tries to modify the settings itself and can't.

bittrance commented 4 months ago

@empath-nirvana If you are talking about spegel writing containerd mirror configuration, it is actually documented, if not as visible as one might wish. Have a looke at https://github.com/spegel-org/spegel/tree/main/charts/spegel, specifically spegel.containerdMirrorAdd.

If you succeed with getting Bottlerocket nodes to work, we'd love an update to ./docs/COMPATIBILITY.md with instructions.

empath-nirvana commented 4 months ago

Do you have some documentation on what the actual containerd configuration is supposed to be on the node? That might be useful for other OS's besides bottlerocket if you don't want spegel to configure it for you. I don't need bottlerocket-specific configuration, I can translate it to bottlerocket settings.

bittrance commented 4 months ago

@empath-nirvana Spegel has no documentation about the exact configuration generated. This is because Spegel currently targets specific scenarios (see ./docs/COMPATIBILITY.md) and Spegel is supposed to "just work" in those scenarios. Currently, the mirror config injection works something like this:

The configuration is generated by the configuration subcommand. The actual configuration generated is currently simple. It takes the list provided via spegel.registires and turns them into containerd hosts.yaml files. If you run Kind and Spegel locally (see PR #458), you can inspect the generated config:

$ docker exec kind-worker cat /etc/containerd/certs.d/ghcr.io/hosts.toml
server = 'https://ghcr.io'

[host]
[host.'http://172.19.0.4:30020']
capabilities = ['pull', 'resolve']

[host.'http://172.19.0.4:30021']
capabilities = ['pull', 'resolve']
empath-nirvana commented 4 months ago

Sorry to be a nuisance about this, but it's throwing an error message:

{"time":"2024-05-06T13:32:41.843433252Z","level":"ERROR","source":{"function":"main.main","file":"/build/main.go","line":88},"msg":"run exit with error","err":"Containerd registry config path needs to be set for mirror configuration to take effect"}

https://github.com/spegel-org/spegel/blob/490c549a3935f5dceb4a1c23b15eeb22a8ed707f/pkg/oci/containerd.go#L104-L137

Now, I haven't done any configuration on the bottlerocket node to add the mirror configuration, but it seems like that really shouldn't be throwing an error even if i haven't?

the config path is set in the args: - '--containerd-registry-config-path=/etc/containerd/certs.d'

I'm sort of digging through the code, but it looks like there's a lot more than just the initial setup that depends on that file existing, in which case it's not going to be a small amount of work to make bottlerocket support it...

empath-nirvana commented 4 months ago

I commented out the verification lines that were throwing errors and rebuilt the container, and changed the helm chart so it binds to 127.0.0.1 instead of the node ip.. and set the bottle rocket config like this:

[[settings.container-registry.mirrors]] "registry" = "gcr.io" "endpoint" = ["127.0.0.1:30020", "127.0.0.1:30021"]

And i see messages like this in the logs:

"handling mirror request from external node"

So I assume that means it did work?

bittrance commented 4 months ago

@empath-nirvana It seems you have it working then (though at this point it is hard to be completely sure since there currently is no way to see that a particular image arrived via spegel; see #24). Just so I understand fully, did you retain the check for "DiscardUnpackedLayers"?

@phillebaba I'm thinking the current Helm parameter spegel.containerdMirrorAdd should be repurposed (renamed?) so that it instructs the registry command to skip the config path check? The current implementation is somewhat odd in that it takes the config path parameter, but the only thing it is used for is matching against containerd's reported config dir. Were you planning to use it for something else?

If we go down this path, we are indirectly saying we should support containerds with the older config format. Is that problematic?

phillebaba commented 4 months ago

No the check should be run no matter if the user wants Spegel to add the mirror configuration or not. Otherwise Spegel would start and people would think that things are working when they are not at all. We will never support the deprecated mirror configuration method. It has its limitations one of them being that the configuration is not hot reloaded. The second that it will be removed from Containerd.

It is important to understand the difference between the config path configuration and mirror configuration. The config path tells Containerd to look in a directory or directories for mirror configuration to load before pulling an image. This configuration is not hot reloaded and requires a restart of Containerd to take effect. The mirror configuration exists to tell Containerd about mirrors to try before attempting to pull the image from the original source. This configuration is hot reloaded as it is updated every time Containerd pulls an image.

The issue with Bottlerocket is that they have chosen not to update the Containerd configuration yet. The issue has been open for a while without much activity. While the work is not technically complicated to change, it may require some effort to figure out a migration path for existing users. I am not going to do this work as I am not a Bottlerocket user, so unless someone else will step up this issue will be blocked.

empath-nirvana commented 4 months ago

I'm not sure why hot reloading the containerd config is necessary if you have the correct configuration in the bottlerocket node when booting it?

mballoni commented 4 months ago

Wouldn't it be necessary every time a new node arrives or gets removed from the groups?

phillebaba commented 4 months ago

In Bottlerockets case it would not be an issue as the configuration seems to be global and would be applied to all new nodes. Either way it is not something that I am going to support as the old configuration method is deprecated, and I do not have time debugging issues related to this. It would be a lot more valuable is Bottlerocket simply allowed for enabling the new configuration method instead. Or actually fixed it's mirror configuration to use the new method.

empath-nirvana commented 4 months ago

If you don't want to support it bottle rocket, that's fine. What I'm really asking for here is:

1) Documentation for what the containerd configuration needs to be (whether it's hot reloaded or not) 2) Documentation for how you can tell if it's actually working, which i believe there's already an open issue on.

Bottlerocket aside, there are going to be people who run on nodes where they won't have permissions to tweak containerd settings from a pod or don't want to, but may be able to update their node configuration in other ways.

phillebaba commented 4 months ago

The required Containerd configuration is already documented in the Compatibility docs. All Containerd system configuration requires a restart for changes to apply.

https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#compatibility

I have not come up with a good method of determining if Spegel is working or not. It is a best effort distributed cache, there are a lot of moving parts. A solution to run after install would be the easiest but would not verify that things are working later on. A continuously running check would be better but also be a lot more complicated. Maybe starting with a run once check would solve a lot of issues for initial debugging. Happy to take contributions if you have a simple and easy method to solve this.

Well if they are able to update their node configuration in other ways it is possible to disable Spegel from adding the mirror configuration. The only thing missing is to documented how the mirror configuration should look like, which would be an improvement to the docs. Spegel will however require that Containerd is properly configured, and restarted. How that is done is up to the user. Most cloud providers will add this configuration by default while others wont. If that configuration is not possible due to the provider, it will most likely be out of scope for Spegel to fix.