Closed: DazWilkin closed this issue 3 years ago.
I've since learned that CRDs should be validated by dint of the OpenAPI spec...
Hmmm...In my example, I'd inadvertently added .spec.brokerPodSpec.resources
instead of .spec.brokerPodSpec.containers[0].resources
so it's unclear why this passed muster.
Our Configuration CRD has some validation bits in it .. but no validation for the pod or service specs. Hopefully, someday Kubernetes will make it easier to embed (with validation) existing resources.
In the meantime, maybe we need to do something to help.
Aha!
I assumed the references were validated recursively.
I'm not particularly familiar with CRD development.
On Thu, Dec 17, 2020, 18:31 bfjelds notifications@github.com wrote:
> I've since learned that CRDs should be validated by dint of the OpenAPI spec...
> Hmmm...In my example, I'd inadvertently added `.spec.brokerPodSpec.resources` instead of `.spec.brokerPodSpec.containers[0].resources` so it's unclear why this passed muster.
> Our Configuration CRD has some validation bits in it .. but no validation for the pod or service specs. Hopefully, someday Kubernetes will make it easier to embed (with validation) existing resources.
> In the meantime, maybe we need to do something to help.
Had a play (!) with this today:
https://github.com/DazWilkin/kubectl-akri
The concept is a kubectl plugin that would permit e.g. `kubectl akri apply --filename=./something.yaml` and get errors.
But, I think it's not going to be appropriate as it's turtles all the way down; I'd need to implement the entire schema tree below e.g. PodSpec.
As I drove to the dog camp, I realized that it may be sufficient after all.
I can simply validate the core Container spec and assume that e.g. PodSpec is valid. Perhaps an invalid PodSpec would be caught?
In my case, I'd inadvertently moved `resources` from ... container into the Configuration root, and this type of error (incorrect Configuration entries) would be caught.
An approach that would help (but would be tied to an API version) would be to add validation to the CRD for the pod and service specs. Prometheus worked on a similar issue: https://github.com/kubernetes/kubernetes/issues/54579#issuecomment-370372942. Perhaps we could follow their lead until Kubernetes simplifies this.
Note: I also see someone suggesting using a ValidatingAdmissionWebhook ... we could add this to the controller maybe? https://stackoverflow.com/questions/56044319/validating-a-crd-with-an-embedded-core-v1-podspec
The Webhook appears (!) trivial.
Do you support my giving that a whirl?
> The Webhook appears (!) trivial.
> Do you support my giving that a whirl?
Definitely! Validation would add a lot of value!!
I spent some time learning more about Kubernetes' webhook today and the work is non-trivial.
I'll give it a little more time today, have another commitment tomorrow and will report back Wednesday.
Oh boy! :smiley:
K8s webhooks are poorly documented and involved but I've made progress.
I can publish a ValidatingWebhook and it gets invoked. There's a bunch of infrastructural work to get this far.
I'm slightly perplexed as I was expecting to be able to filter the webhook by e.g. `apiVersion: akri.sh/v0` mutations but, when `kubectl apply`'ing an Akri Configuration, I'm having to use a super-wide filter (basically `*`) to get it to work. Will spend more time later this week on it.
Once I can scope the webhook down to these, I must then write the validator for an Akri Configuration (but that should be the simplest part).
It's here: https://github.com/DazWilkin/akri-webhook
OK, I got it working.... boy-oh-boy.... ValidatingWebhooks need better documentation and a blog post.
more ./zeroconf.yaml

```yaml
apiVersion: akri.sh/v0
kind: Configuration
metadata:
  name: zeroconf
spec:
  protocol:
    zeroconf:
      kind: "_rust._tcp"
      port: 8888
      txtRecords:
        project: akri
        protocol: zeroconf
        component: avahi-publish
  capacity: 1
  brokerPodSpec:
    imagePullSecrets: # Container Registry secret
    - name: ghcr
    containers:
    - name: zeroconf-broker
      image: ghcr.io/dazwilkin/zeroconf-broker@sha256:69810b62...
    resources: # <-------------------- MISPLACED
      limits:
        "{{PLACEHOLDER}}": "1"
```

kubectl apply --filename=./zeroconf.yaml

```console
Error from server: error when creating "./zeroconf.yaml": admission webhook "wednesday.akri.sh" denied the request: Configuration does not include `resources.limits`
```
NOTE: The `Error` is being thrown by the webhook because the Configuration is invalid.
NOTE: The invalid part of the Configuration is in the PodSpec (!)
One oversight|consideration is that the incorrect (!) Configuration above is valid YAML and so, even after writing the webhook and getting it to work, the invalid Configuration passed. What I've had to do is manually add checking for e.g. the existence of `.spec.container[*].resources.limits`.
I think an alternative would be to write a custom parser that invalidates additional (!) entries rather than permitting them.
Awesome...regarding the validation, I was mulling that over when I first read about the webhook. I wondered if we couldn't deserialize and serialized the input yaml, then use the original yaml to construct some kind of verification jsonpath-like queries. As I type it though, that sounds complex.
That's not far from what I have.
I'm currently unmarshaling it into a struct and have code to validate, but using an existing schema would be better.
I wrote the handler in Golang but it should be straightforward to rewrite in Rust. I'm more familiar with Go, the examples that exist are in Go and all the types are, by definition, available in Go.
I would love to take a look, but I got a 404 error when trying out the link to your repo. Is it public?
Apologies! It's now public
OK. It works. See testing
It currently only checks for `{.spec.brokerPodSpec.containers[*].resources.limits}` and this should possibly be `{.spec.brokerPodSpec.containers[*].resources.limits.\{\{PLACEHOLDER\}\}}` and then the result cast to an int.
More importantly, there should be other tests for validity but I'm not entirely sure what they should all be:
- `{{PLACEHOLDER}}`
- `{.spec.brokerPodSpec.containers[*].resources}`
- `{.spec.protocol}`
I'd like to make the Webhook dynamic on a set of (flagged) templates. See: https://github.com/DazWilkin/akri-webhook/issues/4
One outstanding challenge is that JSONPath supports filters but it's unclear to me how to make these boolean functions.
In the above examples, I'd like: `{IsInt(.spec.brokerPodSpec.containers[*].resources.limits.\{\{PLACEHOLDER\}\})}`
Then, each filter could be a Boolean function and the Webhook could find the intersection of a set of such functions to determine whether the filters succeed or fail: `test-1 && test-2 && ... && test-n`
that is super! rather than deciding what should be set and shouldn't (though that is a good idea!!), i was thinking something more dynamic ... just check what the user is trying to add. i tried a real general approach here: https://play.golang.org/p/10OniB-QR5u. what do you think about that?
also, is there a way to host this in the existing controller?
I explained myself poorly; set-not-set was a poor attempt to explain what you've achieved in your example.
In my code, the additional `extra: 1` (as JSON) would (I think) parse because the JSON parser accepts the valid JSON and drops the superfluous `extra`.
Somehow (I don't yet understand your code), the back-and-forth addresses this, which is clever!
I'll add your approach.
Would you like to merge this into the controller code? Or do you mean combine with the controller Deployment?
I'm going to try to fix what I think is another issue with the WebThings Rust SDK today and then blog about Kubernetes' Webhooks because the documentation is woeful and it will be good for me to explain it (to myself if no-one else).
...But I'll be back on this tomorrow.
cool! the back and forth should allow `extra: 1` to be ignored on the deserialization; serializing the resulting struct should apply any valid settings. then, using the original yaml as a "map" to walk the deserialized-then-serialized tree should allow us to pick up any missing settings (`extra: 1`), mistyped (if we want) values, and unexpected set values.
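This "back and forth" (twice-baked) check can be sketched in a few lines of Go, shown here with `encoding/json` for self-containedness; the `Spec` type below is a trimmed-down illustration, not the real Akri Configuration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Spec is a deliberately small illustration of a typed schema. Unknown JSON
// fields are silently dropped when unmarshaling into it.
type Spec struct {
	Capacity int                    `json:"capacity"`
	Protocol map[string]interface{} `json:"protocol"`
}

// check recursively verifies every key in the original document survived the
// round trip through the typed struct; a dropped key was not a valid field.
func check(original, reparsed interface{}) error {
	orig, ok := original.(map[string]interface{})
	if !ok {
		return nil // only object keys can be dropped by the typed parse
	}
	re, _ := reparsed.(map[string]interface{})
	for key, value := range orig {
		reValue, ok := re[key]
		if !ok {
			return fmt.Errorf("input key (%q) is not parsed correctly", key)
		}
		if err := check(value, reValue); err != nil {
			return err
		}
	}
	return nil
}

// roundTrip deserializes into the typed struct, serializes it again, and
// walks the original document against the round-tripped one.
func roundTrip(input []byte) error {
	var typed Spec
	if err := json.Unmarshal(input, &typed); err != nil {
		return err
	}
	reserialized, _ := json.Marshal(typed)
	var original, reparsed interface{}
	json.Unmarshal(input, &original)
	json.Unmarshal(reserialized, &reparsed)
	return check(original, reparsed)
}

func main() {
	// `extra` is not a Spec field, so it vanishes on the round trip.
	fmt.Println(roundTrip([]byte(`{"capacity":1,"protocol":{"zeroconf":{}},"extra":1}`)))
	fmt.Println(roundTrip([]byte(`{"capacity":1,"protocol":{"zeroconf":{}}}`))) // <nil>
}
```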
my ideal would be to have this be rust code that existed in the controller binary ... we could add the service definition into the helm deployment code for the controller.
I truly love today's languages and their playgrounds ... here's a quick and dirty rust version: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f11cc741f40e72b6a40f9fa29a319783
I agree; they're awesome tools! Talking of which, I assume you're all using the alternative rust-analyzer
My plan today is to incorporate the YAML parser into the existing Golang handler.
Last night I was Googling around the k8s_openapi. IIUC the `AdmissionReview` and related types aren't currently exposed through OpenAPI and so I need to get my head around rust-swagger and the swagger files referenced in that issue to generate the structs to wrap your code (thank you very much, that saved me a bunch of time) into a Rust-based handler.
Yes, to merging this into the controller binary and incorporating into the Helm chart.
OK, a much improved (thank you!) version of the Golang webhook now uses the @bfjelds "twice-baked" check
I'm continuing to use JSON rather than YAML.
Now, my bad YAML example results in:
```console
Error from server: error when creating "./zeroconf.webthings.yaml":
  admission webhook "wednesday.yirella.svc" denied the request: Input index (spec) is not parsed correctly:
  Input index (brokerPodSpec) is not parsed correctly:
  Input index (resources) is not parsed correctly:
  Input (map[limits:map[{{PLACEHOLDER}}:1]]) is not consistent with expected value
```
I'm missing something obvious though and recreated (!) the `akri.sh/v0` `Configuration` schema :smiley:. In Golang, I have:
```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type Configuration struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Metadata   *metav1.ObjectMeta `json:"metadata"`
	Spec       Spec               `json:"spec"`
}

type Spec struct {
	BrokerPodSpec            *v1.PodSpec            `json:"brokerPodSpec,omitempty"`
	Capacity                 int                    `json:"capacity"`
	ConfigurationServiceSpec *v1.ServiceSpec        `json:"configurationServiceSpec,omitempty"`
	InstanceServiceSpec      *v1.ServiceSpec        `json:"instanceServiceSpec,omitempty"`
	Properties               map[string]string      `json:"properties,omitempty"`
	Protocol                 map[string]interface{} `json:"protocol"`
	Units                    string                 `json:"units"`
}
```
I used the Rust Configuration as the guide.
Q: Should these types be defined by a Swagger spec?
I see that there are additional constraints on the ServiceSpecs of at most 1 per...; these constraints aren't currently being enforced through the YAML.
Making good progress with the Rust variant:
https://github.com/DazWilkin/akri-webhook-rust
It's still a webhook but it's almost in Rust :smile:
Will file an issue tomorrow about externalizing types to make them more easily accessible from other projects.
Such a doofus, I spent an hour banging my head against the wall trying to understand why serde was unwilling to unmarshal my test configuration.yaml into an Akri `Configuration`. Eventually, I realized that I needed to use `KubeAkriConfig` :smile:
So, it's working. Building a container image and will test it.
Frustrating day... I broke the Golang implementation, I think by upgrading from `v1beta` to `v1`.
The Rust implementation balks on `creationTimestamp` and `resources` and I'm unsure why.
If I exclude them, the tests pass:
```rust
serde_json::Value::Object(o) => {
    for (key, value) in o {
        println!("[check] key: {}", key);
        if key == "creationTimestamp" {
            println!("[check] creationTimestamp deserialized: {:?}", deserialized);
            return Ok(());
        }
        if key == "resources" {
            println!("[check] resources deserialized: {:?}", deserialized);
            return Ok(());
        }
        if let Err(e) = check(&value, &deserialized[key]) {
            return Err(None.ok_or(format!(
                "input key ({:?}) not equal to parsed: ({:?})",
                key, e
            ))?);
        }
    }
    Ok(())
}
```
In both cases, the `deserialized` object does not contain the entities.
The missing `creationTimestamp`:
```
Object({
    "annotations":
        Object({
            "kubectl.kubernetes.io/last-applied-configuration":
                String("{ ... }")
        }),
    "generation": Number(1.0),
    "name": String("webthings"),
    "namespace": String("default"),
    "uid": String("b61d7604-97f0-46a3-adb2-adc0dedfb0c9")
})
```
The missing `resources`:
```
Object({
    "containers":
        Array([
            Object({
                "image": String("ghcr.io/..."),
                "name": String("zeroconf-broker")
            })
        ]),
    "imagePullSecrets":
        Array([
            Object({
                "name": String("ghcr")
            })
        ])
})
```
I'm unclear as to why.
Perhaps @bfjelds you have an inkling into the issue?
I've seen missing properties in what Rust kube exposes...I think I stumbled across missing create timestamps as well.
I wouldn't think that it would be a problem for this case though...seems like an odd property for someone to include in a Configuration (i would expect that to be a read only generated property).
Not sure about resources...
PodSpec resources should be exposed ... akri is accessing and modifying them here. Curious that it is not making it through the parse/unparse/reparse. I am curious at what point it is lost (is resources present after the initial typed parse? Is it in the deserialized string?)
I was having difficulties parsing the debug output; the entire tree(s) appear to be lazily evaluated by the debugger and I was unsure how to force its hand.
I'll spend more time with it this morning.
User (my) error on `resources`. I was testing valid and invalid JSON at the same time and the `resources` error was correctly arising from the misplaced `resources` section in the (intentionally) incorrect JSON.
However, there remain issues with `creationTimestamp` and `managedFields` (see below) and I have a hunch it's because akri-shared uses a (really) old version of `kube` (`0.23.0`, December 2019 ;-)) which corresponds to the commit when `kube` appeared to contain its own definition of Kubernetes' `ObjectMeta`.
In more recent versions, this uses the `k8s_openapi` implementation.
I suspect this is the cause of both issues although I'm unable to explain the `*Timestamp` problem.
I'm not really familiar with `ManagedFields` (though from this documentation it sounds like something we can document as unvalidated for now) ... i think that `creationTimestamp` is something that people should not be setting in their Configuration. So maybe, for the sake of better-but-not-perfect, we can say that those properties are both either errors or unvalidated.
in the meantime, it seems like a worthwhile issue to point out how old `kube` is.
I am unfamiliar with `ManagedFields` but the section appears to be generated by the CRD|Operator process.
Edit: I just realized from your comment that `ManagedFields` are created by the Webhook process.
I'm not setting `creationTimestamp` in the Configuration spec; it appears that this becomes fully 'materialized' before it's shipped to the validator.
Here's my input:
dazwilkin@akri:~$ more ./zeroconf.webthings.yaml

```yaml
apiVersion: akri.sh/v0
kind: Configuration
metadata:
  name: webthings
spec:
  protocol:
    zeroconf:
      kind: "_webthing._tcp"
      port: 8888
  capacity: 1
  brokerPodSpec:
    imagePullSecrets: # Container Registry secret
    - name: ghcr
    containers:
    - name: zeroconf-broker
      image: ghcr.io/dazwilkin/zeroconf-broker@sha256:69810b62...
      resources:
        limits:
          "{{PLACEHOLDER}}": "1"
```
And here's the `AdmissionReview` that the Webhook receives. The Akri Configuration is `.request.object`:
```yaml
kind: AdmissionReview
apiVersion: admission.k8s.io/v1
request:
  uid: 2b752327-a529-4ffd-b2e2-478455e80a0d
  kind:
    group: akri.sh
    version: v0
    kind: Configuration
  resource:
    group: akri.sh
    version: v0
    resource: configurations
  requestKind:
    group: akri.sh
    version: v0
    kind: Configuration
  requestResource:
    group: akri.sh
    version: v0
    resource: configurations
  name: webthings
  namespace: default
  operation: CREATE
  userInfo:
    username: admin
    uid: admin
    groups:
    - system:masters
    - system:authenticated
  object:
    apiVersion: akri.sh/v0
    kind: Configuration
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"akri.sh/v0",...'
      creationTimestamp: "2020-12-30T17:47:26Z"
      generation: 1
      managedFields:
      - apiVersion: akri.sh/v0
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              ".": {}
              f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:spec:
            ".": {}
            f:brokerPodSpec:
              ".": {}
              f:containers: {}
              f:imagePullSecrets: {}
            f:capacity: {}
            f:protocol:
              ".": {}
              f:zeroconf:
                ".": {}
                f:kind: {}
                f:port: {}
        manager: kubectl
        operation: Update
        time: "2020-12-30T17:47:26Z"
      name: webthings
      namespace: default
      uid: b61d7604-97f0-46a3-adb2-adc0dedfb0c9
    spec:
      brokerPodSpec:
        containers:
        - image: ghcr.io/dazwilkin/zeroconf-broker@sha256:69810b62...
          name: zeroconf-broker
          resources:
            limits:
              "{{PLACEHOLDER}}": "1"
        imagePullSecrets:
        - name: ghcr
      capacity: 1
      protocol:
        zeroconf:
          kind: _webthing._tcp
          port: 8888
  oldObject:
  dryRun: false
  options:
    kind: CreateOptions
    apiVersion: meta.k8s.io/v1
```
> my ideal would be to have this be rust code that existed in the controller binary ... we could add the service definition into the helm deployment code for the controller.
I mentioned to @kate-goldenring that I upgraded to K8s 1.20 and broke the certificate process in the Webhook. I decided this morning to try cert-manager (it is excellent). I have that working as a self-signing CA generating certs for the Webhook service; it would be trivial for others to swap out our self-signing CA and swap in e.g. Let's Encrypt. Using cert-manager makes the deployment process for the Webhook more streamlined (mostly Kubernetes specs) and it should be straightforward to add to the Helm chart for Akri.
IIUC the Controller is not an HTTP daemon. Obviously, the Webhook is and it requires TLS. The only shared dependency with Akri is the `shared` member (for `KubeAkriConfig` etc.). I propose (respectfully and acknowledging your ideal) that the Webhook live under `./webhooks/...`, possibly `./webhooks/validating/configuration`?
Kubernetes Webhooks are distinct processes and out-of-band; they're registered as callbacks in the API server. Their development is distinct too; the resources they use aren't part of the K8s core and may evolve out-of-sync too. I'm still taking a dependency for the Rust solution on a self-generated 'crate' from an OpenAPI spec published in an issue. The requirement for TLS adds complexity and it may be easier to keep it all out of the existing "core".
Ultimately, it's your project and I'll do as you wish.
I'm totally open to separating this from controller if it makes sense. No need to force a square peg into a round whatever :)
I appreciate your thought and energy ... am really looking forward to this!!
The Helm Chart is working: https://github.com/DazWilkin/akri-webhook-helm
Tomorrow, I'll begin merging the Rust-based Webhook and the Helm chart into Akri proper.
I wish i could give more than one thumbs up :)
Is your feature request related to a problem? Please describe.
`kubectl` validates (some?) specs when these are applied. `Configurations` (Akri CRD) are not validated and this permits user-error and other mistakes to escape.

Is your feature request related to a way you would like Akri extended? Please describe.
During development of Zeroconf, I inadvertently broke the YAML spec for the Broker:
The result was that, while Brokers were deployed, environment variables were not mounted correctly in the Pod.
It was only after lengthy debugging that we uncovered the error:
This may (!?) have been mitigated were the Configuration validated upon apply.
Describe the solution you'd like
`kubectl apply --filename=./some-akri-config.yaml` would yield e.g.:
Describe alternatives you've considered
Being less stupid ;-)
This is the convention with Kubernetes and would be a useful addition.
Additional context
See: https://github.com/deislabs/akri/issues/177