Handling of default values

felixfbecker commented 6 years ago

I got diffing objects working well so far, but there is one problem: Several properties in Kubernetes objects are scalars with default values. For example:

int ProbeV1.FailureThreshold defaults to 3
string ServicePortV1.Protocol defaults to "TCP" and many more. These properties are often not declared in YAML configuration. But when deserializing the YAML, the properties get the C# type default value, instead of the server default value. I.e. FailureThreshold is always 0, Protocol is always null. When diffing against the server, this will then generate a patch that sets these values to the C# default value (0/null).

How should we handle this?

When diffing, check if the new value is the C# default value, and exclude it from the patch. This would make it impossible to actually set a field to 0 or null. Does the API have fields anywhere that can be set to the C#/Go default value? E.g. for FailureThreshold the minimum is 1, and in other places where 0 is valid I saw them use nullable/pointer types, e.g. ConfigMapVolumeSource.DefaultMode. But there are a lot of booleans that are not nullable, and this would make it impossible to switch a boolean back to false after it was set to true on the server.
Apply the server defaults client side. This would break if they ever change the default. The defaults are only exposed through documentation, not in the schema.
Make everything nullable. This would be a bad experience when querying from the server as these are always defined.
Use Dictionaries or dynamic instead of strongly typed objects for deserialisation. I would like to avoid this because then we lose the formatting and autocompletion in PowerShell, and it becomes way harder to look up the merge strategies to use for properties...

felixfbecker commented 6 years ago

I think the best we could do until Kubernetes exposes the default value through OpenAPI is to maintain a hand-maintained list of default values in a map in the generator script, and apply that default value in the property definition.

tintoy commented 6 years ago

Interesting! I wonder how kubectl / client-go handle it. I take it the OpenAPI / Swagger document doesn't list these defaults? I'll take a look at both of these ideas this morning if I get time...

felixfbecker commented 6 years ago

kubectl apply circumvents the problem by only diffing maps instead of the struct types. I thought about this but I think it would severely impact the UX so I would rather hardcode defaults. The default value is not exposed through OpenAPI but usually mentioned in the field description. I suggested adding this in the comment I linked above.

I don't know how client-go handles this

tintoy commented 6 years ago

I like your idea - let's do that.

felixfbecker commented 6 years ago

So even if we have DefaultValue attributes on everything and initialisers, it would still mean that if a YAML does not declare a field and the value is different than the default on the server, it will reset the value to the default value. I wonder if that is acceptable behaviour or not. If we always ignore default values, it would be impossible to reset a value back to the default value.

Maybe we do need to make everything with a default nullable? So that null means "not specified"/"ignore" while the actual default value means "reset". The OpenAPI spec is inconsistent about this too - some scalars are nullable in addition to having a default value, while others are not.

The question is just how inconvenient this would make using the API client, if even fields are nullable that are guaranteed to be set to a default value by the server.

felixfbecker commented 6 years ago

Maybe we need to generate two sets of classes for API responses and for inputs? Where inputs just has more optional fields, so the API responses can extend the input classes and override the fields with default values to make them non-optional.

What do you usually do in C# in such cases? In TypeScript I can distinguish between null and undefined and scalars don't have zero values (undefined is the zero value for everything).

tintoy commented 6 years ago

Hmm, not sure about yamldotnet but JSON.NET has DefaultValueHandling so if the value is some well-known scalar value (or null) the member won't be serialised. Perhaps yamldotnet has something similar?

tintoy commented 6 years ago

https://github.com/aaubry/YamlDotNet/issues/21

felixfbecker commented 6 years ago

Serialisation is not the problem, it's generating the patch (diffing yaml vs server state)

tintoy commented 6 years ago

Ah. Yeah I could apply the DefaultValue attribute and you could look for that?

tintoy commented 6 years ago

You could treat that default value as "missing" or "unspecified" perhaps?

tintoy commented 6 years ago

If the attribute is.missing th

tintoy commented 6 years ago

Man, I hate doing GitHub on my phone :)

tintoy commented 6 years ago

If attribute is missing there is a way to determine default value for type but I'd have to look it up.

tintoy commented 6 years ago

Here you go:

https://stackoverflow.com/a/353073/885866

felixfbecker commented 6 years ago

Yeah, but if you always ignore the default value, then it would become impossible to set the field to the default value again after it was modified on the server...

felixfbecker commented 6 years ago

Here's another solution I just came up with:

Generate setters for every field (and backing properties) that will mark the property as dirty when set in a HashSet that can be queried publicly. E.g.

class KubeModel {
    protected HashSet<string> dirtyProperties = new HashSet<string>();
    public IImmutableSet<string> DirtyProperties {
        get { return dirtyProperties; }
    }
}
class DeploymentSpec extends KubeModel {
    private int progressDeadlineSeconds;
    public int ProgressDeadlineSeconds {
        get { return progressDeadlineSeconds; }
        set {
            progressDeadlineSeconds = value;    
            dirtyProperties.Add("ProgressDeadlineSeconds");
        }
    }
}

It would be more code but it's generated, so doesn't really matter. This would not require changing types to nullable and also remove the need for maintaining a map of default values and even non-updateable properties (#29), because those properties would never get set by the YAML deserialisation. The diff implementation would ignore any properties that are not dirty.

The only other solution I can think of is to deserialise to Dictionaries, then build PSObjects and return those, but "lie" about the OutputType in the Cmdlet to make autocompletion still work. While building PSObjects, the "virtual" type needs to be added as a TypeName to make output formatting work. Then make the patch cmdlet convert back to dictionaries first, and make the diff implementation "track" the "virtual" model type of the dictionaries by starting at KubeResource and traversing through the Type as its traversing through the dictionaries, so it can retrieve the patch strategies etc from attributes. The disadvantage is that strictly Get-Kube* cmdlets now return a completely different type than ConvertFrom-KubeYaml.

What do you think?

tintoy commented 6 years ago

Interesting - what about just using the YAML document as-is; it only specifies fields that are specified after all :)

This is starting to get a little complex but I'm not against modifying the models in principle (or creating some sort of helper that can do the same). Let me have a think and see what I can do...

felixfbecker commented 6 years ago

What do you mean with as-is? Dictionary? We need to give it some type to serialize into

tintoy commented 6 years ago

What if we generate a second set of models in parallel to the first (in a namespace like KubeClient.Models.Tracked) which have the tracking behaviour and implicit / explicit cast operators (or AsTracked() / AsNonTracked()) to convert back and forth? The actual code for tracking changes can be improved using the [CallerMemberName] attribute (or whatever it's called).

tintoy commented 6 years ago

Not for the purposes of diff perhaps? Can't remember for yamldotnet but JSON.NET has JObject / Jtoken (a DOM for JSON, effectively). I imagine yamldotnet must have something similar? So I mean don't deserialise at all if you don't need to...

tintoy commented 6 years ago

Yep, they have a DOM for YAML:

https://github.com/aaubry/YamlDotNet/wiki/Samples.LoadingAYamlStream

felixfbecker commented 6 years ago

Yeah, but we can't operate on the YAML string ;) I think you mean what I wrote in my second idea:

The only other solution I can think of is to deserialise to Dictionaries, then build PSObjects and return those, but "lie" about the OutputType in the Cmdlet to make autocompletion still work. While building PSObjects, the "virtual" type needs to be added as a TypeName to make output formatting work. Then make the patch cmdlet convert back to dictionaries first, and make the diff implementation "track" the "virtual" model type of the dictionaries by starting at KubeResource and traversing through the Type as its traversing through the dictionaries, so it can retrieve the patch strategies etc from attributes. The disadvantage is that strictly Get-Kube* cmdlets now return a completely different type than ConvertFrom-KubeYaml.

If we go with separate class hierarchies, we'll have the same problem of fragmented types...

felixfbecker commented 6 years ago

Yep, they have a DOM for YAML

I believe the patch implementation shouldn't rely on the encoding being YAML though, so Dictionaries would be the better choice.

tintoy commented 6 years ago

If we go with separate class hierarchies, we'll have the same problem of fragmented types

A fair point.

We could, however, change the client so that its methods take / return a JObject/JToken (or a YamlDocument) instead of specific model types (i.e. no serialisation / deserialisation at all). Clients can then just use JObject.ToObject<TModel>()/JObject.ToObject(modelType) if required.

tintoy commented 6 years ago

I'm pretty comfortable with JToken and friends being passed around (it's a reasonably common way to pass untyped or semi-typed data around when that data was received in JSON format in the first place but the schema is not fixed); there's even reasonably high-fidelity conversion between YamlDocument and JSON:

https://github.com/aaubry/YamlDotNet/wiki/Samples.ConvertYamlToJson

tintoy commented 6 years ago

Seen this? https://github.com/wbish/jsondiffpatch.net

felixfbecker commented 6 years ago

I wouldn't want to have PowerShell users needing to do that though...

I have this vision of cmdlets that can all take the same strongly typed Kubernetes model objects as input and all output the same strongly typed objects. Trying to find a solution that gets as close to that as possible 🤔

Seen this? https://github.com/wbish/jsondiffpatch.net

Yeah I've seen that, it's not JSON Patch (RFC 6902) though. Kubernetes API only supports RFC 6902 JSON Patch, RFC 7386 JSON Merge Patch and their custom Strategic Merge Patch. I therefor went with JSON Patch.

tintoy commented 6 years ago

I agree that's the ideal way to go - but perhaps we might be able to simplify things a bit by keeping the 2 scenarios separate?

Consumer wants to load objects, modify them, and then patch.
Consumer wants to apply Kube YAML from a file.

For 1, this is a moot point because all fields are already populated; by definition, only fields they change will be changed (and need patch operations generated for them).

For 2, we have a set of keys from the YAML so we know what patch operations are required.

Have I missed anything, or does that more or less cover both scenarios?

tintoy commented 6 years ago

(so for 1, just do a property-level diff; all properties will have been populated before modifying)

tintoy commented 6 years ago

(as for 2, I know JSON.NET has an operation called Populate but I don't know if YamlDotNet does - populate simply deserialises on top of an existing object instance, only setting members for which serialised properties are present in the JSON - it's super useful)

tintoy commented 6 years ago

It's also possible to wrap a given model instance in a PSObject and augment / replace members that way (e.g. by adding change-tracking behaviour) although this may not be the simplest option...

tintoy commented 6 years ago

What if ConvertFrom-KubeYaml wrapped each deserialised resource model in a PSObject with an added PSNoteProperty called something like YamlKeys or LoadedYamlMembers? That way, you know which values were present in the Yaml to use as a hint for patching :-)

Mind you, if they load from YAML and then modify the resources before passing to Update-KubeResource then that might not help so much :-(

tintoy commented 6 years ago

(with the option above, the consumer always still gets real, strongly-typed models; it's just that ConvertFrom-Yaml adds extra metadata to those deserialised instances to help guide Update-KubeResource)

felixfbecker commented 6 years ago

One big use case I have is actually loading YAML into an object, mutating it (in PowerShell), then applying it (sending a patch to the API).

For example this script in CI:

build Docker images
push them to a registry
tag them with the currently built commit
read all Kubernetes YAML (in the repo)
Get-ChildItem -Recurse -File *.yml | ConvertFrom-KubeYaml
for all Deployments, modify the image fields in the PodSpecs to point to the tag of the currently built commit | ForEach-Object { if ($_ is [DeploymentV1Beta]) { $_.Spec.Containers.Image = $commit } }
apply all patches | Update-KubeResource

For writing this script, it would be tremendously helpful if I had IntelliSense on $_.Spec.Containers.Image (i.e. it's strongly typed).

I couldn't find good examples of how to use PSObject in C# to add the note property and also read it again, while still declaring types for parameters and output...

tintoy commented 6 years ago

Here's a quick example (from memory; pretty sure it'll compile but not 100% certain):

class MyModel
{
    public string Foo { get; set; }
}

class MyCmdlet
{
    public override void ProcessRecord()
    {
        MyModel model = new MyModel
        {
            Foo = "bar"
        };

        PSObject result = new PSObject(model);
        result.Properties.Add(
            new PSNoteProperty("Name", "MyModel1")
        );

        WriteObject(result);
    }
}

tintoy commented 6 years ago

Intellisense will work on the result of the Cmdlet because it really is a MyModel instance, just wrapped in a PSObject (in fact, from memory all objects are wrapped in PSObject under the covers but I might be misremembering that).

Additionally, if you decorate your Cmdlet class with[OutputType(typeof(MyModel))] or [OutputType(typeof(MyModel), ParameterSet="MyParameterSet")] then consumers will get intellisense before it's even run (e.g. when writing a pipeline).

tintoy commented 6 years ago

I have an idea for attaching properties to arbitrary objects (CLR-level, not PSObject-level) - give me an hour or so and I'll post some code :)

tintoy commented 6 years ago

Ok, forget that but how about this:

When deserialising from YAML, clone the deserialised model and store it as a PSNoteProperty (called __YamlSource or something like that) on the PSObject-wrapped model written to the pipeline. That way you can compare the model's properties with the original to see which ones have been changed! Not certain but you might even be able to mark that property as hidden...

tintoy commented 6 years ago

Can do this in C# to hide that property by default: http://ramblingcookiemonster.github.io/Decorating-Objects/

felixfbecker commented 6 years ago

Yeah I think adding the noteproperty would work, but how would you read it from a cmdlet that expects KubeResouce as a parameter?

tintoy commented 6 years ago

Bah, didn't think of that :)

Maybe have it take a PSObject and use a custom validation attribute to verify the type?

tintoy commented 6 years ago

Maybe add a parameter set that takes a different parameter of type PSObject from the pipeline?

tintoy commented 6 years ago

You can still get at the wrapped model via the BaseObject property.

felixfbecker commented 6 years ago

That could work, we would just lose the type declaration in Get-Help output with validation at runtime. That might be the trade-off we have to do.

tintoy commented 6 years ago

So just to make sure I understand how this works:

Get resource spec from YAML.
Consumer may modify that resource spec.
Get resource spec from K8s.
Decide what to do?

What if when we loaded the resource spec from YAML we deserialised it on top of the spec loaded from K8s? Then comparing the 2 might be easier...

tintoy commented 6 years ago

(props missing from YAML will be filled in from K8s and they make their changes to that model and when updating K8s all we need to do is load from K8s and compare).

Maybe if ConvertFrom-KubeYaml had a switch to enable filling in of undeclared properties from current K8s state?)

felixfbecker commented 6 years ago

Like this?

Get-Content -Raw deploy.yml | ConvertFrom-KubeYaml -OnTopOfServerState

So that would require ConvertFrom-KubeYaml to have API access, and essentially would take the job of diffing and patching, because to deserialise "on top of" the server object it has to consider all the aspects of merge strategies etc. So really it just means merging cmdlets into one. It would be nice if we could keep some SoC

tintoy commented 6 years ago

SoC?

tintoy commented 6 years ago

How about a separate Cmdlet to do merge into model from K8s state?

tintoy / dotnet-kube-client

Handling of default values #28