Here are some thoughts about parsing, validation, and producing usable error messages.
(NOTE: I'm using unmarshal/parse interchangeably)
Built-in marshalling:
Pros
free error messages about "can't marshal string into struct" etc
Cons:
in general, can't parse polymorphic fields (e.g. volume source, string-or-struct)
Custom marshalling:
Pros
can parse polymorphic fields (e.g. volume source, string-or-struct)
Cons
less-generic syntax means successful marshalling is more dependent on having a valid file
freedom to define types where errors about an unmarshalled value may not easily translate to a location in the file
What's the alternative to custom marshalling?
Only use types that are supported by built-in marshalling. Two options for this:
Don't parse to a complete type. e.g. use string or map[string]interface{} or StringOrMap
This placeholder type will still need to be parsed before any validation or conversion can be done.
Only define syntax that built-in marshalling will support.
Less freedom in defining syntax. Many types will look more like native Kubernetes syntax.
How do you get good error messages?
How do we separate parsing and validation?
In general, parsing and validation errors are only separate concerns where invalid input can be successfully parsed.
Even built-in marshalling does some validation--it refuses to unmarshal a string into a map.
It's not black-and-white what validation is a necessary part of parsing, and what should be done after parsing.
It depends on the granularity of the types being parsed into.
For example, StringOrMap does less validation during parsing than a more specific type, even though both need the same validation overall (if they encode the same information).
How do we make the judgement call on how granular our types should be?
I propose that we define sufficiently granular types that kube-conversion code doesn't need to do any parsing.
In isolation and taken to the extreme, this suggests parsing directly into native Kubernetes types.
However, there are other design concerns as well.
How do we present validation errors to the user?
Users don't know about the types used to implement Short syntax.
We need to be able to connect an error in a Go object to a segment of text in the user's input.
Kubernetes syntax doesn't do anything interesting, so errors about the Go type are already easy
to interpret as errors about the text file.
Errors returned by custom parsing code contain enough information to track down the error in the original file.
However, the current error-message implementation doesn't format the message well, and the json library doesn't help either.
We should probably write our own version of the json library with clearer error messages.
We should also implement a mini-library for structured errors.
Errors from post-parse validation currently can't rely on the marshalling code to indicate to the user
which part of their input is invalid.
i.e. If we have an invalid Go object, there isn't an existing way to tell the user which part of their file is wrong.
It's possible that we could replace the invalid portion of the Go object with an error message and marshal the result to text.
Implementing this functionality could involve writing our own version of the json library.
An alternate (possibly overlapping) approach could be to use a SomeTypeOrError anywhere we use custom marshalling for a SomeType.
Another problem is that built-in marshalling doesn't treat extra (or missing) fields as an error.
If a user mistypes a field name, they might not see an error.
Fixing this problem probably involves writing our own version of the json library.
Here are some thoughts about parsing, validation, and producing usable error messages.
(NOTE: I'm using unmarshal/parse interchangeably)
Built-in marshalling:
Pros
Cons:
Custom marshalling:
Pros
Cons
What's the alternative to custom marshalling?
Only use types that are supported by built-in marshalling. Two options for this:
How do you get good error messages?
How do we separate parsing and validation?
In general, parsing and validation errors are only separate concerns where invalid input can be successfully parsed. Even built-in marshalling does some validation--it refuses to unmarshal a string into a map.
It's not black-and-white what validation is a necessary part of parsing, and what should be done after parsing. It depends on the granularity of the types being parsed into.
For example, StringOrMap does less validation during parsing than a more specific type, even though both need the same validation overall (if they encode the same information).
How do we make the judgement call on how granular our types should be?
I propose that we define sufficiently granular types that kube-conversion code doesn't need to do any parsing. In isolation and taken to the extreme, this suggests parsing directly into native Kubernetes types. However, there are other design concerns as well.
How do we present validation errors to the user?
Users don't know about the types used to implement Short syntax. We need to be able to connect an error in a Go object to a segment of text in the user's input.
Kubernetes syntax doesn't do anything interesting, so errors about the Go type are already easy to interpret as errors about the text file.
Errors returned by custom parsing code contain enough information to track down the error in the original file. However, the current error-message implementation doesn't format the message well, and the json library doesn't help either. We should probably write our own version of the json library with clearer error messages. We should also implement a mini-library for structured errors.
Errors from post-parse validation currently can't rely on the marshalling code to indicate to the user which part of their input is invalid. i.e. If we have an invalid Go object, there isn't an existing way to tell the user which part of their file is wrong. It's possible that we could replace the invalid portion of the Go object with an error message and marshal the result to text. Implementing this functionality could involve writing our own version of the json library. An alternate (possibly overlapping) approach could be to use a SomeTypeOrError anywhere we use custom marshalling for a SomeType.
Another problem is that built-in marshalling doesn't treat extra (or missing) fields as an error. If a user mistypes a field name, they might not see an error. Fixing this problem probably involves writing our own version of the json library.