infrahq / infra

Infra provides authentication and access management to servers and Kubernetes clusters.
https://infrahq.com
Other
1.38k stars 55 forks source link

api: stricter field validation #1763

Closed jmorganca closed 2 years ago

jmorganca commented 2 years ago

Quite a few API fields don't trim or validate against leading & trailing whitespace:

This results in api objects being created with fields with leading whitespace. We should trim whitespace for these fields in our API (or at least in the client)

dnephin commented 2 years ago

I've looked into this a bit. Today we use go-playground/validator for api request validation. It uses struct tags to define the validation. This works well enough in some cases, but doesn't work well for applying a consistent set of validators to similar fields (ex: we want the same validation for all name fields).

The options it provides are:

It doesn't seem like there is a good way to do this with our existing library.

dnephin commented 2 years ago

I believe our requirements here are:

dnephin commented 2 years ago

One option (suggested by @ssoroka ) is go-ozzo/ozzo-validation/v4.

That library appears to produce nice errors, and would allow us to group validation together in a few ways. Either by implementing a Validate method, or by putting the rules into a function.

The only challenge I can see with this library would be inspecting the validation rules in the openapi doc generation. The validation is declarative, but the fields on the rules are not exported. We can pass the list of validation.FieldRules to the openapi doc generation, and it can match the rules to the fields on the struct, but we need some way of translating a Rule into the openapi Schema objects.

There hasn't been any updates to the library in over 2 years, so I'm not sure if it's still actively maintained, or willing to accept contributions. If we are ok with forking it, we would have 2 options:

  1. Export the fields on all the rules so that we can inspect them
  2. Add a method to the library that accepts the rules, builds a representation of the validation requirements, and returns that representation.
dnephin commented 2 years ago

I really like the idea of not using struct tags for this validation. There's nothing particularly wrong with using struct tags, but it does force us to use reflection to receive them. Using struct tags for something that already uses reflection (encoding/json or mapstructure) is great because reflection is necessary anyway. But validation doesn't require reflection, so we could avoid it.

Avoiding reflection likely makes the code more obvious (easier to trace, doesn't require any understanding of reflection), and also more performant.

Today we use the validate struct tag in 104 places. Using git grep -E -oh 'validate:"[^"]*"' | sort | uniq to track down all the uses, it seems like:

I wonder how much value we really get from using a validation library. Writing a validator for these cases is pretty easy, and we may even be able to avoid reflection by using generics. The libraries do give us more advanced validators like email, but I'm not sure how much we should trust a random library to validate those properly. Likely we'll want to audit those sufficiently that copying them into our repo isn't much more work.

I think we have a couple options for approaches we could take. Both could leverage a library like ozzo-validation, without relying on it as our primary interface.

Option 1 would be to continue to use the struct tags we have, but replace go-playground/validator with a small library that uses mitchellh/reflectwalk to read all the tags. We already use reflectwalk in cliopts. The validation itself could still be done with a library, if we wanted to.

Option 2, which isn't necessarily mutually exclusive, would be to define our own interface for validation. Something like this:

type Validator interface {
    Validate() error
    Describe(schema *openapi3.Schema) error
}

This interface should make it easy to keep the validation rules next to the logic that translates those rules into the openapi schema objects. Each request struct would implement this interface. The implementation could rely on the struct tags, or it could use ozzo-validation. The important part is that we define the interface, and that it keeps the validation and description of the validation together.

Each request type shouldn't need to implement much. Using a library like ozzo-validation all they would need to do is define the list of rules. Pass that list of rules to Validate(rules) to implement Validate, and pass it to another function to implement Describe.

ssoroka commented 2 years ago

mostly our lack of use of validation today is because it hides in tags and it's not obvious how to use tags for validation (eg What functions are even supported?). I'm not a fan of tag-based validation either.

reflectwalk looks interesting, but it's not obvious how to use it from the tests, and there are no docs. looking at cliopts, I don't find that code easy to read or understand.

What I've done in the past is have the validation package build the rules per field up into an internal structure that you could ask for after running the validation. This is nice in that the developer only needs to write the validation and not worry about how that translates into openapi, and the translation code can be written once in a single place to translate those rules into openapi. This sounds a bit like your option to expose the rules from the ozzo package, so I'd be interested in investigating that.

dnephin commented 2 years ago

What I've done in the past is have the validation package build the rules per field up into an internal structure that you could ask for after running the validation.

Ya, this sounds like what I was thinking for option 2. This would be a nice approach.

I realized the one nice property of using reflection is that we can look up the field name for errors without having to repeat it. I'm not sure if the cost of reflection is worth it for just looking up the field name, but something to consider.

ssoroka commented 2 years ago

Hmm. Ideally the validation solution should know the real field name specifically for the errors.