keleshev / schema

Schema validation just got Pythonic
MIT License

Ability to Compose Schemas Together #168

Open rmorshea opened 6 years ago

rmorshea commented 6 years ago

It would be great if it were possible to compose two schemas together:

s1 = Schema({"a": {"b": int}})
s2 = Schema({"a": {"c": int}})

s1_and_s2 = compose(s1, s2)

assert s1_and_s2 == Schema({"a": {"b": int, "c": int}})

or

s1 = Schema(str)
is_lowercase = lambda s: s.lower() == s
s2 = Schema(is_lowercase)

s1_and_s2 = compose(s1, s2)

assert s1_and_s2 == Schema(And(str, is_lowercase))

If such a proposal is reasonable and possible I am willing to create a PR. Suggestions are welcome!

skorokithakis commented 6 years ago

Hmm, this is interesting. How would it work for disparate types, such as a list and a dict, or an int and a string?

rmorshea commented 6 years ago

@skorokithakis I think the composition of schema types could be handled in three cases:

  1. All are dictionaries.
    • Merge together into one dictionary schema where shared keys are recursively composed.
  2. All are iterables.
    • Merge into a single iterable schema in the order they were passed into compose.
  3. There is a combination of different schema types.
    • A type error is raised if dictionary and iterable schemas are composed together.
    • Any other combination of schema types is merged by a callable of the form function(*args) (And by default).
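
The three cases above can be sketched as a dispatch over raw schema specs (the values you would pass to Schema(...)). This is a hypothetical sketch, not part of the schema library; the ("and", ...) tuple stands in for And so the example has no dependencies:

```python
# Sketch of the three composition cases, operating on raw schema specs.
# compose_specs and the ("and", ...) stand-in for And are hypothetical.

def compose_specs(*specs):
    if all(isinstance(s, dict) for s in specs):
        # Case 1: merge mappings, recursively composing shared keys.
        merged = {}
        for spec in specs:
            for key, value in spec.items():
                merged[key] = (
                    compose_specs(merged[key], value) if key in merged else value
                )
        return merged
    if all(isinstance(s, (list, tuple)) for s in specs):
        # Case 2: concatenate iterable schemas in the order passed.
        return [item for spec in specs for item in spec]
    has_mapping = any(isinstance(s, dict) for s in specs)
    has_iterable = any(isinstance(s, (list, tuple)) for s in specs)
    if has_mapping and has_iterable:
        # Case 3a: mixing mapping and iterable schemas is an error.
        raise TypeError("cannot compose mapping and iterable schemas")
    # Case 3b: any other combination is joined by a callable (And by default).
    return ("and", *specs)

assert compose_specs({"a": {"b": int}}, {"a": {"c": int}}) == {"a": {"b": int, "c": int}}
```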
rmorshea commented 6 years ago

@skorokithakis are you planning to support Python 2.7 in future releases?

skorokithakis commented 6 years ago

That's up to @keleshev, although my preference would be to stop supporting it sooner rather than later. About this PR, I'm worried that the result would be a bit too hard for people to understand. For example, there's nothing symmetric about chaining schemas of iterables together and merging keys in dictionaries (not to mention that dictionaries are iterables too)...

rmorshea commented 6 years ago

@skorokithakis for clarity, when I referred to "iterables" I meant it in the way that _priority defines it. To be more precise I'll call them collections now. Likewise I'll refer to dictionaries as mappings. When I talk about these classifications I mean them to be exclusive (i.e. an object is a mapping or collection, but not both).

back to business...

I would agree that the handling of collections is up for debate. I'm not sure whether they should be merged into one collection, or whether they should be passed into a join_collections function which should merge them in whatever way the user decides (where the default behavior would be the former).

With that said, I think the following behaviors are relatively intuitive:

  1. Merging mappings by recursively composing shared keys.
  2. Composing different types (e.g. a type and callable, or a dict and callable) via a join(*schemas) function which defaults to And.
skorokithakis commented 6 years ago

Hmm, yes, it's certainly better, but I'm worried about the lack of consistency between 1 and 2 (they basically do completely different things). Also, 2 is rather more convoluted than what we have now, where you can just And two schemas anyway, so the main benefit of this is a function that merges the keys in N collections.

That does seem useful, but I'm worried that it's possibly not useful enough to have as a core piece of the library... What are your thoughts on this, @rmorshea? I'm not entirely certain myself.

rmorshea commented 6 years ago

@skorokithakis I’m pretty confident that the ability to recursively merge keys is important for building more complex systems of validation.

For example, consider my present use case...

I must validate JSON responses from a server. All of the possible responses have nested data. Furthermore all the responses share a common set of nested fields. Currently there is no way to create a base schema which can be extended into all the possible response cases.
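
A concrete sketch of that use case (field names made up for illustration): without composition, the shared nested fields have to be spliced into every response schema by hand.

```python
# Common fields every JSON response shares, plus one endpoint's extension.
# The field names here are hypothetical, purely for illustration.
common = {"status": str, "data": {"id": int}}
user_extra = {"data": {"name": str}}

# Without compose, the nested merge must be written out manually for
# each response type -- exactly the boilerplate compose would eliminate.
user_response = {**common, "data": {**common["data"], **user_extra["data"]}}
assert user_response == {"status": str, "data": {"id": int, "name": str}}
```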

My particular use case seems like it would be pretty common.

skorokithakis commented 6 years ago

Hmm, true. Maybe a better approach would be a way to "include" a Schema collection's keys into another Schema?

rmorshea commented 6 years ago

I don’t think that would work in my use case:

I have a common field “data” which I know is a string. However in my extension I would like to be able to specify that this field is a string of a particular form. To accomplish this I would want to merge the common type specification and my custom validator under an And operator.
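
A minimal stand-in for that merge, without the schema library: conjoin below plays the role of And (types are checked via isinstance, callables by calling); the names are illustrative, not the schema API.

```python
# conjoin is a hand-rolled stand-in for schema's And: a value passes only
# if it satisfies every check supplied.
def conjoin(*checks):
    def check(value):
        return all(
            isinstance(value, c) if isinstance(c, type) else c(value)
            for c in checks
        )
    return check

common_data = str                      # base schema: "data" is any string
is_lower = lambda s: s.lower() == s    # extension: a more specific form

data_check = conjoin(common_data, is_lower)  # what the merge should produce
assert data_check("hello")
assert not data_check("Hello")
```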

I’ll see if I can come up with some specific example when I get home.

skorokithakis commented 6 years ago

Ah, I see what you mean. Yes, what you are describing is a specific extension of the schema, which I agree is valuable, but I don't think the compose method is the best way to do it... What you are describing isn't just a straightforward way to compose two schemas, but it also contains a rather opinionated method for doing that, out of all the alternatives. I wonder if there's a lower-level way we could achieve the same result with more flexibility...

rmorshea commented 6 years ago

So long as users have the ability to customize the logic, I don't think there's much harm in having opinionated default behavior if that default behavior is intuitive.

I'm also not really sure what you mean by "lower-level". Could you give an example, or describe this further?

In the end, I need to be able to use something like compose otherwise I'll have to use a library like marshmallow because it enables this kind of extension/composition via inheritance. I would prefer to use schema though, and I think that compose would simplify much of what I would otherwise have to do with marshmallow.

rmorshea commented 6 years ago

By "lower-level" do you mean that users ought to have "finer" control over the merging behavior?

Or are you imagining that this logic could be more deeply embedded such that you could add schemas?

s1 = Schema({"a": {"b": int}})
s2 = Schema({"a": {"c": int}})
s1_and_s2 = s1 + s2
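
That operator-overloading idea could look roughly like this; ComposableSchema and merge are hypothetical sketches, not part of the schema library, and the ("and", a, b) tuple stands in for And(a, b):

```python
# Hypothetical wrapper that makes schema specs addable via __add__.
def merge(a, b):
    if isinstance(a, dict) and isinstance(b, dict):
        # Recursively compose shared keys, keep the rest.
        out = dict(a)
        for key, value in b.items():
            out[key] = merge(out[key], value) if key in out else value
        return out
    return ("and", a, b)  # stand-in for And(a, b)

class ComposableSchema:
    def __init__(self, spec):
        self.spec = spec

    def __add__(self, other):
        return ComposableSchema(merge(self.spec, other.spec))

s1 = ComposableSchema({"a": {"b": int}})
s2 = ComposableSchema({"a": {"c": int}})
assert (s1 + s2).spec == {"a": {"b": int, "c": int}}
```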
rmorshea commented 6 years ago

@skorokithakis I think the following solution is pretty clever.

What if schema composition were handled in two cases:

  1. All are dictionaries.
    • Merge together into one dictionary schema where shared keys are recursively composed.
  2. There is a combination of different schema types.
    • By default, pick the first value, where precedence follows the order in which the schemas were passed to compose.
    • To override the default users can provide an optional reduce=<function or validator>:
      • function: a callable of the form reduce(schemas) where schemas is a list
      • validator: an object with a callable validate(schema) attribute.

The default behavior of 2 is not opinionated, and the optional reducer is infinitely extensible. It also makes it possible for the schema library to develop builtin reducers as people discover useful patterns and suggest that they be added.

s1 = Schema({"a": int, "b": int})
s2 = Schema({"b": float})
s3 = compose(s2, s1) # note choice of order

assert s3.is_valid({"a": 1, "b": 2.0})

s1 = Schema(str)
s2 = Schema(lambda s: s.lower() == s)
and_reducer = Use(lambda schemas: And(*schemas))
s3 = compose(s1, s2, reduce=and_reducer)

assert s3.validate("hello world!")
assert not s3.is_valid("Hello World!")

import functools, operator

s1 = Schema({"a": [int], "b": str})
s2 = Schema({"a": [float], "b": lambda s: s.lower() == s})

join_lists = lambda schemas: functools.reduce(operator.add, schemas, [])
list_reducer = And([list], Use(join_lists))
reducer = Or(list_reducer, and_reducer)

s3 = compose(s1, s2, reduce=reducer)
assert s3.is_valid({"a": [1, 2.0], "b": "hello world"})

There's probably a way to make developing reducers easier, but this seems really powerful!
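
A dependency-free sketch of that proposal (compose and its reduce argument are hypothetical): mappings merge recursively, and any other combination of schema types is handed to the reducer, which defaults to picking the first schema.

```python
# Sketch of the reduce-based compose: mappings merge recursively; every
# other combination is delegated to the pluggable reducer.
def compose(*schemas, reduce=None):
    if reduce is None:
        reduce = lambda specs: specs[0]  # default: first schema wins
    if all(isinstance(s, dict) for s in schemas):
        merged = {}
        for spec in schemas:
            for key, value in spec.items():
                merged[key] = (
                    compose(merged[key], value, reduce=reduce)
                    if key in merged else value
                )
        return merged
    return reduce(list(schemas))

# Default reducer: float wins for "b" because it was passed first,
# matching the compose(s2, s1) ordering in the example above.
assert compose({"b": float}, {"a": int, "b": int}) == {"a": int, "b": float}

# Custom reducer, e.g. joining non-mapping specs under an And-like tuple.
and_reducer = lambda specs: ("and", *specs)
assert compose(str, int, reduce=and_reducer) == ("and", str, int)
```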

tadeoiiit commented 5 years ago

Hey guys, are we doing this? Would the implementation allow for nested schemas as well (that's the feature I need)?

skorokithakis commented 5 years ago

Sorry, I just now noticed that I haven't replied to this. I will address this shortly.

skorokithakis commented 5 years ago

By "lower-level" do you mean that users ought to have "finer" control over the merging behavior?

@rmorshea Yes, basically I am worried that composing right now gives no control to the user, it is a set of pre-written rules for how things will be composed and that's it. If they don't fit the user's use case, there isn't much they can do about it.

skorokithakis commented 5 years ago

There's probably a way to make developing reducers easier, but this seems really powerful!

This does seem powerful, I like it! I think it's nearly there, my only worry is that we need to break down the rules a bit further. For example, what happens if we're composing:

We can possibly just throw errors for most of these, or pick the first, or let the user specify a composer, I'm just looking to better understand how this would work.