Closed: lasseebert closed this issue 7 years ago
Off the top of my head, the most important decision is the data structure of a schema.
I have worked with libraries that use modules and metaprogramming to define similar things. So you would be able to do something like:
```elixir
defmodule MySchema do
  use Validation

  required(:name) |> filled
  required(:email) |> filled |> format(~r/@/)
end

MySchema.call(params)
# => Some result structure
```
This is hard to work with in my experience. As soon as something should be reused in another context, one has to know implementation details.
I think a better approach is to have a schema as pure data built with helper functions. So something like:
```elixir
import Validation.DSL

my_schema =
  Validation.schema([
    required(:name) |> filled,
    required(:email) |> filled |> format(~r/@/)
  ])
# => Some schema data structure

Validation.result(params, my_schema)
# => Some result structure
```
The first thing I wanted to figure out is how to implement predicate logic, which is the heart of dry-validation, in Elixir. The model in dry-validation is functional so I would expect it to be easily achievable in Elixir.
Essentially we need to be able to compose functions using predicate logic operators
I think this is doable with macros.
Something like this:
```elixir
defmodule Validation do
  defmacro required(field, predicates \\ nil) do
    quote do
      {
        :required,
        unquote(field),
        Validation.process_predicates(unquote(predicates))
      }
    end
  end

  defmacro process_predicates(nil) do
    nil
  end

  # `and` in the predicate expression becomes an {:and, left, right} node
  defmacro process_predicates({:and, _, [left, right]}) do
    quote do
      {
        :and,
        Validation.process_predicates(unquote(left)),
        Validation.process_predicates(unquote(right))
      }
    end
  end

  # A bare predicate like `filled?` becomes its name as an atom
  defmacro process_predicates({name, _, nil}) do
    name
  end

  # A predicate with arguments like `match(~r/@/)` becomes {name, args}
  defmacro process_predicates({name, _, args}) do
    quote do
      {
        unquote(name),
        unquote(args)
      }
    end
  end
end
```
With this we can:
```
$ iex -S mix
Erlang/OTP 19 [erts-8.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Compiling 1 file (.ex)
Interactive Elixir (1.3.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> import Validation
Validation
iex(2)> required(:email)
{:required, :email, nil}
iex(3)> required(:email, filled?)
{:required, :email, :filled?}
iex(4)> required(:email, filled? and match(~r/@/))
{:required, :email, {:and, :filled?, {:match, [~r/@/]}}}
```
Ah now we're talking :)
So, having that data structure that defines rules, do you know how we could translate it into a composed function?
I think it's just a matter of pattern matching the predicates and letting the `:and` clause handle the composition.
E.g. if our schema is `%{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}}`, we could do something like this:
```elixir
def result(params, schema) do
  errors =
    params
    |> Enum.reduce([], fn {key, value}, errors ->
      {:required, predicate} = Map.get(schema, key)

      case validate(predicate, key, value) do
        :ok -> errors
        {:error, message} -> [message | errors]
      end
    end)

  %{data: params, errors: errors}
end

# No predicates to check
def validate(nil, _key, _value), do: :ok

def validate({:and, left, right}, key, value) do
  with :ok <- validate(left, key, value) do
    validate(right, key, value)
  end
end

def validate(:filled?, key, nil), do: {:error, "#{key} must be filled"}
def validate(:filled?, _key, _value), do: :ok

def validate({:match?, [pattern]}, key, value) do
  case value =~ pattern do
    true -> :ok
    false -> {:error, "#{key} must match pattern #{inspect(pattern)}"}
  end
end
```
Note that I throw away the `:required` tag in this example and only validate the predicates.
```
iex(1)> schema = %{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}}
%{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}}
iex(2)> Validation.result(%{email: nil}, schema)
%{data: %{email: nil}, errors: ["email must be filled"]}
iex(3)> Validation.result(%{email: "foo"}, schema)
%{data: %{email: "foo"}, errors: ["email must match pattern ~r/@/"]}
iex(4)> Validation.result(%{email: "me@example.com"}, schema)
%{data: %{email: "me@example.com"}, errors: []}
```
I will investigate how dry-validation is built and then make some proof-of-concept code ;)
That might take a while since the most complex part of dry-v is...the DSL itself; however, conceptually, the whole thing is very simple. First of all, dry-v doesn't do much when it comes to validation logic itself, as this is provided by the dry-logic gem. This gem uses its own AST format to describe operations and predicates. There are common logic operations like AND or XOR, but there are also operations used for extracting data or applying predicates in more complex ways (e.g. a set operation, which applies a set of predicates to its input).
What dry-v is doing via its DSL, is generating AST, which is compiled to dry-logic operations with predicates, then it just applies these operations to input.
Here's a very simple `:key` operation, which extracts the value from a hash under a specified key and applies its predicates:
```ruby
irb(main):012:0> compiler = Dry::Logic::RuleCompiler.new(Dry::Logic::Predicates)
=> #<Dry::Logic::RuleCompiler:0x007fc8fe0dce78 @predicates=Dry::Logic::Predicates>
irb(main):013:0> ast = [:key, [:email, [:predicate, [:filled?, [[:input, Undefined]]]]]]
=> [:key, [:email, [:predicate, [:filled?, [[:input, Undefined]]]]]]
irb(main):014:0> rule = compiler.visit(ast)
=> #<Dry::Logic::Operations::Key rules=[#<Dry::Logic::Rule::Predicate predicate=#<Method: Module(Dry::Logic::Predicates::Methods)#filled?> options={:args=>[]}>] options={:name=>:email, :evaluator=>#<Dry::Logic::Evaluator::Key path=[:email]>, :path=>:email}>
irb(main):015:0> rule.(email: '')
=> #<Dry::Logic::Result:0x007fc8fc9a9cd8 @success=false, @id=:email, @serializer=#<Proc:0x007fc8fc9a9c60@/Users/solnic/Workspace/dry-rb/dry-logic/lib/dry/logic/operations/key.rb:44>>
irb(main):016:0> rule.(email: '').to_ast
=> [:failure, [:email, [:key, [:email, [:predicate, [:filled?, [[:input, ""]]]]]]]]
```
Now here's the really good part: in Elixir we don't need that whole dry-logic infrastructure, as the only reason it exists is to provide the ability to compose callable objects (think functions) in a logical way. So e.g. `filled & min_size?(18)` in the dry-v DSL is translated to dry-logic AST, then compiled into an AND operation. In Elixir, we just need predicate functions and a way to compose them using logic operators, and we're done :)
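To make the point concrete, here is a minimal sketch (not the library's actual API) of composing plain predicate functions with logical combinators. Each predicate takes a value and returns `true` or `false`; the combinators return new predicates.

```elixir
# A predicate is just a one-argument function returning true/false
filled? = fn
  nil -> false
  "" -> false
  _ -> true
end

# `match` is a predicate *builder*: it closes over the pattern
match = fn pattern ->
  fn value -> is_binary(value) and value =~ pattern end
end

# Logical combinators compose predicates into new predicates
p_and = fn left, right ->
  fn value -> left.(value) and right.(value) end
end

p_or = fn left, right ->
  fn value -> left.(value) or right.(value) end
end

email? = p_and.(filled?, match.(~r/@/))

email?.("me@example.com") # => true
email?.("")               # => false
```

No AST or compiler is needed for this part; closures already carry the composition.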
Apart from validation logic, rules and operations, we need to deal with error messages - and that has been so far the most challenging part, but let's look into this later maybe, once we have predicate functions and composition done.
Thanks for the comprehensive description. It will surely help me trying to find a place to start :)
I agree that we should wait with error messages or at most have simple and non-configurable error messages until we have a more complete validation engine.
I also think coercion and nested schemas can wait a bit.
I'm aiming for something like this to start with:
```elixir
schema =
  Validation.schema([
    required(:name, filled and string),
    required(:email, filled and string and match(~r/@/))
  ])
# => Some AST wrapped in a %Schema{}. Perhaps a Map of ASTs.
# Later we can define shorthands for `filled and string and match()` like the
# macros in dry-v

params = %{"name" => "Me", "email" => "me@example.com", "foo" => "bar"}
result = params |> Validation.result(schema) # => A %Result{}
result.errors # => %{}
result.valid? # => true
result.data   # => %{name: "Me", email: "me@example.com"}

params = %{"name" => "Me", "email" => "not an email"}
result = params |> Validation.result(schema)
result.errors # => %{email: ["is invalid"]}
result.valid? # => false
result.data   # => %{name: "Me", email: "not an email"}
```
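As an aside on the `%Result{}` part of that sketch: Elixir structs have no computed fields, so `valid?` would either be a function on the result or a field set when the result is built. A minimal, purely illustrative shape (module and field names are hypothetical, not the proposed API):

```elixir
defmodule Validation.Result do
  # data: the whitelisted/coerced params; errors: a map of field => messages
  defstruct data: %{}, errors: %{}

  # valid? derived from errors, so the two can never disagree
  def valid?(%__MODULE__{errors: errors}), do: errors == %{}
end

result = %Validation.Result{data: %{name: "Me"}, errors: %{}}
Validation.Result.valid?(result) # => true
```

Deriving `valid?` from `errors` avoids having to keep two fields in sync.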
This looks like a great starting point! Are you OK with TDDing it? This project is an opportunity for me to learn Elixir better, and TDD is an important part of my workflow so I would love to see tests and implementation later, if that's OK with you :)
I'm not just ok with TDDing. I can't really code without it :)
TDD ALL THE THINGS (insert All The Things meme image here)
Fantastic! 🎉 Please feel free to go ahead and start hacking on it. I'll be chiming in and at some point I should be able to help with coding too (I need to refresh my elixir memory first :))
I came here from Twitter. I was thinking about doing something similar for quite some time now.
I would have only one advice - don't start with macros. Start with plain functions and define data types that they are going to work on (how to encode various types and validations). Introduce macros as the last step to remove verbose declarations (if needed).
Thanks @michalmuskala. This is excellent advice.
I will try to refactor the existing code to not use macros, or at least have a pleasant interface without macros.
I also came here via Twitter and would like to give my two cents if I may.
There seem to be two obvious solutions besides the proposed compilation to a predicate logic expression, that is then turned into a function.
I have the impression it could be simpler.
What you typically do is model a structure that can represent the outcome of the validation and some data (e.g. the error messages). You want to make sure that the functions producing this result are composable such that errors are accumulated and the overall outcome is `:error` if any of the validations failed.
In other languages you would use something like applicative functors or even monads for this.
The simplest solution is probably returning a tuple `{:ok | :error, Map.t}`. Then fold over the validation functions and combine the results according to some simple rule.
The validators could be higher order functions that accept the thing to validate and the map of errors seen so far.
Instead of using functions, have your validators return datastructures and then use an interpreter function that applies the semantics represented by the structures. That is basically a fold over the set of structures accumulating errors.
Then return `:ok` if the map of errors is empty.
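The fold described above can be sketched in a few lines. This is an illustration of the idea, not a proposed API; each validator here is a function of the params returning `:ok` or `{:error, {field, message}}`, and the fold accumulates the errors:

```elixir
# Two example validators (illustrative, hard-coded rules)
validators = [
  fn params ->
    if params[:name] in [nil, ""], do: {:error, {:name, "must be filled"}}, else: :ok
  end,
  fn params ->
    if is_binary(params[:email]) and params[:email] =~ ~r/@/,
      do: :ok,
      else: {:error, {:email, "is invalid"}}
  end
]

# Fold over the validators, accumulating errors per field
run = fn params, validators ->
  errors =
    Enum.reduce(validators, %{}, fn validator, errors ->
      case validator.(params) do
        :ok -> errors
        {:error, {field, message}} -> Map.update(errors, field, [message], &[message | &1])
      end
    end)

  # :ok only when the error map is empty
  if errors == %{}, do: {:ok, params}, else: {:error, errors}
end

run.(%{name: "Me", email: "me@example.com"}, validators)
# => {:ok, %{name: "Me", email: "me@example.com"}}

run.(%{name: "", email: "nope"}, validators)
# => {:error, %{name: ["must be filled"], email: ["is invalid"]}}
```

The "interpreter over data structures" variant would look the same, except the list would contain rule terms instead of functions and the `case` would dispatch on their shape.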
I can only second that it is advisable not to start with macros. Macros are hard to write and, in contrast to functions, do not compose.
Also they live in the strange world of the compile time (actually macro expansion time) and any non trivial macro can be hard to comprehend. Let alone issues due to phasing.
Oh and of course there are possibly many more and better ways to do this. I am looking forward to your actual solution. Have fun and happy hacking 😊
Thank you very much @certainty, for taking the time to give "your two cents" :)
I really like the idea of composable functions rather than one `validate` function that applies a given set of predicates and builds the result.
On a side note: I think we need both error messages and the resulting params map in the result data structure, since validation rules work as a whitelist and possibly also coercer of the input.
If I read your suggestion correctly, we have these parts of the engine:

- Predicates: simple functions that return `true` or `false`, e.g. `filled?(value)` or `match?(value, pattern)`.
- Combinators such as `and`, which compose predicates and are themselves predicates.
- A distinction between key validation and value validation: a key can be required or optional, and values can be validated with predicates. These two concepts can probably also just be a composition of functions.
Again, thanks! :)
@lasseebert that seems to be sound.
I have implemented a little combinator-based validation library for validating Elixir data. That is akin to your suggestion to build primitive validation functions and combinators that create new validators out of existing ones.
With regards to key and value validation: is a distinction even needed? I mean, you could have a validator that asserts that the value for a key is there, which means the key is required, and optional otherwise, right?
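To illustrate that point: `required` and `optional` can themselves be combinators that lift a value predicate into a params validator. A sketch under that assumption (all names hypothetical):

```elixir
# Lift a value predicate into a validator that also asserts key presence
required = fn key, predicate ->
  fn params ->
    case Map.fetch(params, key) do
      :error -> {:error, {key, "is missing"}}
      {:ok, value} -> if predicate.(value), do: :ok, else: {:error, {key, "is invalid"}}
    end
  end
end

# Same, but a missing key is fine
optional = fn key, predicate ->
  fn params ->
    case Map.fetch(params, key) do
      :error -> :ok
      {:ok, value} -> if predicate.(value), do: :ok, else: {:error, {key, "is invalid"}}
    end
  end
end

filled? = fn value -> value not in [nil, ""] end

validate_name = required.(:name, filled?)
validate_name.(%{name: "Me"}) # => :ok
validate_name.(%{})           # => {:error, {:name, "is missing"}}
```

So key validation and value validation collapse into the same shape: functions from params to `:ok | {:error, ...}`.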
Maybe it is worthwhile to build a little spike that implements the basic API as you suggested. That would allow evaluating the idea before you dive directly into the proper implementation. Also it could serve as a basis for a more concrete discussion :)
I am curious to see this library evolve.
Update: please forgive my ignorance. I only just noticed you already began to code. So I already have something to look at.
Yes @certainty, you are right that a rule can just validate either some key or some value. Or both or a combination of things. :)
I will start small and build the most basic features of the library with no respect for the public API, but with focus on composition of predicates and rules. Then after that make the public API nice to work with.
Hi, I came here from Twitter. I was looking for a validation library a while ago and found https://github.com/CargoSense/vex, which is very similar to what you want to build.
```elixir
schema = [username: [length: [min: 2]]]
Vex.valid?(%{username: "fazibear"}, schema)
```
Vex also supports more fancy syntax.
Hi @fazibear. Yes I also looked at Vex while searching for a validation library. I skipped it because it seemed somewhat limited. AFAICS it has no support for nested validations and custom validations.
@fazibear I liked that Vex is simple and it looks like it is easy to reuse schemas or part of schemas. But as mentioned, it seemed to have too little functionality to suit my needs.
Thanks for sharing though ;)
Would be cool to be able to serialize the schema to reuse validations on the client side.
Hi @andre1sk. I'm sorry to say that serializing will probably be hard to do, since we just defined a schema to be a pure function. With some added metadata it might be possible, but I think it will not be built into the library to start with. Thanks for the input :)
> Functions all the way vs. use data structures and interpret them
What about data structures translated to composed functions? That's how it works in dry-validation
> Would be cool to be able to serialize the schema to reuse validations on the client side.
We did that with dry-validation and formalist and very quickly realized that it's tricky because it's common to have validations that rely on backend too much, and it's not possible to translate them directly to client-side code. I'm still not sure about the value of this, it's relatively easy to support translation of simple value-based checks when you have an abstract representation of such checks (and dry-validation has that) but then you need to be able to drop validations that are not "portable", which is actually quite tricky.
> We did that with dry-validation and formalist and very quickly realized that it's tricky because it's common to have validations that rely on backend too much, and it's not possible to translate them directly to client-side code.
Ecto solves this by allowing each validation to store metadata in the changeset about itself. Only the validations that registered the metadata can be serialised; the rest is opaque and considered backend-only. By default, Phoenix uses this metadata to do HTML5 validation in forms.
It's a good point @solnic
As I see it there are advantages of both approaches:
With a data structure we get access to metadata about a schema at runtime.
With a schema being just a function composed of other functions, we get simplicity and flexibility. It is straight forward to add a custom rule or a custom predicate, since they are just functions. The entire schema can even be swapped out with a custom function if needed.
Maybe we can find a way to have our cake and eat it too: a schema being a pure function, but with some way to access or create metadata from it. Something like this, although I don't like this particular approach much:
```elixir
str? = fn
  :metadata -> {:meta, [name: :str?, type: :predicate]}
  value when is_binary(value) -> :ok
  _ -> {:error, "must be a string"}
end
```
Rules and schemas then just accumulate metadata from their composed functions.
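The accumulation could work by having each combinator merge the metadata of the functions it composes, so even a composed rule can still be inspected at runtime. A sketch in the same toy style (all names hypothetical, and with the caveat that a literal `:metadata` input would shadow real data):

```elixir
str? = fn
  :metadata -> {:meta, [name: :str?, type: :predicate]}
  value when is_binary(value) -> :ok
  _ -> {:error, "must be a string"}
end

filled? = fn
  :metadata -> {:meta, [name: :filled?, type: :predicate]}
  value when value in [nil, ""] -> {:error, "must be filled"}
  _ -> :ok
end

# The combinator composes both behavior and metadata
p_and = fn left, right ->
  fn
    :metadata ->
      {:meta, [name: :and, type: :composed, parts: [left.(:metadata), right.(:metadata)]]}

    value ->
      with :ok <- left.(value), do: right.(value)
  end
end

name_rule = p_and.(filled?, str?)
name_rule.("Me")      # => :ok
name_rule.(:metadata) # => a nested {:meta, ...} describing the composition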
After some thought, I like @solnic's idea of a schema as a nested data structure that compiles into a composed function.
I really liked the simplicity of "just functions", but I think it will get complicated quickly when adding features like i18n for error messages, auto-generated documentation or whatever else we or a third party library could think of.
> @solnic: I'm still not sure about the value of this, it's relatively easy to support translation of simple value-based checks when you have an abstract representation of such checks (and dry-validation has that) but then you need to be able to drop validations that are not "portable", which is actually quite tricky.
Since the idea is back to a schema data structure, wouldn't having some flag to not export a given rule be enough? Also, optionally, some syntax to delegate the check(s) to an API call?
@andre1sk: If we end up using a schema data structure, then yes, it is probably possible to serialize the schema to be used for documentation, frontend validation, and whatever else one could think of. IMO the serialization should be general and not aimed at frontend validation.
> @lasseebert: IMO the serialization should be general and not aimed at frontend validation.
I think you are totally right, but it does seem there should be some way to add "metadata" that can be used to adapt the schema to a particular use case, e.g. to optionally provide some frontend-export-specific flags or documentation-related data.
I see you had some good discussions. I second the advantages of having an abstract representation of the validation, especially if you aim for documentation and serialization.
One could also think about generating e.g. a JSON schema, in case the validation is used for JSON data.
Also it serves as a good basis to experiment with different implementations of the interpretation of that structure. Now the result of the interpretation would be a composed function. In a similar vein other approaches can be tried out, should that be needed.
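As a toy illustration of "other interpretations of the same structure": the rule data from earlier in the thread could just as well be interpreted into a JSON-Schema-like map instead of a composed function. This deliberately ignores the predicates and only exports what is trivially portable (all names here are made up for the example):

```elixir
# Interpret %{field => {:required, predicates}} into a JSON-Schema-ish map,
# exporting only key presence and a blanket string type
to_json_schema = fn schema ->
  properties =
    Map.new(schema, fn {key, {:required, _predicates}} ->
      {to_string(key), %{"type" => "string"}}
    end)

  %{
    "type" => "object",
    "required" => schema |> Map.keys() |> Enum.map(&to_string/1),
    "properties" => properties
  }
end

to_json_schema.(%{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}})
# => %{"type" => "object", "required" => ["email"],
#      "properties" => %{"email" => %{"type" => "string"}}}
```

This is exactly the "drop non-portable checks" problem solnic mentioned: the `:match?` regex is silently discarded here, and a real exporter would need a policy for that.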
I think I have a good understanding now of how to achieve the composed data structure that is compiled into a (composed) function.
I have begun hacking on it and will commit some code soon that supports the data structure of predicates. I'm aiming for a basic building block which represents the most basic predicate (think `filled?`), then composing those into composed predicates with e.g. `and` and `or`.
I'll close this initial issue now. Any discussion about a specific aspect of validation deserves more attention in an issue of its own :)
We should describe the usage of this library. E.g. how to define validations, validate data structures, get error messages, etc.