Closed: lasseebert closed this issue 7 years ago
Off the top of my head, the most important decision is the data structure of a schema.
I have worked with libraries that use modules and metaprogramming to define similar things. So you would be able to do something like:
```elixir
defmodule MySchema do
  use Validation

  required(:name) |> filled
  required(:email) |> filled |> format(~r/@/)
end

MySchema.call(params)
# => Some result structure
```
This is hard to work with in my experience. As soon as something should be reused in another context, one has to know implementation details.
I think a better approach is to have a schema as pure data built with helper functions. So something like:
```elixir
import Validation.DSL

my_schema =
  Validation.schema([
    required(:name) |> filled,
    required(:email) |> filled |> format(~r/@/)
  ])
# => Some schema data structure

Validation.result(params, my_schema)
# => Some result structure
```
The first thing I wanted to figure out is how to implement predicate logic, which is the heart of dry-validation, in Elixir. The model in dry-validation is functional so I would expect it to be easily achievable in Elixir.
Essentially we need to be able to compose functions using predicate logic operators
I think this is doable with macros.
Something like this:
```elixir
defmodule Validation do
  defmacro required(field, predicates \\ nil) do
    quote do
      {
        :required,
        unquote(field),
        Validation.process_predicates(unquote(predicates))
      }
    end
  end

  defmacro process_predicates(nil) do
    nil
  end

  # `and` in the predicate expression becomes an {:and, left, right} node
  defmacro process_predicates({:and, _, [left, right]}) do
    quote do
      {
        :and,
        Validation.process_predicates(unquote(left)),
        Validation.process_predicates(unquote(right))
      }
    end
  end

  # A bare predicate like `filled?` becomes its name as an atom
  defmacro process_predicates({name, _, nil}) do
    name
  end

  # A predicate with arguments like `match(~r/@/)` becomes {name, args}
  defmacro process_predicates({name, _, args}) do
    quote do
      {
        unquote(name),
        unquote(args)
      }
    end
  end
end
```
With this we can:
```
$ iex -S mix
Erlang/OTP 19 [erts-8.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Compiling 1 file (.ex)
Interactive Elixir (1.3.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> import Validation
Validation
iex(2)> required(:email)
{:required, :email, nil}
iex(3)> required(:email, filled?)
{:required, :email, :filled?}
iex(4)> required(:email, filled? and match(~r/@/))
{:required, :email, {:and, :filled?, {:match, [~r/@/]}}}
```
Ah now we're talking :)
So, having that data structure that defines rules, do you know how we could translate it into a composed function?
I think it's just a matter of pattern matching the predicates and letting the `:and` clause handle the composition.
E.g. if our schema is `%{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}}`, we could do something like this:
```elixir
def result(params, schema) do
  errors =
    params
    |> Enum.reduce([], fn {key, value}, errors ->
      {:required, predicate} = Map.get(schema, key)

      case validate(predicate, key, value) do
        :ok -> errors
        {:error, message} -> [message | errors]
      end
    end)

  %{data: params, errors: errors}
end

# No predicates to check
def validate(nil, _key, _value), do: :ok

def validate({:and, left, right}, key, value) do
  with :ok <- validate(left, key, value) do
    validate(right, key, value)
  end
end

def validate(:filled?, key, nil), do: {:error, "#{key} must be filled"}
def validate(:filled?, _key, _value), do: :ok

def validate({:match?, [pattern]}, key, value) do
  case value =~ pattern do
    true -> :ok
    false -> {:error, "#{key} must match pattern #{inspect(pattern)}"}
  end
end
```
Note that I throw away the `:required` tag in this example and only validate the predicates.
```
iex(1)> schema = %{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}}
%{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}}
iex(2)> Validation.result(%{email: nil}, schema)
%{data: %{email: nil}, errors: ["email must be filled"]}
iex(3)> Validation.result(%{email: "foo"}, schema)
%{data: %{email: "foo"}, errors: ["email must match pattern ~r/@/"]}
iex(4)> Validation.result(%{email: "me@example.com"}, schema)
%{data: %{email: "me@example.com"}, errors: []}
```
I will investigate how dry-validation is built and then make some proof-of-concept code ;)
That might take a while since the most complex part of dry-v is...the DSL itself; however, conceptually, the whole thing is very simple. First of all, dry-v doesn't do much when it comes to validation logic itself, as this is provided by the dry-logic gem. This gem uses its own AST format to describe operations and predicates. There are common logic operations like AND or XOR, but there are also operations used for extracting data or applying predicates in more complex ways (e.g. a set operation, which applies a set of predicates to its input).
What dry-v is doing via its DSL, is generating AST, which is compiled to dry-logic operations with predicates, then it just applies these operations to input.
Here's a very simple `:key` operation, which extracts the value from a hash under a specified key and applies its predicates:
```ruby
irb(main):012:0> compiler = Dry::Logic::RuleCompiler.new(Dry::Logic::Predicates)
=> #<Dry::Logic::RuleCompiler:0x007fc8fe0dce78 @predicates=Dry::Logic::Predicates>
irb(main):013:0> ast = [:key, [:email, [:predicate, [:filled?, [[:input, Undefined]]]]]]
=> [:key, [:email, [:predicate, [:filled?, [[:input, Undefined]]]]]]
irb(main):014:0> rule = compiler.visit(ast)
=> #<Dry::Logic::Operations::Key rules=[#<Dry::Logic::Rule::Predicate predicate=#<Method: Module(Dry::Logic::Predicates::Methods)#filled?> options={:args=>[]}>] options={:name=>:email, :evaluator=>#<Dry::Logic::Evaluator::Key path=[:email]>, :path=>:email}>
irb(main):015:0> rule.(email: '')
=> #<Dry::Logic::Result:0x007fc8fc9a9cd8 @success=false, @id=:email, @serializer=#<Proc:0x007fc8fc9a9c60@/Users/solnic/Workspace/dry-rb/dry-logic/lib/dry/logic/operations/key.rb:44>>
irb(main):016:0> rule.(email: '').to_ast
=> [:failure, [:email, [:key, [:email, [:predicate, [:filled?, [[:input, ""]]]]]]]]
```
Now here's the really good part: in Elixir we don't need that whole dry-logic infrastructure, as the only reason it exists is to provide the ability to compose callable objects (think functions) in a logical way. So e.g. `filled & min_size?(18)` in the dry-v DSL is translated to dry-logic AST, then compiled into an AND operation. In Elixir, we just need predicate functions and a way to compose them using logic operators, and we're done :)
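To make the point concrete, here is a minimal sketch (not the library's actual API) of composing plain predicate functions with logical combinators. Each predicate takes a value and returns `true` or `false`; the combinators return new predicates.

```elixir
# A predicate is just a one-argument function returning true/false
filled? = fn
  nil -> false
  "" -> false
  _ -> true
end

# `match` is a predicate *builder*: it closes over the pattern
match = fn pattern ->
  fn value -> is_binary(value) and value =~ pattern end
end

# Logical combinators compose predicates into new predicates
p_and = fn left, right ->
  fn value -> left.(value) and right.(value) end
end

p_or = fn left, right ->
  fn value -> left.(value) or right.(value) end
end

email? = p_and.(filled?, match.(~r/@/))

email?.("me@example.com") # => true
email?.("")               # => false
```

No AST or compiler is needed for this part; closures already carry the composition.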
Apart from validation logic, rules and operations, we need to deal with error messages - and that has been so far the most challenging part, but let's look into this later maybe, once we have predicate functions and composition done.
Thanks for the comprehensive description. It will surely help me trying to find a place to start :)
I agree that we should wait with error messages or at most have simple and non-configurable error messages until we have a more complete validation engine.
I also think coercion and nested schemas can wait a bit.
I'm aiming for something like this to start with:
```elixir
schema =
  Validation.schema([
    required(:name, filled and string),
    required(:email, filled and string and match(~r/@/))
  ])
# => Some AST wrapped in a %Schema{}. Perhaps a Map of ASTs.
# Later we can define shorthands for `filled and string and match()` like the
# macros in dry-v

params = %{"name" => "Me", "email" => "me@example.com", "foo" => "bar"}
result = params |> Validation.result(schema) # => A %Result{}
result.errors # => %{}
result.valid? # => true
result.data   # => %{name: "Me", email: "me@example.com"}

params = %{"name" => "Me", "email" => "not an email"}
result = params |> Validation.result(schema)
result.errors # => %{email: ["is invalid"]}
result.valid? # => false
result.data   # => %{name: "Me", email: "not an email"}
```
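As an aside on the `%Result{}` part of that sketch: Elixir structs have no computed fields, so `valid?` would either be a function on the result or a field set when the result is built. A minimal, purely illustrative shape (module and field names are hypothetical, not the proposed API):

```elixir
defmodule Validation.Result do
  # data: the whitelisted/coerced params; errors: a map of field => messages
  defstruct data: %{}, errors: %{}

  # valid? derived from errors, so the two can never disagree
  def valid?(%__MODULE__{errors: errors}), do: errors == %{}
end

result = %Validation.Result{data: %{name: "Me"}, errors: %{}}
Validation.Result.valid?(result) # => true
```

Deriving `valid?` from `errors` avoids having to keep two fields in sync.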
This looks like a great starting point! Are you OK with TDDing it? This project is an opportunity for me to learn Elixir better, and TDD is an important part of my workflow so I would love to see tests and implementation later, if that's OK with you :)
I'm not just ok with TDDing. I can't really code without it :)
TDD ALL THE THINGS (insert All The Things meme image here)
Fantastic! 🎉 Please feel free to go ahead and start hacking on it. I'll be chiming in and at some point I should be able to help with coding too (I need to refresh my elixir memory first :))
I came here from Twitter. I was thinking about doing something similar for quite some time now.
I would have only one advice - don't start with macros. Start with plain functions and define data types that they are going to work on (how to encode various types and validations). Introduce macros as the last step to remove verbose declarations (if needed).
Thanks @michalmuskala. This is excellent advice.
I will try to refactor the existing code to not use macros, or at least have a pleasant interface without macros.
I also came here via Twitter and would like to give my two cents if I may.
There seem to be two obvious solutions besides the proposed compilation to a predicate logic expression, that is then turned into a function.
I have the impression it could be simpler.
What you typically do is model a structure that can represent the outcome of the validation and some data (e.g. the error messages). You want to make sure that the functions producing this result are composable such that errors are accumulated and the overall outcome is `:error` if any of the validations failed.
In other languages you would use something like applicative functors or even monads for this.
The simplest solution is probably returning a tuple `{:ok | :error, Map.t}`. Then fold over the validation functions and combine the results according to some simple rule.
The validators could be higher order functions that accept the thing to validate and the map of errors seen so far.
Instead of using functions, have your validators return datastructures and then use an interpreter function that applies the semantics represented by the structures. That is basically a fold over the set of structures accumulating errors.
Then return `:ok` if the map of errors is empty.
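The fold described above can be sketched in a few lines. This is an illustration of the idea, not a proposed API; each validator here is a function of the params returning `:ok` or `{:error, {field, message}}`, and the fold accumulates the errors:

```elixir
# Two example validators (illustrative, hard-coded rules)
validators = [
  fn params ->
    if params[:name] in [nil, ""], do: {:error, {:name, "must be filled"}}, else: :ok
  end,
  fn params ->
    if is_binary(params[:email]) and params[:email] =~ ~r/@/,
      do: :ok,
      else: {:error, {:email, "is invalid"}}
  end
]

# Fold over the validators, accumulating errors per field
run = fn params, validators ->
  errors =
    Enum.reduce(validators, %{}, fn validator, errors ->
      case validator.(params) do
        :ok -> errors
        {:error, {field, message}} -> Map.update(errors, field, [message], &[message | &1])
      end
    end)

  # :ok only when the error map is empty
  if errors == %{}, do: {:ok, params}, else: {:error, errors}
end

run.(%{name: "Me", email: "me@example.com"}, validators)
# => {:ok, %{name: "Me", email: "me@example.com"}}

run.(%{name: "", email: "nope"}, validators)
# => {:error, %{name: ["must be filled"], email: ["is invalid"]}}
```

The "interpreter over data structures" variant would look the same, except the list would contain rule terms instead of functions and the `case` would dispatch on their shape.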
I can only second that it is advisable not to start with macros. Macros are hard to write and, in contrast to functions, do not compose.
Also they live in the strange world of the compile time (actually macro expansion time) and any non trivial macro can be hard to comprehend. Let alone issues due to phasing.
Oh and of course there are possibly many more and better ways to do this. I am looking forward to your actual solution. Have fun and happy hacking 😊
Thank you very much @certainty, for taking the time to give "your two cents" :)
I really like the idea of composable functions rather than one `validate` function that applies a given set of predicates and builds the result.
On a side note: I think we need both error messages and the resulting params map in the result data structure, since validation rules work as a whitelist and possibly also coercer of the input.
If I read your suggestion correctly, we have these parts of the engine:

- Predicates: simple functions that return `true` or `false`, e.g. `filled?(value)` or `match?(value, pattern)`.
- Combinators such as `and`, which compose predicates and are themselves predicates.
- A distinction between key validation and value validation: a key can be required or optional, and values can be validated with predicates. These two concepts can probably also just be a composition of functions.
Again, thanks! :)
@lasseebert that seems to be sound.
I have implemented a little combinator-based validation library for validating Elixir data. That is akin to your suggestion to build primitive validation functions and combinators that create new validators out of existing ones.
With regards to key and value validation: is a distinction even needed? I mean, you could have a validator that asserts that the value for a key is there, which means the key is required, and optional otherwise, right?
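To illustrate that point: `required` and `optional` can themselves be combinators that lift a value predicate into a params validator. A sketch under that assumption (all names hypothetical):

```elixir
# Lift a value predicate into a validator that also asserts key presence
required = fn key, predicate ->
  fn params ->
    case Map.fetch(params, key) do
      :error -> {:error, {key, "is missing"}}
      {:ok, value} -> if predicate.(value), do: :ok, else: {:error, {key, "is invalid"}}
    end
  end
end

# Same, but a missing key is fine
optional = fn key, predicate ->
  fn params ->
    case Map.fetch(params, key) do
      :error -> :ok
      {:ok, value} -> if predicate.(value), do: :ok, else: {:error, {key, "is invalid"}}
    end
  end
end

filled? = fn value -> value not in [nil, ""] end

validate_name = required.(:name, filled?)
validate_name.(%{name: "Me"}) # => :ok
validate_name.(%{})           # => {:error, {:name, "is missing"}}
```

So key validation and value validation collapse into the same shape: functions from params to `:ok | {:error, ...}`.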
Maybe it is worthwhile to build a little spike that implements the basic API as you suggested. That would allow evaluating the idea before you dive directly into the proper implementation. Also it could serve as a basis for a more concrete discussion :)
I am curious to see this library evolve.
Update: please forgive my ignorance. I only just noticed you already began to code. So I already have something to look at.
Yes @certainty, you are right that a rule can just validate either some key or some value. Or both or a combination of things. :)
I will start small and build the most basic features of the library with no respect for the public API, but with focus on composition of predicates and rules. Then after that make the public API nice to work with.
Hi, I came here from Twitter. I was looking for a validation library a while ago and found https://github.com/CargoSense/vex, which is very similar to what you want to build.
```elixir
schema = [username: [length: [min: 2]]]
Vex.valid?(%{username: "fazibear"}, schema)
```
Vex also supports more fancy syntax.
Hi @fazibear. Yes I also looked at Vex while searching for a validation library. I skipped it because it seemed somewhat limited. AFAICS it has no support for nested validations and custom validations.
@fazibear I liked that Vex is simple and it looks like it is easy to reuse schemas or part of schemas. But as mentioned, it seemed to have too little functionality to suit my needs.
Thanks for sharing though ;)
Would be cool to be able to serialize the schema to reuse validations on the client side.
Hi @andre1sk. I'm sorry to say that serializing will probably be hard to do, since we just defined a schema to be a pure function. With some added metadata it might be possible, but I think it will not be built into the library to start with. Thanks for the input :)
> Functions all the way vs. use data structures and interpret them
What about data structures translated to composed functions? That's how it works in dry-validation
> Would be cool to be able to serialize the schema to reuse validations on the client side.
We did that with dry-validation and formalist and very quickly realized that it's tricky because it's common to have validations that rely on backend too much, and it's not possible to translate them directly to client-side code. I'm still not sure about the value of this, it's relatively easy to support translation of simple value-based checks when you have an abstract representation of such checks (and dry-validation has that) but then you need to be able to drop validations that are not "portable", which is actually quite tricky.
> We did that with dry-validation and formalist and very quickly realized that it's tricky because it's common to have validations that rely on backend too much, and it's not possible to translate them directly to client-side code.
Ecto solves this by allowing each validation to store metadata in the changeset about itself. Only the validations that registered the metadata can be serialised; the rest is opaque and considered backend-only. By default, Phoenix uses this metadata to do HTML5 validation in forms.
It's a good point @solnic
As I see it there are advantages of both approaches:
With a data structure we get access to metadata about a schema at runtime.
With a schema being just a function composed of other functions, we get simplicity and flexibility. It is straight forward to add a custom rule or a custom predicate, since they are just functions. The entire schema can even be swapped out with a custom function if needed.
Maybe we can find a way to have our cake and eat it too: a schema being a pure function, but with some way to access or create metadata from it. Something like this, although I don't like this particular approach much:
```elixir
str? = fn
  :metadata -> {:meta, [name: :str?, type: :predicate]}
  value when is_binary(value) -> :ok
  _ -> {:error, "must be a string"}
end
```
Rules and schemas then just accumulate metadata from their composed functions.
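The accumulation could work by having each combinator merge the metadata of the functions it composes, so even a composed rule can still be inspected at runtime. A sketch in the same toy style (all names hypothetical, and with the caveat that a literal `:metadata` input would shadow real data):

```elixir
str? = fn
  :metadata -> {:meta, [name: :str?, type: :predicate]}
  value when is_binary(value) -> :ok
  _ -> {:error, "must be a string"}
end

filled? = fn
  :metadata -> {:meta, [name: :filled?, type: :predicate]}
  value when value in [nil, ""] -> {:error, "must be filled"}
  _ -> :ok
end

# The combinator composes both behavior and metadata
p_and = fn left, right ->
  fn
    :metadata ->
      {:meta, [name: :and, type: :composed, parts: [left.(:metadata), right.(:metadata)]]}

    value ->
      with :ok <- left.(value), do: right.(value)
  end
end

name_rule = p_and.(filled?, str?)
name_rule.("Me")      # => :ok
name_rule.(:metadata) # => a nested {:meta, ...} describing the composition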
After some thought, I like @solnic's idea of a schema as a nested data structure that compiles into a composed function.
I really liked the simplicity of "just functions", but I think it will get complicated quickly when adding features like i18n for error messages, auto-generated documentation or whatever else we or a third party library could think of.
> @solnic: I'm still not sure about the value of this, it's relatively easy to support translation of simple value-based checks when you have an abstract representation of such checks (and dry-validation has that) but then you need to be able to drop validations that are not "portable", which is actually quite tricky.
Since the idea is back to a schema data structure, wouldn't having some flag to not export a given rule be enough? Also, optionally, some syntax to delegate the check(s) to an API call?
@andre1sk: If we end up using a schema data structure, then yes, it is probably possible to serialize the schema to be used for documentation, frontend validation, and whatever else one could think of. IMO the serialization should be general and not aimed at frontend validation.
> @lasseebert: IMO the serialization should be general and not aimed at frontend validation.
I think you are totally right, but it does seem there should be some way to add "metadata" that can be used to adapt the schema to a particular use case, e.g. to optionally provide some frontend-export-specific flags or documentation-related data.
I see you had some good discussions. I second the advantages of having an abstract representation of the validation, especially if you aim for documentation and serialization.
One could also think about generating e.g. a JSON schema, in case the validation is used for JSON data.
Also it serves as a good basis to experiment with different implementations of the interpretation of that structure. Now the result of the interpretation would be a composed function. In a similar vein other approaches can be tried out, should that be needed.
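As a toy illustration of "other interpretations of the same structure": the rule data from earlier in the thread could just as well be interpreted into a JSON-Schema-like map instead of a composed function. This deliberately ignores the predicates and only exports what is trivially portable (all names here are made up for the example):

```elixir
# Interpret %{field => {:required, predicates}} into a JSON-Schema-ish map,
# exporting only key presence and a blanket string type
to_json_schema = fn schema ->
  properties =
    Map.new(schema, fn {key, {:required, _predicates}} ->
      {to_string(key), %{"type" => "string"}}
    end)

  %{
    "type" => "object",
    "required" => schema |> Map.keys() |> Enum.map(&to_string/1),
    "properties" => properties
  }
end

to_json_schema.(%{email: {:required, {:and, :filled?, {:match?, [~r/@/]}}}})
# => %{"type" => "object", "required" => ["email"],
#      "properties" => %{"email" => %{"type" => "string"}}}
```

This is exactly the "drop non-portable checks" problem solnic mentioned: the `:match?` regex is silently discarded here, and a real exporter would need a policy for that.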
I think I have a good understanding now of how to achieve the composed data structure that is compiled into a (composed) function.
I have begun hacking on it and will commit some code soon that supports the data structure of predicates. I'm aiming for a basic building block which represents the most basic predicate (think `filled?`), then composing those into composed predicates with e.g. `and` and `or`.
I'll close this initial issue now. Any discussion about a specific aspect of validation deserves more attention in an issue of its own :)
We should describe the usage of this library. E.g. how to define validations, validate data structures, get error messages, etc.