An api for stateful properties and model checking

whatyouhide / stream_data

Data generation and property-based testing for Elixir. 🔮

https://hexdocs.pm/stream_data

886 stars 66 forks source link

An api for stateful properties and model checking #94

Open keathley opened 6 years ago

keathley commented 6 years ago

I'm opening this issue based on discussions in IRC to start working towards a proposal for stateful model checking powered by stream_data.

Existing solutions

I'll try to provide an overview of the current solutions. I'm going to use Proper since thats what I'm most familiar with. Proper and EQC have almost identical apis but EQC provides extra features. Proper is a good baseline though. I'm by no means an expert on proper so I may get some of these details wrong. I'm sure @fishcakez can correct me though ;).

Lets say that we wanted to test getting and setting keys in redis. To do that we can use a state machine test. This involves setting up a test and then implementing some callbacks. I've added documentation to each callback to try to provide context on how it works. If you want to run the test locally the full example is here: https://github.com/keathley/redix_model.

defmodule RedixModel.ReadWriteTest do
  use PropCheck.StateM
  use PropCheck
  use ExUnit.Case

  require Logger
  @moduletag capture_log: true

  # This is the actual test. There is a bit of ceremony to ensure that everything is set up correctly.
  # We need to ensure that our redis lib (redix in this case) is started, sanitize the database, 
  # and then run our generated commands.
  property "Can get and set keys" do
    numtests(300, forall cmds in commands(__MODULE__) do
      trap_exit do
        Application.ensure_all_started(:redix)
        flush_database()

        # Actually run all the commands that we've generated
        {history, state, result} = run_commands(__MODULE__, cmds)

        Application.stop(:redix)

        (result == :ok)
        |> when_fail(
            IO.puts """
            History: #{inspect history, pretty: true}
            State: #{inspect state, pretty: true}
            Result: #{inspect result, pretty: true}
            """)
        |> aggregate(command_names cmds)
      end
    end)
  end

  def flush_database do
    {:ok, conn} = Redix.start_link()
    Redix.command(conn, ["FLUSHDB"])
    Redix.stop(conn)
  end

  def start_redis do
    Redix.start_link()
  end

  #########################################################################
  # Everything below this is the actual "model" code.
  #########################################################################

  # This is the state of our model
  defstruct contents: %{}, conn: :empty

  @doc """
  Sets the initial state of the model per test run
  """
  def initial_state, do: %__MODULE__{}

  @doc """
  Statefully generate valid command sequences. Each "call" here is a symbolic call. It will be executed
  after the sequence is generated.
  """
  def command(%{conn: :empty}) do
    # The very first thing we need to do is establish a redis connection so
    # we call the module that we're in to do that work.
    {:call, __MODULE__, :start_redis, []}
  end
  def command(%{conn: conn, contents: contents}) do
    keys = Map.keys(contents)

    oneof([
      {:call, Redix, :command, [conn, ["GET", oneof(keys)]]},
      {:call, Redix, :command, [conn, ["SET", binary(), binary()]]},
      {:call, Redix, :command, [conn, ["SET", oneof(keys), binary()]]},
    ])
  end

  @doc """
  Determines if the call is valid in our current state. This is necessary because
  during shrinking commands will be removed from the generated list. We need to
  ensure that those commands are still valid based on the state that we're in.
  """
  def precondition(%{conn: :empty}, {:call, Redix, _, _}), do: false
  def precondition(_, _), do: true

  @doc """
  Moves to the next state. The value is the result of the call but it may be a
  symbolic reference to the result depending on where we are in the process.
  """
  def next_state(state, _value, {:call, _, :command, [_, ["SET", key, value]]}) do
    put_in(state, [:contents, key], value)
  end

  def next_state(state, _value, {:call, _, :command, [_, ["GET", _]]}) do
    state
  end

  # This is a trick to get around symbolic variables. When we're generating commands we don't know
  # what the result of our call is going to be yet. So the return value is "symbolic". But in our case we 
  # need the real value so we pattern match here. If we wanted to be complete we should also match
  # on an error clause here as well. There are other ways to solve this problem
  # but I'm using this method to illustrate some of the pain of dealing with
  # symbolic variables generally.
  def next_state(state, {:ok, conn}, {:call, _, :start_redis, _}) do
    %{state | conn: conn}
  end
  def next_state(state, _, {:call, _, :start_redis, _}) do
    state
  end

  @doc """
  Checks the model state after the call is made. The state here is the state
  BEFORE a call was made. The call is the symbolic call that we executed. The
  result is the real result from the call.
  """
  def postcondition(%{contents: contents}, {:call, Redix, :command, [_, cmd]}, result) do
    case cmd do
      ["GET", key] ->
        Map.fetch(contents, key) == result

      ["SET", _key, _value] ->
        result == {:ok, "OK"}
    end
  end

  def postcondition(_, _, _) do
    true
  end
end

The way that proper works is by first generating a list of symbolic calls based on the state space specified by the command and next_state callbacks. After proper has generated the symbolic calls it executes each command and checks postconditions. If it finds an error it shrinks the generated commands list. The most effective means of doing this is by removing commands. This is one of the reasons we need our preconditions. Its likely that by removing certain commands we'll end up with a command that is invalid based on our current state. We use preconditions to avoid that.

Separating the generation and execution of commands means that shrinking is easier to control and somewhat more deterministic from what I understand.

A new api?

I've only done model checking using the "EQC" api so I'm not sure that I have a novel way of approaching the problem from an api standpoint. ScalaCheck uses similar semantics but each command is a class and provides its own preconditions, postconditions, and next state functions. That division seems like it would help keep things tidy as a model grows. I think it would be ideal to use simple functions but its been my experience that models can get quite large so it would be nice to have a more prescribed way to manage them.

I hope that this helps provide some context. I'm very excited to see what y'all think because its been my experience that model checking is where property testing really shines.

josevalim commented 6 years ago

Thanks for the write up @keathley!

I am not that familiar with stateful checking so please take my feedback with a grain of salt. The examples above seem to have a lot of repetition and I don't like that. The ScalaCheck approach probably helps remove this repetition but imposing a module per command would not be very idiomatic. As I hinted on IRC, I would prefer each approach to rely on data structures and function composition.

Here is how I would rewrite the model bit with this in mind:

  def command(%{conn: :empty}) do
    # A simple command with the atom ok as constant generator.
    :ok
    |> command(fn :ok -> start_redis() end)
    |> next_state()
  end

  def command(%{conn: conn, contents: contents}) do
    keys = Map.keys(contents)

    # preconditions, next state and post conditions are attached to
    # commands only if necessary.
    get =
      one_of(keys)
      |> command(fn key -> Redix.command(conn, ["GET", key]) end)
      |> precondition(&continue_if_not_empty_conn/2)
      |> post_condition(fn state, key, result -> Map.fetch(state.contents, key) == result end)

    set =
      {binary(), binary()}
      |> command(fn {key, value} -> Redix.command(conn, ["SET", key, value]) end)
      |> next_state(&store_key/3)
      |> post_condition(fn _state, _key, result -> result == {:ok, "OK"} end)

    set_existing =
      {one_of(keys), binary()}
      |> command(fn {key, value} -> Redix.command(conn, ["SET", key, value]) end)
      |> next_state(&store_key/3)
      |> post_condition(fn _state, _key, result -> result == {:ok, "OK"} end)

    one_of([get, set, set_existing])
  end

  defp continue_if_not_empty_conn(%{conn: conn}, _key), do: conn != :empty

  defp store_key(state, {key, value}, _result), do: put_in(state.contents[key], value)

This is not a complete rewrite but it should get the discussion moving. Because it is just function and data, we can pass around commands and build them in many different ways.

Running the commands is a matter of passing the initial state and a the reference to function that generates the commands, in this case, command/1.

Some remaining topics are discussed below.

preconditions vs state

One of the questions I have is why do we pre_conditions if the commands we generate are already tailored to the given state? For example, would we ever reach the precondition below, given we don't generate Redix commands when the conn is empty?

  def precondition(%{conn: :empty}, {:call, Redix, _, _}), do: false

What about symbolic variables?

Given we put the execution of the code inside closures, there is a chance we no longer need symbolic variables (again, grain of salt here, I likely don't know what I am talking about). But in a nutshell, if a command needs to store or read something previously stored, we could simply pass the state to the comamnd function:

one_of(keys)
|> command(fn state, key ->
  result = Redix.command(conn, ["GET", key]) end)
  {result, put_in(state.temp1, result)}
end)

I am not sure if storing symbolic variables in the state is a good idea because they may have different livecycles. If storing it in the state is a bad idea, we could introduce something such as a session, that would have something closer to the semantic variables lifecycle. Somebody with more understanding has to chime in here with the semantics and the operations that need access to semantic variables.

Debugging

One of the nice things about symbolic calls is that it is easy to generate a trace in case the test fails. For example:

[{set,{var,1},{call,proc_reg_eqc,spawn,[]}},
{set,{var,2},{call,proc_reg,where,[c]}},
{set,{var,3},{call,proc_reg_eqc,spawn,[]}},
{set,{var,4},{call,proc_reg_eqc,kill,[{var,1}]}},
{set,{var,5},{call,proc_reg,where,[d]}},
{set,{var,6},{call,proc_reg_eqc,reg,[a,{var,1}]}},
{set,{var,7},{call,proc_reg_eqc,spawn,[]}}]

In the proposal above, all of the operations are hidden behind anonymous functions, which means we don't get this understanding. We don't even have a name we can refer to the commands.

One possible solution is to require each command to have a name. Something like:

  :get
  |> command(one_of(keys), fn key -> Redix.command(conn, ["GET", key]) end)
  |> ...

Now, in case of failures, we can say the command name and generated args:

[
  {:get, "foo"},
  {:set, {"bar", "baz}}
]

Another idea is to make command a macro and rely on do blocks:

  :get
  |> command(one_of(keys)) do
         key -> Redix.command(conn, ["GET", key]) end)
       end
  |> ...

This would allow us to show the code that reproduces the failure - up to some extent, things like conn would still need to be extracted from the state, at least in the example above. The downside is that we now have macros and you can no longer pass functions references to command.

This is what I have so far. Feedback is welcome!

keathley commented 6 years ago

Preconditions vs State.

Pre-conditions are needed because during shrinking commands will be removed from the list of commands that caused a failure. Removing these commands means that we may create a command sequence that is "invalid". Pre-conditions ensure that we don't attempt to run "invalid" commands during the shrinking process.

Symbolic calls

I've been reading through the QuickCheck papers, "QuickCheck Testing for fun and profit" and "Testing Telecoms Software with Quviq QuickCheck" in order to better understand the benefits of using symbolic calls. Here's a quote from "QuickCheck Testing for fun and profit" that I think sums it up nicely:

We made an early decision to represent test cases symbolically, by an Erlang term, rather than by, for example, a function which performs the test when called. Thus if a test case should call unregister(a), then this is represented by the Erlang term {call,erlang,unregister,[a]}—a 4-tuple containing the atom call, the module name and function to call3, and a list of arguments. The reason we chose a symbolic representation is that this makes it easy to print out test cases, store them in files for later use, analyze them to collect statistics or test properties, or—and this is important—write functions to shrink them.

I think it would be ideal to use symbolic calls under the hood without having to expose them to the end user. I like the idea of co-locating all of the conditions for a call around a data structure like in your example @josevalim. IMO having to pass a generator into the command is awkward but I guess there's no way around that since we're using streams under the hood. Ideally we could just do something like this:

def get(key) do
  :get
  |> command()
  |> run(fn state -> Redix.command(state.conn, ["GET", key]) end)
  |> precondition(fn state -> state.conn != :empty end)
  |> next_state(fn state -> state end)
  |> postcondition(fn state, result -> result == Map.fetch(state.contents, key) end)
end

def set(key, value) do
  :set
  |> command()
  |> run(fn state -> Redix.command(state.conn, ["SET", key, value]) end)
  |> precondition(fn state -> state.conn != :empty end)
  |> next_state(fn state -> put_in(state, [:contents, key], value) end)
  |> postcondition(fn _state, result -> result == {:ok, "OK"} end)
end

I don't have a strong preference between using something like command vs. run to distinguish the actual command code; it just read better to me. This API would allow us to generate the list of calls (as command structs or something similar) and retain all of the benefit of the symbolic language without forcing users to have to deal with it.

josevalim commented 6 years ago

@keathley re: preconditions vs state, thanks for the clarification.

Here's a quote from "QuickCheck Testing for fun and profit" that I think sums it up nicely

Great job digging it up.

IMO having to pass a generator into the command is awkward.

Interesting. I actually like it a lot because I find the symbolic call API very confusing. It is a mixture of special tuples (such as :call), arguments that may be generators or not and straight-up values. I think having boundaries around those is nice.

If I understand your proposal correct, you moved the whole definition of the command after the generator is called. If that's the case, I would move the command call out, something like:

  def command(%{conn: :empty}) do
    command(__MODULE__, :started_redis, [])
  end

  def command(%{conn: conn, contents: contents}) do
    keys = Map.keys(contents)

    # The args in command is always a list of generators
    one_of([
      command(__MODULE__, :get, [one_of(keys)]),
      command(__MODULE__, :set, [binary(), binary()]),
      command(__MODULE__, :set, [one_of(keys), binary()])
    ])
  end

  def get(key) do
    fn state -> Redix.command(state.conn, ["GET", key]) end
    |> precondition(fn state -> state.conn != :empty end)
    |> postcondition(fn state, result -> result == Map.fetch(state.contents, key) end)
  end

  def set(key, value) do
    fn state -> Redix.command(state.conn, ["SET", key, value]) end
    |> precondition(fn state -> state.conn != :empty end)
    |> next_state(fn state -> put_in(state, [:contents, key], value) end)
    |> postcondition(fn _state, result -> result == {:ok, "OK"} end)
  end

Does this make sense?

A command can be a simple anonymous function or something that we build after piping through precondition/next_state/post_condition.

I am still not sure if something needs to be done about symbolic variables.

whatyouhide commented 6 years ago

I like the idea of having command/3 that takes a module, a function, and a list of generators that will be used to generate a list of arguments.

However I am not sure about having commands be a fun that is enriched with precondition, next state, postcondition. Do we always need all three? If we do, maybe we should think about a way of ensuring that and making it easier to do and understand.

I also wanted to mention that I am still not sure about this approach of having precondition and postcondition for each command is not more repetitive than having precondition/postcondition hooks like the original proposal. I guess the original proposal resembles a fsm somehow, where you have a function that decides with what precision to handle commands and states. Example from the initial proposal: we might not want to ever issue any command except "connect" against a disconnected Redis instance. I can't form a preference between the two approaches yet though.

josevalim commented 6 years ago

I also wanted to mention that I am still not sure about this approach of having precondition and postcondition for each command is not more repetitive than having precondition/postcondition hooks like the original proposal.

Just to clarify: precondition/postcondition should only be declared if necessary.

One question worth asking is if the state in precondition/next_state/postcondition is always the same. If so, we can encode those things in a single function and use return tuples:

  def command(%{conn: :empty}) do
    command(__MODULE__, :started_redis, [])
  end

  def command(%{conn: conn, contents: contents}) do
    keys = Map.keys(contents)

    # The args in command is always a list of generators
    one_of([
      command(__MODULE__, :get, [one_of(keys)]),
      command(__MODULE__, :set, [binary(), binary()]),
      command(__MODULE__, :set, [one_of(keys), binary()])
    ])
  end

  def get(%{conn: :empty}, _key), do: :precondition_failed
  def get(state, key) do
    result = Redix.command(state.conn, ["GET", key])
    assert Map.fetch(state.contents, key) == result
    {:ok, state}
  end

  def set(%{conn: :empty}, _key, _value), do: :precondition_failed
  def set(state, key, value) do
    assert {:ok, "OK"} = Redix.command(state.conn, ["SET", key, value])
    {:ok, put_in(state.contents[key], value)}
  end

alfert commented 6 years ago

What I like in @josevalim's last example is the combination of executing the command, checking the postcondition via assert and returning a probably modified state. We do not need to return a tuple, since there is no reason to return something instead of :ok: Either the postcondition fails, then assertions blows or everything is ok by definition. This makes it really short and to the point.

But this does not work. During the symbolic execution (ie. command sequence generation), the next_state function decides how the state changes after a working command call. Therefore we must distinguish between real command execution with the System Under Test and the modification of the current test status, which is handled in next_state. These two separate phases of command generation and command execution are shown nicely here http://propertesting.com/book_stateful_properties.html#_execution_model.

In PropEr/PropCheck all these callbacks are general functions wich distinguish via the arguments which command is currently under consideration. This is the same pattern in GenServer. EQC nowadays uses a different approach, which groups these callbacks by command, which is an atom (see http://quviq.com/documentation/eqc/eqc_statem.html). In the spirit of a modern EQC state machine we could formulate it this way:

  def get_args(_state), do: fixed_list([constant(state.conn), keys()])
  def get_pre(%{conn: :empty}), do: false
  def get_pre(_state), do: true
  def get_next(state, _args, _result), do: state
  def get_post(state, [_conn, key], result) do
    assert Map.fetch(state.contents, key) == result
  end
  def get(conn, key) do
    Redix.command(conn, ["GET", key])
  end

  # generate the arguments for the command
  def set_args(state), do: fixed_list([constant(state.conn), keys(), values()])
  def set_pre(%{conn: :empty}), do: false
  def set_pre(_state), do: true
  def set_post(_state, _args, result), do: assert result == {:ok, "OK"}
  def set_next(state, [_conn, key, val], _result) do
    # post condition is true, result may be symbolic
    put_in(state.contents[key], value)
  end
  def set(conn, key, val) do
    Redix.command(conn, ["SET", key, val])
  end

What I like in this approach is that the code for one command stays together. This makes it easier to understand the model piece by piece. Using only regular functions reduces the magic for test code writers. But it feels a little bit off from an Elixirish point of view. How about applying nested definitions as in ExUnit's describe:

  specify :get do
    def args(_state), do: fixed_list([constant(state.conn), keys()])
    def pre_condition(%{conn: :empty}), do: false
    def pre_condition(_state), do: true
    def next_state(state, _args, _result), do: state
    def post_condition(state, [_conn, key], result) do
      assert Map.fetch(state.contents, key) == result
    end
    def execute(conn, key) do
      Redix.command(conn, ["GET", key])
    end
  end

The specify macro would simply add the command name (here get) as a prefix to internally defined functions.

This way, we can add more optional features in to the command specification later on. Weights for commands are one example, so commands are not generated by a one_of but by frequency to execute some commands more often than others. In the EQC documentation are some more optional callbacks such as features, shapes, return, adapt, ... which might be interesting when we learn more about stateful testing with StreamData.

josevalim commented 6 years ago

Thanks @alfert for the input!

We do not need to return a tuple, since there is no reason to return something instead of :ok

I returned a tuple because the command may also return :precondition_failed.

But this does not work. During the symbolic execution (ie. command sequence generation), the next_state function decides how the state changes after a working command call.

Apologies but I don't see how this is not possible with the proposal so far. In the proposal, the goal was to embed the working command call and next state in a single step. The state is updated only if the command succeeds. Given the graph you referenced, the only thing that is not possible to distinguish is an error in the command itself vs an invalid postcondition. However, it is unclear if any of the libraries use this information in any meaningful way. If they do, we could also allow the command to return :postcondition_failed.

The specify macro would simply add the command name (here get) as a prefix to internally defined functions.

On IRC, we have mentioned that the best approach would be to avoid DSLs and rely mostly on data and functions/callbacks. :)

In any case, it is interesting to see that EQC also went to the direction where the knowledge is grouped by the command (get/set) and not by the state machine (pre/post/next_state).

alfert commented 6 years ago

Apologies but I don't see how this is not possible with the proposal so far. In the proposal, the goal was to embed the working command call and next state in a single step. The state is updated only if the command succeeds. Given the graph you referenced, the only thing that is not possible to distinguish is an error in the command itself vs an invalid postcondition.

The problem I see is that the next_state function is called twice: First during symbolic execution while generating the commands and secondly while executing the commands. In both cases, next_state decides how the model evolves. If you glue test execution with model update, then you have either to implement the model update twice or you execute the system under test during call generation which does not work by design.

But you are right that my comment is a little too harsh. Melting command execution and checking the post condition together via assert is indeed a perfect match and removes the need for an explicit post_condition function. The existence of post_condition is apparently an heritage of the boolean property approach, which is not used in StreamData any longer due to its nicer assert macros. But I would suggest to move the state update into its own function (i.e. next_state) to avoid to implement it twice.

Just to give an example where this matters: In an extension of the Redis example we could implement a flush command, which drops the entire database. This means that model state must also be empty thereafter. So any call to get must fail until we start again with calling set to fill the database. This must be recorded in the generation phase and executed and experienced later during test execution. Granted that in the current Redis example this is not reflected, as nobody cares about the values stored in Redis and the model unless they are the same. Closing the connection would be another example where the state is crucial for command generation. To summarise, preconditions (and in EQC and PropCheck also the weights) depend on the state and thus the command generation depends also on the state, so the state should be changeable during the generation phase and not only during the execution phase.

On IRC, we have mentioned that the best approach would be to avoid DSLs and rely mostly on data and functions/callbacks. :)

Good to hear that! Seems that I should to be at least a lurker at IRC (where I never was before...) to be up to date :)

josevalim commented 6 years ago

Just to give an example where this matters: In an extension of the Redis example we could implement a flush command, which drops the entire database. This means that model state must also be empty thereafter. So any call to get must fail until we start again with calling set to fill the database

From my understanding - and please keep in mind that I understand very little about stateful models - this implies we first generate the commands and apply them only later. Couldn't we apply the commands as we generate them instead? Or would we miss some important feature? Maybe doing so would complicate space exploration or shrinking?

In any case, I think this shows there is a dichotomy between the system under test and the model that is not well-defined in the latest proposals.

josevalim commented 6 years ago

Good to hear that! Seems that I should to be at least a lurker at IRC (where I never was before...) to be up to date :)

Btw, I don't think IRC is required. It is probably a good idea to document here the discussions done there anyway.

alfert commented 6 years ago

From my understanding - and please keep in mind that I understand very little about stateful models - this implies we first generate the commands and apply them only later. Couldn't we apply the commands as we generate them instead? Or would we miss some important feature? Maybe doing so would complicate space exploration or shrinking?

The reason for splitting the generation and execution phase is enabling shrinking and understanding what is going on. The idea is that we generate a list of calls of the system under test. This is one test case for the property. After the generation, we execute this list. If an error happens, then this particular list is shrunk towards a smaller list. Generation phase, shrinking and execution phase all depend on the model. Then the next list of calls is generated and executed and so on. If an error happens, we can see the history of (symbolic) calls to the system under test and how in parallel the model evolves. This helps to understand the error. To get a nice reporting with StreamData is open issue, in particular if the command list is rather long.

The very basic idea of the entire approach is that the model reveals relevant features of the system under test. You can have several models for the same system for analysing different aspects, e.g. making the model gradually more complex. The idea behind it is the same as a bisimulation. If the system under test behaves as the model predicts, then the system under test works fine.

josevalim commented 6 years ago

Thank you, @alfert! Do you think checking for preconditions is a property of the model or of the SUT?

whatyouhide commented 6 years ago

Isn't only the postcondition check a property of the SUT? The point of keeping the model is to not have to check anything from the system except for postconditions afaiu?

josevalim commented 6 years ago

Apologies but I am getting a bit confused here.

In both the Proper/EQC example, the next_state hook receives the value of the execution, so doesn't it mean we need to execute in order to retrieve the next_state?

alfert commented 6 years ago

@josevalim Do you think checking for preconditions is a property of the model or of the SUT?

Checking the precondition is required during the generation phase to generate useful calls. Think about a system that allows to delete items. The precondition would check that such an item does exist within the system. I.e. the model must represent that the system has stored this item and therefore the generated call is legal/sensible/useful. It should trivially hold during execution time and crucial for shrinking because it may happen that shrinking modifies the system and the model state in way that the formerly legal call is no longer legal.

@whatyouhide Isn't only the postcondition check a property of the SUT?

Yes, exactly. The postcondition checks wether the system is in an proper state after executing the call. I.e. the return value of the call must be expected and derivable from the model. Here we can use assert instead of boolean return values.

@josevalim In both the Proper/EQC example, the next_state hook receives the value of the execution, so doesn't it mean we need to execute in order to retrieve the next_state?

Now I am confused. What do you mean by execute in order? Perhaps this helps: Since the state is not stored during generation (this would not be useful, because results of symbolic calls could be part of the state but they do not have a meaning), the state must be replayed during execution time and even more important during shrinking since shrinking modifies the initially generated list of commands.

josevalim commented 6 years ago

Now I am confused. What do you mean by execute in order?

What I meant to ask is this: because the next_state function receives the execution result, it means the initial generation of commands needs to happen side by side with the execution, because otherwise we don't have the result to compute the next_state. I guess for shrinking we could use the results from execution but it is also expected that once we change the order or remove commands during shrinking, the results will change, and by then the post condition may be necessary to check the results are still valid.

I plan to take a look at the state machine implementation in proper to help understand those questions.

alfert commented 6 years ago

Ah, I see your point. It is different: during generation you simulate the calls to the system, but you use next_state to record the expected state. In next_state you cannot evaluate the result of the symbolic calls. You can only rely on the arguments, the current model state and the expected result from the call (executed later in the future). That is the reason why only the post condition evaluates the result which is only called in the execution phase.

For Proper, there is also the tutorial with the movie server explaining how to specify a model with a complex state (i.e. including deletion): http://proper.softlab.ntua.gr/Tutorials/PropEr_testing_of_generic_servers.html This might help to understand the approach.

Von unterwegs gesendet

Am 06.04.2018 um 08:47 schrieb José Valim notifications@github.com:

Now I am confused. What do you mean by execute in order?

What I meant to ask is this: because the next_state function receives the execution result, it means the initial generation of commands needs to happen side by side with the execution, because otherwise we don't have the result to compute the next_state. I guess for shrinking we could use the results from execution but it is also expected that once we change the order or remove commands during shrinking, the results will change, and by then the post condition may be necessary to check the results are still valid.

I plan to take a look at the state machine implementation in proper to help understand those questions.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

keathley commented 6 years ago

I want to address some of the concerns / points about repetition in the model code. In my experience models involve a non-trivial amount of code. As a reference point a basic model for testing our raft implementation is ~300 loc. Based only on experience and intuition, grouping preconditions, postconditions, and next state generations with each "command" won't help reduce repetition. You almost always need a precondition for a command. The same is true for postconditions. Defining a single callback like:

def precondition(%{conn: :empty}, {:call, Redix, _, _}), do: false

Is going to be less lines of code then needing to specify the same precondition for each command like:

def connected?(%{conn: :empty}), do: false
def connected?(_), do: true

:get
|> command
|> precondition(&connected?/1)

:set
|> command
|> precondition(&connected?/1)

The benefit to grouping isn't that there's less lines of code. Its that the meaningful pieces of code are located together. The co-location would potentially make the model and all of the preconditions and postconditions easier to reason about once the model has grown to a reasonable size.

ferd commented 6 years ago

because the next_state function receives the execution result, it means the initial generation of commands needs to happen side by side with the execution, because otherwise we don't have the result to compute the next_state. I guess for shrinking we could use the results from execution but it is also expected that once we change the order or remove commands during shrinking, the results will change, and by then the post condition may be necessary to check the results are still valid.

During the generation phase, PropEr and EQC only pass in a symbolic value to next_state, usually looking like {var, N} but could also be the result of calling make_ref(). What you get is an opaque placeholder for what the system would have produced at run-time. If your model makes use of that value, it must make so as an opaque term.

This means, for example, that if you call gen_server:start_link(...), you get an opaque term T and you cannot extract the pid from a {ok, Pid} tuple result. You just can't. Most people then when they call the remaining functions for the model can therefore not pass in T to a {call, my_server, some_call, [T]} because mostly none of them accept {ok, Pid} as a value, just Pid.

The common pattern is to create a shim module that will wrap common functions, i.e.

start_link(Args) ->
    {ok, Pid} = my_server:start_link(Args),
    Pid.

Then the model can use the opaque value T as if it were an actual Pid returned from the system (as long as you don't do things like is_pid(T) since that will fail.

My personal opinion from my current understanding of stateful property testing is that if you are generating command sequences with symbolic call (so they can be recorded to disk, manipulated as data, and so on), you end up requiring some user-provided glue to make the symbolic execution work with the real one, and pretending that only one execution takes place is not going to work.

If you decide to generate things differently so that generation and execution happen at the same time, that can possibly work, but shrinking may be more difficult because modifications to the existing chain will not be deterministic.

Take for example a sequence where I generate a command based on a random value provided by the server under test. If the value is in the interval [0.0 .. 0.5[, I generate a command to upgrade to a paid account. I the value is in the interval [0.5..1.0], I generate a command to delete my account.

Under PropEr and quickcheck, this cannot be done since the symbolic execution would not let me use the server-returned value to influence command generation. Under a mechanism where server-returned values can be used to influence command generation, then it's free game and there's no way to know what I'll get back.

You have to be kind of careful about how you approach determinism there.

alfert commented 6 years ago

I just finished a first version of an EQC like grouping with lots of debug output in Repo https://github.com/alfert/counter:

stateful property testing implementation https://github.com/alfert/counter/blob/master/lib/statem.ex
the system under test https://github.com/alfert/counter/blob/master/lib/counter/cache.ex
some test cases for statem.ex https://github.com/alfert/counter/blob/master/test/statem_test.exs
and finally the stateful test https://github.com/alfert/counter/blob/master/test/cache_eqc_test.exs

Some remarks on the current state of the implementation:

weights are not yet implemented, but should be easy to do. I would go for weights as global optional callback as in EQC such that al weight definitions stay together and to have the entire frequency distribution in one place.
pre-conditions are implemented as boolean predicates, since they are called several times and used in if statements.
post-conditions require the current state, therefore they are not merged with execution function. If the implementation functions also gets the state, this should be easy.
the elixir formatter is not in use
a more complete implementation should define a behaviour with (optional) callbacks, a using macro, default implementations of _pre and _post as true and of _next as identity, a better extraction of existing functions from .exs files (I don't like the reloaded module log output), more type specs and more documentation.

I am interested in your feedback. Have a nice weekend!

keathley commented 6 years ago

Since its been a bit I thought I would provide an update from my side. Currently we're just using proper for all of our stateful testing. Its working well for us so I'm not sure that we need this functionality to exist in stream_data. I'm happy to weigh in with opinions if that is helpful but otherwise I'm not going to be pursuing this further. Thanks for all the input and effort :heart:.

hauleth commented 5 years ago

@alfert I have exported your implementation as separate library https://github.com/hauleth/stream_state. I hope that you do not mind. I also have done some improvements:

no need for runtime recompilation, instead I use module.module_info/1 function
weights are implemented
formatted code
fallback implementations for *_pre and *_post
better failure messages:

Postcondition failed.

Command: StreamStateTest.inc() State: :one Result: 6

History: StreamStateTest.clear() => :ok StreamStateTest.inc() => 1 StreamStateTest.inc() => 2 StreamStateTest.inc() => 3 StreamStateTest.inc() => 4 StreamStateTest.inc() => 5

TBD:

~default fallback for *_next~ already done
cleanup rest of the code (probably split into more files)
support for metadata
implement FSM version along to StateM