nick8325 / quickcheck

Automatic testing of Haskell programs.
Other
726 stars 121 forks source link

Request for tutorial: How to test complicated stateful systems #139

Open saurabhnanda opened 7 years ago

saurabhnanda commented 7 years ago

Discussion at Reddit: https://www.reddit.com/r/haskell/comments/5jatdg/how_to_use_quickcheck_for_stateful_testing/

I've been looking for examples of how to use Quick Check for "regular" stateful testing, where one would essentially do the following - generate a known input case, call a series of domain APIs, and finally assert the expected system state.

For example, how would one use Quick Check for testing a CRUD JSON API? Or, say a reservation system's domain API, which allows overbooking only if the admin is making the overbooking request?

Even after reading John Hughes' paper on using Quickcheck to test a circular buffer and testing at Volvo & Karla and viewing the accompanying talk I have the following confusion:

saurabhnanda commented 7 years ago

Is the Haskell QuickCheck library (this library) at feature parity with Quviq's QuickCheck? The free version or the paid version? The free version does not have state machine based testing. Is this the reason why I'm unable to sync John Hughes' talk with the Haskell QuickCheck library - http://www.quviq.com/downloads/

nick8325 commented 7 years ago

Yes, unfortunately the state machine-based testing is not implemented in Haskell QuickCheck. It would be really nice to have it but no-one has implemented it. Quviq QuickCheck does have more features than Haskell QuickCheck, because of having full-time developers - Haskell QuickCheck is about equivalent to the free version, I would say.

The only special support for testing imperative code in Haskell QuickCheck is the Test.QuickCheck.Monadic module, based on the paper Testing monadic code with QuickCheck. This is a bit more work as you have to think of properties, while with the state machine approach you get the property "for free" after writing the model.

Since you ask, the model in the state machine testing is usually much simpler than the real system because it's implemented in the simplest possible way. For example, for the circular buffer, the model would simply be a list. For dets, I think it was just a list of key-value pairs.

nick8325 commented 7 years ago

Having said that, your comment did inspire me to start hacking together a similar library for Haskell QuickCheck. So far, it seems quite plausible (I was worried that the types would get in the way, but they don't really), so maybe this is something we could have in the future.

saurabhnanda commented 7 years ago

Having said that, your comment did inspire me to start hacking together a similar library for Haskell QuickCheck. So far, it seems quite plausible (I was worried that the types would get in the way, but they don't really), so maybe this is something we could have in the future.

Happy to help. Sort of :)

For example, for the circular buffer, the model would simply be a list

How can a simple list be used as a circular buffer without explicitly dealing with the wraparound logic, at which point it become an independent implementation itself?

saurabhnanda commented 7 years ago

This issue is also to seek guidance on how to use QC for real-life stateful apps. It seems the documentation is non-existent around this topic. For example, I can't visualise how to write tests for soemthing like a hotel reservation system.

saurabhnanda commented 7 years ago

If we take the non state-machine approach, what would some properties look like in the case of a hotel reservation system?

nick8325 commented 7 years ago

How can a simple list be used as a circular buffer without explicitly dealing with the wraparound logic, at which point it become an independent implementation itself?

In this case the circular buffer was used as a bounded queue, so you're not allowed to overfill it. That means the client can never observe the wraparound, it's purely an implementation detail.

The precondition, postcondition and next-state function for each operation would then look something like this, where the state is given as a list xs:

get:

put(x)

nick8325 commented 7 years ago

Configure hotel with random number of rooms; apply random set of reservation, cancellation, and modification operations; check that hotel is not oversold on any date?

Sounds reasonable.

How to bias the operations so that the dates are clustered together? How to bias towards month and year boundaries to uncover date related bugs?

This is something you would fix in your Arbitrary instances. For example, you could generate a random month, then choose the date by starting from the first of the month and adding a random offset in days between -30 and 30. By picking the offset in a biased fashion you can generate dates near the first of the month:

genOffset :: Gen Int
genOffset =
  frequency
    [(3, choose (-3, 3)),
     (1, choose (-30, 30))]

Configure hotel with random number of rooms; apply set of domain operations; ensure each valid operation is accepted. How to check for validity without reimplementing the validation logic?

This is a risk of course. You try to avoid it by making the model as simple as possible - no optimisations, no shortcuts. For example, in the hotel you might just define one function which takes the state of the booking system and checks if it's valid (e.g. no overbooking). Then you might say, if the hotel state is valid after making the booking then the booking should be accepted. I don't know the details of the booking system of course, but the point is you can often write the specification in a way which is more declarative than the implementation.

nick8325 commented 7 years ago

I agree that we need more documentation around this.

saurabhnanda commented 7 years ago

Do you know of anyone using QuickCheck for common business/domain oriented webapps (as opposed to complicated libraries like streams or parsers or compilers)? Probably motivating them to give a detailed talk or sharing some code can be a good start.

changlinli commented 7 years ago

@saurabhnanda My experience at work is with Scala with ScalaCheck, Scala's equivalent of QuickCheck, (which does have a version of state machine testing that I've basically never used). Unfortunately none of that is shareable and I don't have a great example to show from my own projects, although I'll see if I can whip one up in the coming days.

I do, however, have a bit of experience with how ScalaCheck get used in run-of-the-mill business webapps, which sounds like it might be useful to you. In general, my takeaway has been even if checking the conceptual heart of your application can seem difficult, there's usually enough other incidental properties that are crucial to the correctness of your program to check that QuickCheck (or the equivalent) is something I include in all my work projects these days. Here's a couple of use cases that almost always come up that are usually independent of state checking and can be applied to the pure parts of your code (which are hopefully the majority).

One small note:

How to check for validity without reimplementing the validation logic?

In addition to what @nick8325 says, I want to add it is often the case that certain invalid states are very easy to describe and check for with QuickCheck and it is often very worthwhile to even get that sort of partial coverage when devising all the necessary generators and structuring your code appropriately for checking full validity is more difficult and time-consuming. That is if you can't check your code always leaves things in a valid state, you can at least check that your code never leaves things in certain obvious invalid states.

In general when I've tested the state of a CRUD system, I've almost always mocked out the persistence layer with something like a Data.Map and then there's usually a baseline of general properties that are worthwhile to always check such as the following:

Your hotel overbooking scenario fits in nicely; if I swapped out my persistence backend for a simple [] of hotel room reservations and then just spewed requests at the system, are there any two entries in my list that have the same room and date?

bartavelle commented 7 years ago

As an illustration of what @changlinli said, here is code that does property testing against complex logic. It runs several games of 7startups, with players playing random cards, and checks that all games terminate without throwing.

The game logic is a free monad, allowing for several interpreters, including a pure one that is used for tests.

Even though this tests a complicated piece of logic by only checking it doesn't crash, a few bugs were found!

saurabhnanda commented 7 years ago

@changlinli thank you for your thoughtful reply 👍

String processing: Think of all the webpages that break on non-English names... What will my program do if one of my users in my database has accents in their name? Will my fancy product search bar be able to find products with non-English names? Don't worry! QuickCheck will throw strings thought up by Cthulhu himself at yo, I don't even care what my system returnsur program and see if anything topples over.

This seems interesting. Do you generate the raw JSON strings and send them over to the WAI endpoint, so that the JSON parsing is also tested, or do you generate the Haskell values that the JSON is supposed to be parsed to. That is, what do you generate: {"name": "random string", "email": "random string"} OR User{userName="random string", userEmail="random string"}

if I swapped out my persistence backend for a simple [] of hotel room reservations

How easy is it to do this in a large-scale webapp? Does this basically mean all database accesses need to go via some sort of app-specific DB library (which is written on top of the underlying DB library, i.e. Persistent, Opaleye, or PG-Simple), which can then be swapped out with an in-memory implementation during testing? Isn't this too much work?

stevana commented 7 years ago

I played around a bit and managed to get something that looks similar to Erlang quickcheck's state machine approach: https://gist.github.com/stevana/1595f3190c87220c59caaaf7c2aae04b .

It seems to me that the generator, shrink function, and the safety property are quite generic (they only use the pre-conditions, the model transition functions and the post-conditions). So I'd like to see how much of that code we can template Haskell away.

I also wonder if one can do away with the trace? I couldn't figure out how to show and shrink things like pop without it. Also if one wants to test for race conditions one will need something similar to the trace anyhow?

changlinli commented 7 years ago

@saurabhnanda I don't know how much more I want to post here since I feel like I'm hijacking GitHub issues (@nick8325 how do you feel about this thread? I'm happy to take this elsewhere if you don't want to have your issues gobbled up by this thread), but I'll reply one more time here.

With respect to raw string vs structured Haskell data type as the entry point for testing, I usually use the latter exclusively, but occasionally will add the former. Usually the process of going from strings to structured data is pretty robustly handled by something like aeson and I rarely find bugs in that part of the code. It's usually later on that I find some sort of mistake I've made (e.g. when normalizing strings for comparison does my normalization function work for all Unicode strings?).

[Mocking out the DB] Isn't this too much work?

Sometimes yes. QuickCheck is awesome, but not always a silver bullet. It depends on how you've structured your code. This isn't a passive-aggressive jab at code quality, there are reasonable worlds in which the pure part of your CUD pipeline is pretty trivially handled by third party libraries and the IO is the only interesting stuff you're writing (this is rarely true for the Read part; you usually have some interesting pure part of the Read pipeline that can be tested). That being said, here's some approaches to explore and keep in mind when designing your app to maximize its potential to be tested.

First, I'll just note that you are always making an "app-specific DB library" in your application simply by writing your CRUD operations. Indeed way to think about making application is that they're just a series of increasingly specific libraries with a very thin layer for interaction at the very end (the "almost everything is a library" intuition is especially strong in the Haskell community from what I've seen).

vs

    -- This is a lot harder to check
    readFromDB :: IO MyData
    readFromDB = undefined

    calculateTheFoo :: IO FooedData
    calculateTheFoo = undefined -- Do the Foo conversion internally

    doEverything :: IO BarredData
    doEverything = undefined -- Do the Bar conversion internally

The name of the game when it comes to IO and Haskell is "delay." Delay the IO as much as you can, either by encoding the IO into some sort of structure (Free monads), stubbing out the implementation (typeclasses), or just try to make your pure pipeline as large as possible.

nick8325 commented 7 years ago

@nick8325 how do you feel about this thread? I'm happy to take this elsewhere if you don't want to have your issues gobbled up by this thread

I'm very happy for people to have a discussion here, please carry on!

stevana commented 7 years ago

I had another go at a pre-/post-condition specification; this time of a memory cell: https://gist.github.com/stevana/3973198ab378995691c8b1a68369c492 .

The difference to my previous attempt is that there's no free monad nor trace, making the shrink function simpler and allowing us to use the standard forAll combinator. Both the generator and property seem less generic however...

stevana commented 7 years ago

Here's a version that also checks for race conditions: https://gist.github.com/stevana/a2bb4640d1552dd69abcf93a30d0814e .

nick8325 commented 7 years ago

Cool!

I had a go myself and came up with this: https://gist.github.com/nick8325/42a40c20be5527e22049e57a65f16aaf

It only does the basic eqc_statem stuff but the property and generator are generic (they directly call the example code in the gist but it should be easy to separate out).

stevana commented 7 years ago

Nice; I like your typed commands and that the post-condition returns a property rather than a boolean.

I'm not sure about using the state when generating commands though, as it makes your state transition function not be able to depend on the result...

I've put a cleaned up and generalised version of my last gist in it's own repo: https://github.com/advancedtelematic/quickcheck-state-machine-model . (The library is in the src directory and the memory cell example in the test directory.)

There are some of things I'd like to fix:

  1. When two News happen in parallel bad things can happen, because references/variables being integers end up pointing to the wrong thing in the shared environment. Using something like gensym to generate unique names would probably solve this, but it would make comparing programs harder (alpha-equality). Not sure if there's an easier way to make the environment "local". Solving this would make the last two properties in the tests pass, I think.

  2. Support for more than one type of reference/variable. I think I can see how commands can be generalised to be parametrised by a indexed family of variables in a dependently typed language, but I'm not sure if I could do that in Haskell...

  3. Pack up the functionality that the sequential and parallel property helpers need in a record or typeclass, perhaps deriving some of it using template Haskell. For example, returnsRef could probably be constructed from the GADT: Mem ref :: * -> * where New :: Mem ref ref.

  4. Currently only two parallel workers are used in the parallel property, some bugs might require more things to happen in parallel...

There's probably plenty more... In fact, please let me know if you can think of any other shortcomings!

saurabhnanda commented 7 years ago

@nick8325 @stevana I'm overwhelmed by the code snippets you've posted (newbie alert :) ). Are those actual implementations of the "state machine" in Quickcheck (core code), or demonstrations of how such testing can be done without tweaking the core of Quickcheck?

stevana commented 7 years ago

@saurabhnanda: The latter.

stevana commented 7 years ago

Small update:

I've stolen the well-typed commands and postconditions returning a Property rather than Bool ideas, thanks @nick8325!

I've also added the above four problems and some more stuff as proper issues, as I haven't figured out how to solve them yet. I think I'll focus on the multiple references/variables problem next.

saurabhnanda commented 7 years ago

I'm trying to write a small stateful test for my DB-backed system using the following strategy (which is taken from the quickcheck-state-machine package [1]:

i'm stuck at a very basic step. Every piece of data in the actual system's state is identified by a database/primary-key ID. If I have to compare the simplified-state with the actual-state I should be comparing data by these IDs. Does this mean that I have to execute a command on the actual system first, get the resulting IDs and then put them in the simplified-state? Is this the recommended way to do things?

This would also mean that the ADT describing the actions will be different for the actual-system and the simplified-system, i.e:

data ActualActions = CustomerSignup NewCustomer | CustomerActivation CustomerId

-- vs --

data SimplifiedActions = CustomerSignup_ NewCustomerAfterDBInsertion | CustomerActivation_ CustomerId

[1] Not using the quickcheck-state-machine package because I do not completely understand its types and API.

nick8325 commented 7 years ago

I can think of two ways to do it:

stevana commented 7 years ago

@saurabhnanda: You seem to have discovered one of the reasons for why the types are complicated in the quickcheck-state-machine library!

To solve the problem that you describe the library uses something we call references. References are things that you only get once you execute the actual system.

Using the quickcheck-state-machine library your type of actions would be something like:

data Action refs :: Signature () where
  CustomerSignup     :: NewCustomer -> Action refs ('Reference '())
  CustomerActivation :: refs @@ '() -> Action refs ('Response ())

The () in Signature () is there because we have one type of reference (customer ids). CustomerSignup takes NewCustomer as argument and returns a reference, hence the Reference '() part of Action refs ('Reference '()). CustomerActivation takes a reference as argument, refs @@ '() and returns a normal response (not a reference) of type (), hence the 'Response () part of Action refs ('Response ()).

If you have two types of references you'd use Signature Bool, and refs @@ 'True or refs @@ 'False to refer to the first or the second reference, etc.

Behind the scenes, the library does basically the second way that @nick8325 describes above. The benefit of letting the library take care of references is that we get generation/shrinking of references for free. The downside is, as you noted, that the types become more complicated. We are still discussing if the extra complexity is worth it...

nick8325 commented 7 years ago

To solve the problem that you describe the library uses something we call references. References are things that you only get once you execute the actual system.

data Action refs :: Signature () where
  CustomerSignup     :: NewCustomer -> Action refs ('Reference '())
  CustomerActivation :: refs @@ '() -> Action refs ('Response ())

In my approach this comes out simpler, not sure if this is something you considered @stevena:

data Command a where
  CustomerSignup :: NewCustomer -> Command CustomerId
  CustomerActivation :: Var CustomerId -> Command ()

The idea is that Var a represents a value which will only become known while executing the program. Each command in the test case binds a Var - the next-state function for a Command a is given a Var a for the result of the command (e.g. for CustomerSignup you get a Var CustomerId) so that later commands can refer to that Var.

The Vars internally are just integers so can easily be generated and pretty-printed without running the program:

data Var a = Var Int deriving Show

The function which executes commands is given an Env which allows it to get the value of any variable:

type Env = forall a. Typeable a => Var a -> a
stevana commented 7 years ago

@nick8325: I remember seeing this in your gist from a while ago!

I didn't quite like how your state transition function always gets a Var a rather than simply a (and you don't have access to Env in the state transition function). It seemed less expressive, and that's why we instead tag the return type with 'Reference and 'Response and treat the two differently, giving us the a for constructs that return a 'Response a.

That said, I've recently noticed that Erlang QuickCheck's state transition function also doesn't have access to a directly. This seems to have other advantages, e.g. you can advance the model while generating and shrinking. But also when linearising partial histories, you might not even get the a!

So yeah, that's something we have considered, and are reconsidering!

stevana commented 7 years ago

Also relevant: https://github.com/hedgehogqa/haskell-hedgehog/pull/89

jacobstanley commented 7 years ago

It's super exciting to see so much activity on state machine testing, it's been missing from Haskell's testing toolkit for far too long.

I thought it might be useful to give a quick overview of the approach I took in Hedgehog, linked by @stevana above. It certainly has a bit in common with the approaches already mentioned here.

Essentially, the model state and each command are parameterised by a functor which is either Const Var (i.e. Symbolic) or Identity (i.e. Concrete).

newtype Var =
  Var Int

data Symbolic a where
  Symbolic :: Typeable a => Var -> Symbolic a

newtype Concrete a where
  Concrete :: a -> Concrete a

So if we look at the register command from the process registry example, it might look like this:

data Pid v =
  Pid (v Int)

data Register v =
  Register Name (Pid v)

This means that we can traverse over a command and replace the symbolic values with concrete ones, provided we have an environment with a mapping of variables to values.

newtype Environment =
  Environment {
      unEnvironment :: Map Var Dynamic
    }

reify :: HTraversable t => Environment -> t Symbolic -> Either EnvironmentError (t Concrete)

In order for this to work, the commands need to implement HTraversable, currently this is manual but hopefully I can make it go away with some Template Haskell.

instance HTraversable Pid where
  htraverse f (Pid v) =
    fmap Pid (f v)

instance HTraversable Register where
  htraverse f (Register name pid) =
    Register
      <$> pure name
      <*> htraverse f pid

Within this system, the types of the functions which the user can implement for their various commands are as follows:

Command Generation

:: Monad m => state Symbolic -> Maybe (Gen m (input Symbolic))

Precondition

:: state Symbolic -> input Symbolic -> Bool

State Update (note that we use this during generation and execution)

:: Ord1 v => state v -> input v -> v output -> state v

Command Execution

:: Monad m => input Concrete -> Test m output

Postcondition

:: Monad m => state Concrete -> input Concrete -> output -> Test m ()

Currently I can only do sequential tests, but I think the design is close enough to the Erlang one that it should be easy enough to extend when I get around to it. I'm still trying to get my head around linearisation, the paper and articles linked from @stevana's readme have been super useful for this.

stevana commented 7 years ago

It's super exciting to see so much activity on state machine testing, it's been missing from Haskell's testing toolkit for far too long.

Agreed, and thanks for the explanation!

I really like the Concrete and Symbolic part, and I've already started stealing it in the following branch: https://github.com/advancedtelematic/quickcheck-state-machine/commits/refactor/simplify-refs .

I hope to add the parallel property helper to that same branch soon, meanwhile feel free to ask questions about the linearisation bit if you have any. Also, you might be interested in the following ticket, were we keep track of resources on how to improve the linearisation checker.

stevana commented 7 years ago

In quickcheck-state-machine-0.1.0, that we just released, we use the above symbolic/concrete style references.

@saurabhnanda: Your example can now be written as:

data Command (v :: * -> *) :: * -> * where
  CustomerSignup     :: NewCustomer -> Command v CustomerId
  CustomerActivation :: Reference v CustomerId -> Command v ()

Which, I hope you agree, is much simpler than before! Cheers @nick8325 and @jystic for the ideas!

stevana commented 7 years ago

@saurabhnanda: I just added a CRUD webserver example to the repo.

saurabhnanda commented 7 years ago

@saurabhnanda: I just added a CRUD webserver example to the repo.

@stevana We've started testing our (fairly complicated) web API. Referring to this line of code in transitions...

transitions (Model m) (PostUser   user) key = Model (m ++ [(Reference key, user)])

...how would it work if the API was returning a (UserId, User). Or, how about a ((UserId, User), [(ContactId, Contact)], [(StoredCardId, StoredCard)]) (essentially a user that has_many contacts and has_many stored_cards? Essentially the question that I think I'm asking is, how does the library know where in response an ID/PK is available?

stevana commented 7 years ago

how would it work if the API was returning a (UserId, User)

You'd need to change the response type of the PostUser action, e.g.:

 PostUser :: User -> Action v (Key User, User)

The contact and card stuff would be similar.

Essentially the question that I think I'm asking is, how does the library know where in response an ID/PK is available?

Not sure I understand the question...

(Key is something from the persistent library, it has nothing special to do with quickcheck-state-machine.)

saurabhnanda commented 7 years ago

@stevana in that case I don't really understand what Reference responseValue is, and how to use it in cases where the responseValue represents multiple entities in the DB.

stevana commented 7 years ago

When we use Keys, or any other response that are only known once the actual system is running, we need to wrap Reference around it, e.g.:

DeleteUser :: Reference v (Key User) -> Action v ()

The reason for this is because when we generate arbitrary actions we don't know how to generate sensible Keys (we would need to inspect the database to do that). So what Reference and its parameter v does is it lets us use symbolic references (essentially integers) during generation, and then switch to concrete ones (e.g. real Keys) during execution when we have access to the actual system.

So, an action that returns multiple Keys is represented as:

Apa   :: Action v (Key A, Key B)

While one that uses those keys is represented as:

Bepa :: Reference v (Key A) -> Reference v (Key B) -> Action v ()

Does that make sense?

MaximilianAlgehed commented 8 months ago

I'll wake this discussion up with a reference to a new library that does implement Quviq-style stateful shrinking in Haskell (we built it at Quviq). https://hackage.haskell.org/package/quickcheck-dynamic

MaximilianAlgehed commented 8 months ago

Whoops, accidentally closed the issue. I think a tutorial on this should be implemented in quickcheck-dynamic and we could point to it from here.

stevana commented 8 months ago

What a coincidence, I started playing with this again yesterday (after not having thought of this for years).

I'd like to see if one can design a library around @edsko 's lockstep-style where a fake is used instead of an explicit transitions function and pre- and post-conditions. I started with @nick8325 's last version from this thread, this is what I got right now (it's still work in progress).

I think it's clearly better to have a fake than the transition function that both quickcheck-state-machine and quickcheck-dynamic have, because the fake can be used in integration tests that depend on the SUT (that's what I'm trying to do with the QueueI interface). I know both libraries are compatible with the lockstep style, but it's not yet clear to me if things can be simplified if the lockstep style is taken as the default.