whatyouhide / stream_data

Data generation and property-based testing for Elixir. 🔮
https://hexdocs.pm/stream_data
883 stars 66 forks source link

Consider name change: "check all -> something else" (Cosmetic) #21

Closed tmbb closed 7 years ago

tmbb commented 7 years ago

I'm not on a mission to make StreamData equal to Hypothesis, but I do consider they have some nice names (hypothesis, conjecture, example, given, etc.). What follows will be extreme bikeshedding. You've been warned.

Currently the main API endpoint for the PropertyTesting module seems to be the the check macro. It's used like this:

check all int1 <- int(),
          int2 <- int(),
          int1 > 0 and int2 > 0,
          sum = int1 + int2 do
  assert sum > int1
  assert sum > int2
end

In my opinion (and this is 100% subjective), the words check all are a little awkward to read... They seem to try to be idiomatic English but fail (disclaimer: I'm not a native English speaker). I suggest something like:

#  A little better, uses a quantifier commonly used in mathematical logic
check forall int1 <- int(),
             int2 <- int(),
             int1 > 0 and int2 > 0,
             sum = int1 + int2 do
  assert sum > int1
  assert sum > int2
end

Or this one, which drops the "check" that doesn't seem to be doing much:

forall int1 <- int(),
       int2 <- int(),
       int1 > 0 and int2 > 0,
       sum = int1 + int2 do
  assert sum > int1
  assert sum > int2
end

Or this one (stolen from Hypothesis - "Good artists copy, great artists steal")

given int1 <- int(),
      int2 <- int(),
      int1 > 0 and int2 > 0,
      sum = int1 + int2 do
  assert sum > int1
  assert sum > int2
end

The last two one read like a mathematical proof: "Given these variables satisfying these conditions do some tests". Or "Forall variables satisfying these conditions do some tests".

Now, advantages and disadvantages:

Given

It reads like normal English, which is an advantage (although it can be taken too far!).

The problem is that I feel like it's missing a "such that" and a "let". Let's "read" the test above:

Given integer int1 and integer int2,int1 > 0andint2 > 0andsum = int1 + int2, assert thatsum > int1andsum > int2`.

It would be more natural to have:

Given integer int1 and integer int2, **such that**int1 > 0andint2> 0, let sum = int1 + int2, assert that sum > int1 and sum > int2.

This can't be done with advanced macro magic, and for that reason it probably shouldn't be done.

Forall

It is not even an English word, so that's a disadvantage. On the other hand, neither are def or defmacro. But it's the name of a symbol used in mathematical logic, and it makes the test look like a mathematical proof:

Forall integers int1 and int2, int1 > 0 and int2 > 0, sum = in1 + int2, assert that sum > int1 and sum > int2.

Although this isn't actually proper English (I think it's still missing a let before sum = int1 + int2), it's similar to how some of these proofs are actually written.


I'm really undecided on which one of these two is the best, but both of them seem to be better than the current check all. In Hypothesis, given is clearly the right choice, but that's because hypothesis is more limited with what it can do with the decorators (the filtering logic must be written in the body of the test). Because check all is more powerful (it generates, filters and defines new variables), it's harder to find a word that fits so well. Therefore, something like forall (used by PropCheck) is probably more appropriate.

I'm just leaving this here for discussion.

whatyouhide commented 7 years ago

We cannot use forall or given or any other single word directly because of how Elixir works, that is, <- and other clauses are passed as arguments to the macro (forall or given in this case) but they can be variable in number, but we can't define macros that take an unknown number of arguments. This is why we have to use two words, because the first one will take a fixed single argument (the AST for all ...).

As for check forall, I believe it's more awkward to read because of the check being one word and forall being two. check all was used because it's saying "check all these values against the given property".

josevalim commented 7 years ago

Yup. We are not married to "check all" but it needs to be two words for the reasons above and it cannot be "for all" because "for" is already a special form.

tmbb commented 7 years ago

We cannot use forall or given or any other single word directly because of how Elixir works

Ok, my macro skills are pretty lame. I had taken a very cursory look at the coded and didn't realize you were using all ... like a function call. I though you were using two words for some other reason.

I can't think of two words that are that much better than check all or check forall. Depending on how hardcore you want to get with the whole DSL thing, you can have something like:

check forall clause1, clause2, clause3, do: ...
# which morphs into
check_if forall clause1, clause2, clause3, do: ...
# why do we need exactly 2 words?
check if forall clause1, clause2, clause3, do: ...
# Well, we already have 3, why not 4?
check if for all clause1, clause2, clause3, do: ...
# This last form compiles to:
check (if (for (all clause1, clause2, clause3, do: ...)))
# Which gives you this AST:
{:check, [],
 [{:if, [],
   [{:for, [],
     [{:all, [],
       [{:clause1, [], Elixir}, {:clause2, [], Elixir}, {:clause3, [], Elixir}, [do: ...]]}]}]}]}
# From which you can extract the final do block

While this can be done, I don't know if it should be done (probably not).

whatyouhide commented 7 years ago

I am not even too happy with two words so I will give the strongest possible 👎 on more than two words 😄. If we can find a significant improvement over check all which is not based on personal taste then we can definitely switch to it, otherwise for now let's keep it as is.

tmbb commented 7 years ago

My problem with check all is that (to me) it seems to lie in the uncanny value of "sufficiently English-like to sound like English" and "too different from normal English to be idiomatic".

That's also my criticism of do ... end blocks (yes, superficially it does look kinda sorta like English, but you know it isn't), but that's a rant for another day 😜

tmbb commented 7 years ago

Why not wrap the clauses in a list?

forall [int1 <- int(),
        int2 <- int(),
        int1 > 0 and int2 > 0,
        sum = int1 + int2] do
  assert sum > int1
  assert sum > int2
end

I don't think the list adds much visual noise, and in any case the useless all keyword adds some noise of its own?

whatyouhide commented 7 years ago

We discussed using a list before but came to the conclusion that it looks even weirder than two words. Using a list feels weird because for and with don't need a list so it feels more like an inconsistency than using check all.

whatyouhide commented 7 years ago

I'm gonna close this for now, if someone comes up with a good name please don't hesitate to suggest it :). Thanks all for the discussion! 💟

christhekeele commented 7 years ago

I've been kicking the naming of this around in the back of my head since ElixirConf and now that I've started working with some data streams I wanted to contribute my thoughts.

I think my ideal nomenclature while keeping the current tokenization would be:

property "reversing a list doesn't change its length" do
  test data list <- list_of(integer()) do
    assert length(list) == length(:lists.reverse(list))
  end
end

check makes sense for historical reasons, but the all feels adrift and pointless, although I know it has syntactic purpose. Semantically, "for this property, check that all generated things from a list data stream holds by making these assertions..." doesn't really flow very well.

The proposed version reads: "For this property, I want to test if it holds for thislist data by asserting such-and-such."

Keeping test lends a certain congruence with my existing suite. It helps me quickly scan through property and example tests since I'm looking for the same keyword; navigating through module, function, macro, property, describe, and callback blocks; honing in on my assertions.

The downside is we'd have to mandate that property checks only be made in property blocks to use the correct macro. With my usage, that would be fine, but I don't know if other people prefer to wrap their checks in test blocks instead.

Using the word data as the filler function transitions conceptually from the notion of a defining a test to specifying streams of data that come immediately after, which serves more purpose than all. Even if we keep check, I still think check data list <- list_of is the way to go. Similarly think gen all is okay but gen data list <- list_of is more descriptive.

Sorry to awaken this thread from the dead, though I suppose tis the season! 👻 Despite bikeshedding, I'm super excited by this, and the blog post on its internals ranks up there with the continuable enumerators one for coolest Elixir articles. Superb effort! 😍

tmbb commented 7 years ago

I think that a macro name that conflicts with ExUnit's test is not a good idea, even if it is meant to be used inside property.

Personally I don't think test data is any better than check all, but this part is obviously very subjective.

But since we've brought this topic from the dead, let me suggest another name, which has occurred to me just now:

property "reversing a list doesn't change its length" do
  holds for list <- list_of(integer()) do
    assert length(list) == length(:lists.reverse(list))
  end
end

It reads as "property X holds for list of type integer". The do tokens kinda break the flow, but as José Valim says, one doesn't actually notice them after a while.

It's just a suggestion, and again, this is 90% subjective.

josevalim commented 7 years ago

Thanks for exploring alternatives folks! The bikeshedding in this case is welcome.

I am happy with those alternatives but not being a native speaker really limits me in judging if it is better or worse than check all. So I would like to invite more folks to join the discussion. /cc @jeg2 @batate @fishcakez @chrismccord

tmbb commented 7 years ago

For the record, I am not a native speaker either.

christhekeele commented 7 years ago

Y'know, I think holds for scans really well, better than check data, although I can't think of a great for analog for gen. I don't know if we want the filler word to be consistent between the two but I'd imagine so.

fishcakez commented 7 years ago

given any is probably closet in meaning to for all as a two word pair.

tmbb commented 7 years ago

Y'know, I think holds for scans really well, better than check data, although I can't think of a great for analog for gen

Having holds for/given any together with gen all doesn't bother me.

Also, given any is pretty good! Probably better than holds for because of the do keyword. We're falling into the trap of trying to make code read like English, but I think it's ok because we're being forced by the syntax.

PS: In fact, I might just steal given any for my competing property-based test implementation if I decide to get rid of the list around my generators.

tmbb commented 7 years ago

A problem I didn't notice about holds for because the code was not highlighted. The for keyword will stand out in a weird way:

property "reversing a list doesn't change its length" do
  holds for list <- list_of(integer()) do
    assert length(list) == length(:lists.reverse(list))
  end
end

Currently no syntax highlighter is "clever" enough to render this for as a normal identifier and not a keyword. So I'm in favor of given any now.

christhekeele commented 7 years ago

I think it's important to have the filler word be consistent, so that the relative rarity of the syntax appears like a specific special DSL case, rather than causing newcomers to think that Elixir is like SQL and there are a bunch of two-word keywords out there like ORDER BY, for .. in, or else if.

given any is very good. Also, gen any works pretty well as its mate, it produces any one of the possible values in the given "range" of possibilities.

JEG2 commented 7 years ago

I'll only add two small thoughts.

First, as a native speaker, I think "check all" is OK English. We do say things like, "Check all the windows."

Then all just throw in on the side of given any. I like how the any hints at the usage of any old integer, string, or whatever. all feels wrong by contrast because a property test does actually try all possible values. holds for sounds good, but I worry the extra meaning of for could be a point of confusion against list comprehensions.

keathley commented 7 years ago

I'll also say that I'm OK with check all. It is descriptive and, at least to me, reads naturally.

I'm somewhat on the fence about given any. given seems nice but I tend to prefer all because it indicates that the invariant should hold for all of the given inputs. When talking about properties, I would typically say things like "adding is commutative for all integers" or "reversing a list twice retains ordering for all lists" or "The system models these semantics for all valid command sequences". Replacing "all" with "any" in each of those statements would imply that they're true for at least 1 input but not necessarily for all of them. This is also in line with Elixir's existing semantics for Enum.all?/2 and Enum.any?/2. I realize that this is probably a totally pedantic argument, but I just don't think any fits well here.

tmbb commented 7 years ago

First, as a native speaker, I think "check all" is OK English.

For what it's worth (again, not a native speaker), "check all" is definitely OK English, and I never meant to say it isn't when I raised this issue. My "problem" with "check all" is that, to me, you don't check all integers, you check whether the property holds for all integers. Actually, when I raised the issue I thought check all could be replaced by a single word, and if I had known that it could never be a single word I wouldn't probably have raised it...

all feels wrong by contrast because a property test does actually try all possible values.

I think you mean doesn't actually try right?

From that perspective, all is not that bad, because even if you don't test all cases, the property is supposed to hold for all cases. You don't check all cases, though...

tmbb commented 7 years ago

When talking about properties, I would typically say things like "adding is commutative for all integers" or "reversing a list twice retains ordering for all lists" or "The system models these semantics for all valid command sequences". Replacing "all" with "any" in each of those statements would imply that they're true for at least 1 input but not necessarily for all of them.

Really? "Reversing a list twice retains ordering for any list" seems to be equivalent to "Reversing a list twice retains ordering for all lists". When "any" is used in a declarative sentence (i.e. not a question) it has the same meaning of "all". When it is used in a question, it means indeed "at least one": "Is there any list such that reverting it twice retains the ordering?".

Elixir actually gets this right because of the question mark:

Python, for example doesn't have a question mark (any and all), which is lends itself to the interpretation for the declarative sentence. What does any(list) mean:

This is an example on why making code read as English is horrible and misguided, but here we absolutely need to have two words due to syntactic constraints, so it's more of a harm minimization endeavor...

I realize that this is probably a totally pedantic argument, but I just don't think any fits well here.

All arguments here are very pedantic, starting with mine. Don't feel bad xD

JEG2 commented 7 years ago

I think you mean doesn't actually try right?

Oops. Yes, I did.

chrismccord commented 7 years ago

Of the discussed terms, I prefer the existing check all. I actually feel that given any and holds for align better with traditional property based testing terminology, but I think they make the ideas a little bit more opaque for folks coming in. check all satisfies what is being accomplished in the plainest, least loaded terminology in my opinion, especially when considering folks coming into to property testing for the first time.

keathley commented 7 years ago

@tmbb I take your point. I'm probably colored by using forall for such a long time. given any looks a lot like for any and using for any here would be 100% incorrect. But I also suck at english though despite speaking it my whole life 😈 .

fishcakez commented 7 years ago

check all satisfies what is being accomplished in the plainest, least loaded terminology in my opinion, especially when considering folks coming into to property testing for the first time.

This great point from @chrismccord. And then from @JEG2:

all feels wrong by contrast because a property test does actually try all possible values.

I would almost like check sample or check some, but a better second word.

whatyouhide commented 7 years ago

Is “check many” dumb?

On Mon, 23 Oct 2017 at 21:07, James Fish notifications@github.com wrote:

check all satisfies what is being accomplished in the plainest, least loaded terminology in my opinion, especially when considering folks coming into to property testing for the first time.

This great point from @chrismccord https://github.com/chrismccord. And then from @JEG2 https://github.com/jeg2:

all feels wrong by contrast because a property test does actually try all possible values.

I would almost like check sample or check some, but a better second word.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/whatyouhide/stream_data/issues/21#issuecomment-338764512, or mute the thread https://github.com/notifications/unsubscribe-auth/ADtcSmeBACqoWSkQVy0_-0ujK1i70l4fks5svOPwgaJpZM4O1r-V .

--

Andrea Leopardi an.leopardi@gmail.com

tmbb commented 7 years ago

Is “check many” dumb?

Yes but I like it even though it sounds a little but dumb and almost childish. Or maybe because of it. "Check many" sounds really good. It sounds like "generate MANY examples and turn on the heat until they talk".

For the generators, gen many sounds even better than gen all, because even tough you migh generate all possible examples, in practice space and time considerations restrict you to a finite subset. And the generators aren't even optimized to generate all examples, they are extremely biased toward small examples.

So actually I like check many and gen many.

keathley commented 7 years ago

I still prefer check all. I realize that not every possible value is tested during every test run. But the point is that the invariants should hold for all values in the specified types and I'm in favor of using language that indicates that.

josevalim commented 7 years ago

My take away from this discussion is that there isn't something largely better than check all. It seems we do not even have something slightly better that we all agree on. @chrismccord brought more good reasons for check all and, besides all of the points brought up on this thread, it is nice to keep at least the "check" part of the original quickcheck paper, which is the reason why we are all here. :) So my vote is to keep it as "check all" indeed.

It was important to have this discussion as we now feel confident to move forward.

Thanks everyone.