swadey / LispSyntax.jl

lisp-like syntax in julia

RFC: Code refactoring. #11

Closed Ismael-VC closed 8 years ago

Ismael-VC commented 8 years ago

This refactor is meant to take advantage of multiple dispatch instead of using long if expressions to test the type of an argument.
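
For illustration, here's a minimal sketch of the difference (the describe functions are hypothetical, not code from this PR):

# Before: a single function that tests the argument's type by hand.
function describe_if(x)
    if isa(x, Symbol)
        "a symbol"
    elseif isa(x, Number)
        "a number"
    else
        "something else"
    end
end

# After: multiple dispatch selects the right method directly.
describe(x::Symbol) = "a symbol"
describe(x::Number) = "a number"
describe(x)         = "something else"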

swadey commented 8 years ago

I left you some questions before I merge this, but in general it looks cleaner. Happy to merge after discussion.

MichaelHatherly commented 8 years ago

One problem with using Val{symbol} for dispatch is that it defeats type stability, since different Symbol values produce completely different types. Maybe that's not much of an issue at the moment though; I've not looked at how type-stable things were before this change. (It does make things a bit cleaner, which is nice.)

swadey commented 8 years ago

In addition to Michael's comments which are more important, a couple of minor comments:

  1. I prefer Stage's @expect as it's much easier to read test results.
  2. I'd rather keep escape_exceptions. It's more readable.
  3. dispatch for escape_exceptions seems like overkill. I'm not sure what the objection is to using an optional arg with an empty default here (see the sketch after this list).
  4. I prefer to keep the spacing at 2.
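
For concreteness, the two styles under discussion, sketched with placeholder bodies (hypothetical, not the PR's actual code):

# Optional argument with an empty default: a single definition.
codegen_a(s::Vector{Any}, escape_exceptions = Set{Symbol}()) = (s, escape_exceptions)

# Extra dispatch layer just to supply the default: two definitions.
codegen_b(s::Vector{Any}) = codegen_b(s, Set{Symbol}())
codegen_b(s::Vector{Any}, escape_exceptions::Set{Symbol}) = (s, escape_exceptions)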

The type stability/wrapper type issue is a potential performance problem, which seems to be at odds with your other changes.

Ismael-VC commented 8 years ago

I prefer Stage's @expect as it's much easier to read test results.

  • It seems that Base.Test has improvements on the v0.5 branch, and there are also several registered packages for unit testing; the most popular seems to be FactCheck, and it does what you like about Stage. I think test verbosity is only useful when a test fails and is kinda pointless when it succeeds. There are also JuliaTest and QuickCheck (I haven't used the last two).

If we are to keep Stage, we should make it work with Pkg.test, since otherwise the user is left to figure out and install Stage just to run the test suite; this doesn't happen on Travis because it's hardcoded there.

It seems we can't REQUIRE unregistered packages, which is why I made this change. Now users just do Pkg.test("LispSyntax") and it works, instead of getting Stage not defined, understanding the issue, searching for Stage, finding it, cloning it, and re-running the tests.
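
For reference, a minimal test/runtests.jl that depends only on Base.Test might look like this (the lisp"..." string macro comes from LispSyntax; the specific assertions are just illustrative):

using Base.Test
using LispSyntax

# Pkg.test("LispSyntax") runs this file with no extra dependencies.
@test lisp"(+ 1 1)" == 2
@test lisp"(if true 1 2)" == 1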

I'd rather keep escape_exceptions. It's more readable.

  • I'll keep escape_exceptions; I'm not sure what qualifies as overkill and what doesn't, but if you think it's too much, I'll change it back. I was actually just trying to avoid redundancy: we just need an optional argument, and it doesn't matter whether it's positional or keyword, since codegen is not part of the public API. Writing escape_exceptions = escape_exceptions all over the place seemed too redundant to me. I'd also rather add documentation strings to all objects and explain args like:
"""
`codegen(s::Vector{Any}, esc_except = Set{Symbol}())`

Generates the blah blah........

Args:

* `s`: Input array of Any element types...
* `esc_except`: This is a set of symbols which indicates ...
"""
function codegen(s::Vector{Any}, esc_except = Set{Symbol}())
    length(s) == 0 && return s    # empty array
    codegen(first(s), s, esc_except)
end

Produces: (screenshot of the rendered help output omitted)

I think this is better than making most variables have long names. Also, I used esc_except::Set{Symbol} in the signatures not because it is needed or because it makes the code faster/safer, but only for documentation purposes. It started to look redundant there also, but metaprogramming the codegen methods does seem like overkill.

prefer to keep the spacing at 2.

  • I'd like to advocate for 4 spaces per indentation level for the Julia example code in the documentation, as this is the convention in Julia. If you still prefer 2 spaces for the LispSyntax.jl source code, I can change it back, but can we leave the code in examples at 4 spaces? Also consider that new contributors would probably use 4 spaces because of this.
  • I'll try to address type stability. The previous version is also type-unstable, by the way, but dispatching on bits types at compile time is faster than selecting at runtime from a long if chain.
  • http://docs.julialang.org/en/latest/manual/types/#value-types

As one application of these ideas, Julia includes a parametric type, Val{T}, designated for dispatching on bits-type values. For example, if you pass a boolean to a function, you have to test the value at run-time:

You can instead cause the conditional to be evaluated during function compilation by using the Val trick:
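
For context, the manual's illustrative example looks roughly like this (reproduced from memory, so treat it as a sketch):

# Runtime test: the branch is evaluated on every call.
firstlast(b::Bool) = b ? "First" : "Last"

# Val trick: each value gets its own method, so the "branch" is
# resolved by dispatch when the calling code is compiled.
firstlast(::Type{Val{true}})  = "First"
firstlast(::Type{Val{false}}) = "Last"

firstlast(Val{true})   # "First"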

Tomorrow I'll implement whatever we agree here. ;)

MichaelHatherly commented 8 years ago

I'd be inclined to leave things as 2-space indent everywhere in the project for the sake of consistency within the project itself. It also makes things a bit more difficult to review when whitespace changes are mixed in with real changes. (I do personally prefer 4-space, but I've found it's not a huge burden to switch between the two.)

Val{T} is fine when it's being passed through the entire call chain (as in the manual example linked above), but with the method:

codegen(x::Symbol, s::Vector{Any}, esc_except::Set{Symbol}) = codegen(Val{x}, s, esc_except)

this will be type-unstable and result in runtime dynamic dispatch. See, for example, this simplified test:

stable(x, y) = x == :if ? y + 1 : y + 2

unstable(::Type{Val{:if}}, y) = y + 1
unstable(::Type, y)           = y + 2
unstable(x, y)                = unstable(Val{x}, y)

These produce the following LLVM IR:

define i64 @julia_stable_23555(%jl_value_t*, i64) #0 {
top:
  %2 = icmp eq %jl_value_t* %0, inttoptr (i64 139744279971560 to %jl_value_t*)
  br i1 %2, label %if, label %L

L:                                                ; preds = %top
  %3 = add i64 %1, 2
  ret i64 %3

if:                                               ; preds = %top
  %4 = add i64 %1, 1
  ret i64 %4
}
define i64 @julia_unstable_23561.2(%jl_value_t*, i64) #0 {
top:
  %2 = call i8** @jl_get_ptls_states() #1
  %3 = alloca [4 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 2
  %5 = bitcast [4 x %jl_value_t*]* %3 to i64*
  store i64 4, i64* %5, align 8
  %6 = getelementptr [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 1
  %7 = bitcast i8** %2 to i64*
  %8 = load i64, i64* %7, align 8
  %9 = bitcast %jl_value_t** %6 to i64*
  store i64 %8, i64* %9, align 8
  %jl_pgcstack1 = bitcast i8** %2 to %jl_value_t***
  store %jl_value_t** %.sub, %jl_value_t*** %jl_pgcstack1, align 8
  %10 = getelementptr [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 3
  store %jl_value_t* inttoptr (i64 139744292356880 to %jl_value_t*), %jl_value_t** %4, align 8
  store %jl_value_t* %0, %jl_value_t** %10, align 8
  %11 = call %jl_value_t* @jl_f_instantiate_type(%jl_value_t* null, %jl_value_t** %4, i32 2)
  store %jl_value_t* %11, %jl_value_t** %4, align 8
  %12 = call %jl_value_t* @jl_box_int64(i64 signext %1)
  store %jl_value_t* %12, %jl_value_t** %10, align 8
  %13 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 139744338420432 to %jl_value_t*), %jl_value_t** %4, i32 2)
  %14 = bitcast %jl_value_t* %13 to i64*
  %15 = load i64, i64* %14, align 16
  %16 = load i64, i64* %9, align 8
  store i64 %16, i64* %7, align 8
  ret i64 %15
}

and benchmarking those gives us

julia> @benchmark stable(:if, 1)
================ Benchmark Results ========================
     Time per evaluation: 4.06 ns [3.97 ns, 4.14 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 4001
   Number of evaluations: 90501
         R² of OLS model: 0.957
 Time spent benchmarking: 2.20 s

julia> @benchmark unstable(:if, 1)
================ Benchmark Results ========================
     Time per evaluation: 466.50 ns [453.65 ns, 479.35 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 2401
   Number of evaluations: 18901
         R² of OLS model: 0.952
 Time spent benchmarking: 1.35 s

Granted, this is only for a single if block; I'd imagine the factor-of-100 difference won't hold once a couple more blocks are added.

Some possible solutions to this:

  • just go with the Val approach as is. I've made extensive use of this kind of thing in Docile and it does make things much more readable, but that project never really needed to be focussing on performance.
  • use a generated function to build the if "dispatchers" at compile-time using some kind of lookup table. I can try to mock something up if there's interest in that.
  • make all the "keywords", such as if, defn, etc. into macros and separate the codegen into each definition. Not sure exactly how that would work, but it's probably possible.
  • keep the chain of ifs as is.

Ismael-VC commented 8 years ago

Michael, yes this PR has too many things inside it, and I've seen Wade has already taken care of many of the points listed here.

I agree with the 2-space approach for the source code; I just wanted to try 4 spaces because there isn't tons of code.

The Benchmarks package is very good; I'll have to add it to my tool belt. Thanks for your careful review:

just go with the Val approach as is. I've made extensive use of this kind of thing in Docile and it does make things much more readable, but that project never really needed to be focussing on performance.

That is true. I only thought of introducing Val for correctness, which includes readability and performance, but I was wrong.

use a generated function to build the if "dispatchers" at compile-time using some kind of lookup table. I can try to mock something up if there's interest in that.

This seems to be the best approach so far; please do mock something up when you find time. My meta-fu is not that good and I haven't used staged functions yet.

make all the "keywords", such as if, defn, etc. into macros and separate the codegen into each definition. Not sure exactly how that would work, but it's probably possible.

I think this could potentially be very complex.

keep the chain of ifs as is.

I think it's the best option for now (this is this PR's main focus), until someone comes up with a proven better approach.

swadey commented 8 years ago

all, here's my suggestion:

  1. Let's merge the type dispatch changes now, as that is definitely a win. I ran a couple of benchmarks; it's at least 3-4x faster.
  2. Let's keep the current if-based symbol lookup until Michael has a chance to test. I think his dispatch approach via table lookup is probably the cleanest and most efficient solution. When we have a solution, let's make another PR for it.

It sounds like we're all basically in agreement here. @Ismael-VC I made a couple of changes last night (bug fixes, reader updates). I think they impact your PR. Either you can re-pull or I can merge your existing changes after you finish cleanup and then fix.

Ismael-VC commented 8 years ago

@swadey don't worry, I'll rebase on top of current master. I'm working on it; it'll be ready in a moment.

Ismael-VC commented 8 years ago

Actually, I should really have split this PR into the code refactor and the unrelated changes, sorry about that. This was supposed to be a quick experiment with some minor unrelated changes added, silly me! I'll open a new PR.

Ismael-VC commented 8 years ago

I'll keep Base.Test for the next PR also; FactCheck is buggy:

julia> @fact 2*2 --> 4
  Success :: (line:-1) :: fact was true
    Expression: 2 * 2 --> 4
      Expected: 4
      Occurred: 4

But inside the tests it doesn't detect some things:

julia> @time Pkg.test("LispSyntax")
INFO: Testing LispSyntax
ERROR: LoadError: AssertionError: number == 3

In this case a simple assert catches the error but @fact does not. Snippet:

@fact number --> 3   # WTF!
@assert number == 3    # hint: number should be 2 at this point

I'll open an issue at FactCheck; meanwhile, keeping Base.Test just works and it's also faster. Using Base.Test:

julia> @time Pkg.test("LispSyntax")
INFO: Testing LispSyntax
ERROR: LoadError: test failed: 2 == 3
 in expression: number == 3
 in default_handler at test.jl:30
 in do_test at test.jl:53
MichaelHatherly commented 8 years ago

A generated dispatch approach could look something like the following:

using Base.Cartesian, Base.Meta

# Recover the symbol from a wrapped Val type, e.g. Type{Val{:if}} -> :if.
head{HEAD}(::Type{Type{Val{HEAD}}}) = HEAD

# Collect the Val{...} heads from every defined method of `func`, then emit
# a static if/elseif chain over them with `@nif`; symbols that match no
# method fall through to `default_func`.
@generated function dispatch(s::Symbol, arg)
    heads = [quot(head(Base.tuple_type_head(m.sig))) for m in methods(func)]
    quote
        @nif(
            $(length(heads) + 1),
            k -> s === $(heads)[k],
            k -> func(Val{$(heads)[k]}, arg),
            k -> default_func(arg)
        )
    end
end

# the following "mock" method defs have `@noinline` just to make looking at the code
# generated in `dispatch` easier. A real implementation wouldn't bother with it.

@noinline func(::Type{Val{:if}},    arg) = :if, arg
@noinline func(::Type{Val{:for}},   arg) = :for, arg
@noinline func(::Type{Val{:while}}, arg) = :while, arg

@noinline default_func(arg) = :default, arg
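
With these definitions in place, calling dispatch should presumably behave like this (expected output, not a captured session):

julia> dispatch(:if, 1)
(:if,1)

julia> dispatch(:unknown, 1)
(:default,1)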

The generated IR for dispatch will look something like:

define %jl_value_t* @julia_dispatch_24018(%jl_value_t*, i64) #0 {
top:
  %2 = icmp eq %jl_value_t* %0, inttoptr (i64 140186963330792 to %jl_value_t*)
  br i1 %2, label %if, label %L

L:                                                ; preds = %top
  %3 = icmp eq %jl_value_t* %0, inttoptr (i64 140186963330872 to %jl_value_t*)
  br i1 %3, label %if3, label %L1

L1:                                               ; preds = %L
  %4 = icmp eq %jl_value_t* %0, inttoptr (i64 140186963331120 to %jl_value_t*)
  br i1 %4, label %if5, label %L2

L2:                                               ; preds = %L1
  %5 = call %jl_value_t* @julia_default_func_23520(i64 %1) #0
  ret %jl_value_t* %5

if:                                               ; preds = %top
  %6 = call %jl_value_t* @julia_func_23517(%jl_value_t* inttoptr (i64 140187029444144 to %jl_value_t*), i64 %1) #0
  ret %jl_value_t* %6

if3:                                              ; preds = %L
  %7 = call %jl_value_t* @julia_func_23518(%jl_value_t* inttoptr (i64 140187029445872 to %jl_value_t*), i64 %1) #0
  ret %jl_value_t* %7

if5:                                              ; preds = %L1
  %8 = call %jl_value_t* @julia_func_23519(%jl_value_t* inttoptr (i64 140187029447792 to %jl_value_t*), i64 %1) #0
  ret %jl_value_t* %8
}

All the function calls are static here as far as I can see. It's basically equivalent to hand-written if statements, and benchmarking suggests that it's fast enough:

julia> @benchmark dispatch(:if, 1)
================ Benchmark Results ========================
     Time per evaluation: 20.17 ns [19.80 ns, 20.54 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 32.00 bytes
   Number of allocations: 1 allocations
       Number of samples: 4901
   Number of evaluations: 213901
         R² of OLS model: 0.956
 Time spent benchmarking: 2.52 s