Closed Ismael-VC closed 8 years ago
I left you some questions before I merge this, but in general it looks cleaner. Happy to merge after discussion.
One problem with using Val{symbol}
for dispatch is it'll defeat type-stability since since different Symbol
values are producing completely different types. Maybe that's not much of an issue at the moment though, I've not looked at how type-stable things were before this change. (It does make things a bit cleaner though, which is nice.)
In addition to Michael's comments which are more important, a couple of minor comments:
@expect
as it's much easier to read test results.escape_exceptions
. It's more readable.escape_exceptions
seems like overkill. I'm not sure what the objection is to using an optional arg with an empty default here.The type stability/wrapper type issue is a potential performance problem, which seems to be at odds with your other changes.
I prefer Stage's
@expect
as it's much easier to read test results.
- It seems that Base.Test has improvements on the v0.5 branch, there are also several registered packages for unit testing, the most popular seems to be FactCheck an it does what you like about
Stage
I think that test verbosity is only useful when a test fails and kinda pointless when it succeeds. There is also JuliaTest and QuickCheck (haven't used the last two)
If we are to keep Stage, we should make it work with Pkg.test
, since this leaves the user to figure out and install Stage just to check the test suite, this doesn't happen in travis because it's hardcoded there.
Seems we can't REQUIRE unregistered packages, which is why I made this change, now users just have to do Pkg.test("LispSyntax")
and it works instead of getting Stage not defined
, understand the issue, search Stage, find Stage, clone it and re run tests.
I'd rather keep escape_exceptions. It's more readable.
- I'll keep
escape_exceptions
, I'm not sure what qualifies as overkill and what doesn't, but If you think it's too much, I'll change it back, I was actually just trying to avoid redundancy, we just need an optional argument, it doesn't mater if its positional or keyword, sincecodegen
is not part of the public API.escape_exceptions = escape_exceptions
all over the place, seemed to redundant to me. I'd also rather add documentation strings to all objects and explain args like:
"""
`codegen(s::Vector{Any}, esc_except = Set{Symbol}())`
Generates the blah blah........
Args:
* `s`: Input array of Any element types...
* `esc_except`: This is a set of symbols which indicates ...
"""
function codegen(s::Vector{Any}, esc_except = Set{Symbol}())
length(s) == 0 && return s # empty array
codegen(first(s), s, esc_except)
end
Produces:
I think this is better than make most variables have long names. Also I used esc_except::Set{Symbol}
in the signatures not because it is needed or because it makes the code faster/safer, but only for documentation purposes, it started to look redundant there also, but metaprogramming the codegen methods does seem like overkill.
prefer to keep the spacing at 2.
- I'd like to advice for 4 spaces per indentation level for the julia example code in the documetation as this is the convention in Julia, if you still prefer 2 spaces for LispSyntax.jl source code, I can change it back, but leave code in examples with 4 spaces is this ok? Also consider that new contributors would probably use also 4 spaces because of this.
- I'll try to address type stability, the previous version is also type unstable by the way, but dispatching to bits types at compile time is faster than selecting at runtime from a long if suite.
- http://docs.julialang.org/en/latest/manual/types/#value-types
As one application of these ideas, Julia includes a parametric type, Val{T}, designated for dispatching on bits-type values. For example, if you pass a boolean to a function, you have to test the value at run-time:
You can instead cause the conditional to be evaluated during function compilation by using the Val trick:
Tomorrow I'll implement whatever we agree here. ;)
I'd be inclined to leave things as 2-space indent everywhere in the project for the sake of consistency within the project itself. It also makes things a bit more difficult to review if there's whitespace changes included in real changes. (I do personally prefer 4-space, but it's not a huge burden to switch between the two I've found.)
Val{T}
is fine when it's being passed through the entire call chain (as in the manual example linked above), but with the method:
codegen(x::Symbol, s::Vector{Any}, esc_except::Set{Symbol}) = codegen(Val{x}, s, esc_except)
this will be type-unstable and result in runtime dynamic dispatch. See, for example, this simplified test:
stable(x, y) = x == :if ? y + 1 : y + 2
unstable(::Type{Val{:if}}, y) = y + 1
unstable(::Type, y) = y + 2
unstable(x, y) = unstable(Val{x}, y)
these produce the following llvm IR
define i64 @julia_stable_23555(%jl_value_t*, i64) #0 {
top:
%2 = icmp eq %jl_value_t* %0, inttoptr (i64 139744279971560 to %jl_value_t*)
br i1 %2, label %if, label %L
L: ; preds = %top
%3 = add i64 %1, 2
ret i64 %3
if: ; preds = %top
%4 = add i64 %1, 1
ret i64 %4
}
define i64 @julia_unstable_23561.2(%jl_value_t*, i64) #0 {
top:
%2 = call i8** @jl_get_ptls_states() #1
%3 = alloca [4 x %jl_value_t*], align 8
%.sub = getelementptr inbounds [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 0
%4 = getelementptr [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 2
%5 = bitcast [4 x %jl_value_t*]* %3 to i64*
store i64 4, i64* %5, align 8
%6 = getelementptr [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 1
%7 = bitcast i8** %2 to i64*
%8 = load i64, i64* %7, align 8
%9 = bitcast %jl_value_t** %6 to i64*
store i64 %8, i64* %9, align 8
%jl_pgcstack1 = bitcast i8** %2 to %jl_value_t***
store %jl_value_t** %.sub, %jl_value_t*** %jl_pgcstack1, align 8
%10 = getelementptr [4 x %jl_value_t*], [4 x %jl_value_t*]* %3, i64 0, i64 3
store %jl_value_t* inttoptr (i64 139744292356880 to %jl_value_t*), %jl_value_t** %4, align 8
store %jl_value_t* %0, %jl_value_t** %10, align 8
%11 = call %jl_value_t* @jl_f_instantiate_type(%jl_value_t* null, %jl_value_t** %4, i32 2)
store %jl_value_t* %11, %jl_value_t** %4, align 8
%12 = call %jl_value_t* @jl_box_int64(i64 signext %1)
store %jl_value_t* %12, %jl_value_t** %10, align 8
%13 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 139744338420432 to %jl_value_t*), %jl_value_t** %4, i32 2)
%14 = bitcast %jl_value_t* %13 to i64*
%15 = load i64, i64* %14, align 16
%16 = load i64, i64* %9, align 8
store i64 %16, i64* %7, align 8
ret i64 %15
}
and benchmarking those gives us
julia> @benchmark stable(:if, 1)
================ Benchmark Results ========================
Time per evaluation: 4.06 ns [3.97 ns, 4.14 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 4001
Number of evaluations: 90501
R² of OLS model: 0.957
Time spent benchmarking: 2.20 s
julia> @benchmark unstable(:if, 1)
================ Benchmark Results ========================
Time per evaluation: 466.50 ns [453.65 ns, 479.35 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 0.00 bytes
Number of allocations: 0 allocations
Number of samples: 2401
Number of evaluations: 18901
R² of OLS model: 0.952
Time spent benchmarking: 1.35 s
Granted this is only for a single if
block, but getting a factor of 100 reduction won't happen with a couple more blocks I'd imagine.
Some possible solutions to this.
Val
approach as is. I've made extensive use of this kind of thing in Docile and it does make things much more readable, but that project never really needed to be focussing on performance.if
"dispatchers" at compile-time using using some kind of lookup table. I can try mock something up if there's interest in that.if
, defn
, etc into macros and separate the codegen into each definition. Not sure exactly how that would work, but it's probably possible.if
s as is.Michael, yes this PR has too many things inside it, and I've seen Wade has already taken care of many of the points listed here.
I agree with the 2 spaces aproach for the source code, I just wanted to try 4 spaces, because there is not tons of code.
Benchmarks is very good, I'll have to add it to my tool belt. Thanks for your careful review:
just got with the Val approach as is. I've made extensive use of this kind of thing in Docile and it does make things much more readable, but that project never really needed to be focussing on performance.
That is true, I only thought of introducing Val, for correctness, which includes, readability and performance, but I was wrong.
use a generated function to build the if "dispatchers" at compile-time using using some kind of lookup table. I can try mock something up if there's interest in that.
This seems to be the best aproach so far, please do mock up something when you find time, my meta foo is not that good and I haven't used staged functions yet.
make all the "keywords", such as if, defn, etc into macros and separate the codegen into each definition. Not sure exactly how that would work, but it's probably possible.
I think this could potentially be very complex.
keep the chain of ifs as is.
I think it's the best for now (this is this PR main focus), until someone gets a proven better aproach.
all, here's my suggestion:
if
-based symbol lookup until Michael has a chance to test. I think his dispatch approach via table lookup is probably the cleanest and most efficient solution. When we have a solution, let's make another PR associated with it.It sounds like we're all basically in agreement here. @Ismael-VC I made a couple of changes last night (bug fixes, reader updates). I think they impact your PR. Either you can re-pull or I can merge your existing changes after you finish cleanup and then fix.
@swadey don't worry, I'll rebase on top of current master, I'm working on it, It'll be ready in a moment.
Actually I should really be splitting this PR code refactor and unrelated changes, sorry for that, this was supposed to be a quick experiment with some minor unrelated changes added, silly me! I'll open a new PR.
I'll keep Base.Test
for the next PR aslo, FactCheck
is buggy:
julia> @fact 2*2 --> 4
Success :: (line:-1) :: fact was true
Expression: 2 * 2 --> 4
Expected: 4
Occurred: 4
But inside the tests it doesn't detect some things:
julia> @time Pkg.test("LispSyntax")
INFO: Testing LispSyntax
ERROR: LoadError: AssertionError: number == 3
In this case a simple assert caches the erro but not @fact
, snippet:
@fact number --> 3 # WTF!
@assert number == 3 # hint number should be 2 at this point
I'll open an issue at FactCheck, soo keeping Base.Test
it just works and it's also faster, using Test
:
julia> @time Pkg.test("LispSyntax")
INFO: Testing LispSyntax
ERROR: LoadError: test failed: 2 == 3
in expression: number == 3
in default_handler at test.jl:30
in do_test at test.jl:53
A generated dispatch approach could look something like the following:
using Base.Cartesian, Base.Meta
head{HEAD}(::Type{Type{Val{HEAD}}}) = HEAD
@generated function dispatch(s::Symbol, arg)
heads = [quot(head(Base.tuple_type_head(m.sig))) for m in methods(func)]
quote
@nif(
$(length(heads) + 1),
k -> s === $(heads)[k],
k -> func(Val{$(heads)[k]}, arg),
k -> default_func(arg)
)
end
end
# the following "mock" method defs have `@noinline` just to make looking at the code
# generated in `dispatch` easier. A real implementation wouldn't bother with it.
@noinline func(::Type{Val{:if}}, arg) = :if, arg
@noinline func(::Type{Val{:for}}, arg) = :for, arg
@noinline func(::Type{Val{:while}}, arg) = :while, arg
@noinline default_func(arg) = :default, arg
The generated IR for dispatch
will look something like:
define %jl_value_t* @julia_dispatch_24018(%jl_value_t*, i64) #0 {
top:
%2 = icmp eq %jl_value_t* %0, inttoptr (i64 140186963330792 to %jl_value_t*)
br i1 %2, label %if, label %L
L: ; preds = %top
%3 = icmp eq %jl_value_t* %0, inttoptr (i64 140186963330872 to %jl_value_t*)
br i1 %3, label %if3, label %L1
L1: ; preds = %L
%4 = icmp eq %jl_value_t* %0, inttoptr (i64 140186963331120 to %jl_value_t*)
br i1 %4, label %if5, label %L2
L2: ; preds = %L1
%5 = call %jl_value_t* @julia_default_func_23520(i64 %1) #0
ret %jl_value_t* %5
if: ; preds = %top
%6 = call %jl_value_t* @julia_func_23517(%jl_value_t* inttoptr (i64 140187029444144 to %jl_value_t*), i64 %1) #0
ret %jl_value_t* %6
if3: ; preds = %L
%7 = call %jl_value_t* @julia_func_23518(%jl_value_t* inttoptr (i64 140187029445872 to %jl_value_t*), i64 %1) #0
ret %jl_value_t* %7
if5: ; preds = %L1
%8 = call %jl_value_t* @julia_func_23519(%jl_value_t* inttoptr (i64 140187029447792 to %jl_value_t*), i64 %1) #0
ret %jl_value_t* %8
}
All the function calls are static here as far as I can see. It's basically equivalent to hand-written if
statements and benchmarking suggests that it's fast enough
julia> @benchmark dispatch(:if, 1)
================ Benchmark Results ========================
Time per evaluation: 20.17 ns [19.80 ns, 20.54 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
Memory allocated: 32.00 bytes
Number of allocations: 1 allocations
Number of samples: 4901
Number of evaluations: 213901
R² of OLS model: 0.956
Time spent benchmarking: 2.52 s
This refactor is meant to take advantage of multiple dispatch instead of using long
if
expressions to test the type of an argument.const
Base.Test
for unit testingpegparser.jl
Stage
cloning dep fromtravis.yml
λ
for creating anonymous funcss_expr
toSExpr
esc_exceptions
an optional positional arg instead of an optional kwarg in order to use it for dispatchthrow
calls toerror