jcrist / Juniper.jl

A simple computer algebra system, in Julia
Other
5 stars 0 forks source link

Using parametric types for Symbolic? #1

Closed Upabjojr closed 9 years ago

Upabjojr commented 9 years ago

Hi, I have noticed that you define the type of Symbolic by an Int8 parameter (not UInt8 ?).

What about a parametric type instead? That is, something like Symbolic{:Add}(args...).

This way you can use multiple dispatch easily on the node types. Mathematica uses "Plus", "Times", "Power" in its M-expressions, and it doesn't seem to be a noticeable performance loss over an Int8.

Upabjojr commented 9 years ago

Furthermore, multiple dispatch should try to do compile-time optimizations whenever the method can be unambiguously determined by type-inference.

Upabjojr commented 9 years ago

Another thought, 32 bit architectures work on 4 byte chunks, 64 bit archs on 8 byte chunks. That is, if :Add, :Mul, :Pow are represented as zero-terminating strings, they need 4 bytes of memory. Would they have both in 32 and 64 bit architectures have the same performance of an Int8 field?

cdsousa commented 9 years ago

:+1: for trying parametric types. There will be no performance loss usually, I guess. And there can be some performance gain when the dispatch can be done statically.

Such technique will make Julia dispatching system shine once again. (I always wonder why it is not used in the Expr type in the first place.)

Upabjojr commented 9 years ago

There is another project about symbolic math in Julia: https://github.com/jlapeyre/SJulia

That looks more like a clone of Wolfram Mathematica. In SJulia there is a base object called Mxpr, which is equivalent to Mathematica's M-expressions. The first argument of the M-expression is put parametrically in Mxpr, e.g.

x*y + z in Mathematica is represented by Plus[Times[x, y], z]

In SJulia it becomes mxpr(:Plus, mxpr(:Times, :x, :y), :z), where mxpr(:T, args...) is the type constructor of Mxpr{:T}.

The thing I don't like of Mathematica is its lack of support for mixed commutative-noncommutative elements in a multiplication, SJulia has the same problem.

jcrist commented 9 years ago

What about a parametric type instead? That is, something like Symbolic{:Add}(args...).

This approach is what I originally was using, but switched due to performance issues. Multiple dispatch is great for composability, and can result in fast code when the concrete type of the object is known at compile time. The reason such an approach is slower is because of the need to store arbitrary expressions in the args vector. WIth the current design, this is a vector of concrete type (all objects are Symbolic). In contrast, using a parametric type, each element of the vector is abstact, and not known at compile time. This forces the dispatch system to do inference and method lookup in the runtime, resulting in seriously degraded performance.

There is another project about symbolic math in Julia: https://github.com/jlapeyre/SJulia

Yeah, I responded to that thread on julia-users a while ago. I don't care for his approach of storing everything in Expr, modifying the REPL, etc... Mainly, I wrote this just to:

  1. Play around. This is just a fun side project for now.
  2. Get faster derivatives for expressions. All I use SymPy for is derivatives of extremely large expressions containing trigs, exponents, sqrts, and abs. This is for solving multibody dynamics problems.
  3. Eventually port the half-finished pattern matcher I wrote for SymPy, so I can experiment with it in a language that doesn't penalize abstraction. The Python code currently works, but I want to play around with it in another high level language to see if I can improve it, before porting it back to Python.

I didn't intend for this to be the CAS for Julia. Maybe later, but I really like the way CSymPy is going, and think in the long run that's the best bet.

jcrist commented 9 years ago

I really wish julia had a switch construct, like in C, so that the long if statements could be compiled down to a computed goto, rather than consecutive checks. There is a Switch.jl package, but that just provides syntactic sugar, which still turns into if ... elseif ...,.

Upabjojr commented 9 years ago

This approach is what I originally was using, but switched due to performance issues. Multiple dispatch is great for composability, and can result in fast code when the concrete type of the object is known at compile time.

Oh, I see. Anyways, I think that's a Julia limitation, because if you define a type such as

type A{T}
    args::Array{A, 1}
end

all of its subtypes (e.g. A{:Add}, A{:Mul}) should technically require the same amount of memory, as their fields are the same. In CSymPy I see using dictionaries and arrays of Basic, despite their elements being subtypes of Basic.

jcrist commented 9 years ago

Yes, but because they are different types, then any methods being called on them may be different from element to element, and so the dispatch must happen at runtime, rather than compile time.

jcrist commented 9 years ago

Closing in favor of #2 .

Upabjojr commented 9 years ago

I think that is a Julia issue, even if they are different types, the memory access is the same. Julia's compiler fails to recognize that.

CSymPy uses subtypes without losing performance. Maybe C++ compilers are currently better than Julia's?

jcrist commented 9 years ago

It has nothing to do with the memory access. Fundamentally Foo{:A} and Foo{:B} are different types, even though they are both parametrics of Foo. So if I define a function func that does one thing to Foo{:A}, and another thing to Foo{:B} (even if the have the same memory layout), then the compiler has to do runtime type checking to perform a loop like:

for i in [Foo{:A}(...), Foo{:B}(...), Foo{:A}(...), Foo{:B}(...), ...]
    func(i)
end

because it has no knowledge of which func to use here. The type inference can't decide which func applies, so it has to dispatch at runtime. If the code is type-stable, then the proper function can be determined at compile time, and fast, native code generated.

Certainly the compiler for a new language like Julia isn't as good as those of C++, but there is a fundamental difference between parametric types in Julia and subclasses in C++. For more info: http://julia.readthedocs.org/en/latest/manual/types/#man-parametric-types

Upabjojr commented 9 years ago

I'm sorry, I meant field access. I have no computer access right now. The parametric types in my example are still slower when accessing a field. Field access in that mixed array may be made static, if Julia's compiler had the proper optimization, because all those different types have the same memory space. C++ introduces similar issues due to class inheritance, CSymPy uses class inheritance, Add and Mul are different types, but there is no such performance loss. I conclude that C++ compiler has better optimizations than Julia's.

I believe that once such problems are addressed in Julia, the approach of this issue will work well.

Concerning the switch, do you mean a binary tree or hash table lookup at assembly code instead of a nested list of ifs, or airthmetic operations to determine the jump size? I believe that a well optimized compiler should treat nested ifs and a switch statement the same way.

jcrist commented 9 years ago

Concerning the switch, do you mean a binary tree or hash table lookup at assembly code instead of a nested list of ifs, or airthmetic operations to determine the jump size? I believe that a well optimized compiler should treat nested ifs and a switch statement the same way.

Either or. There was some talk in a thread I read a while back that LLVM can perform this optimization, but looking at code_llvm and code_native shows it's still a string of if statements instead. I'm sure this will get better with time, but right now doing a long string of such comparisons results in non-optimal code :(.