jwmerrill / PowerSeries.jl

Truncated Power Series for Julia
MIT License

Combine efforts with TaylorSeries.jl #7

Open jwmerrill opened 10 years ago

jwmerrill commented 10 years ago

TaylorSeries.jl seems to have very similar scope to this library.

The major differences that I can see are:

  1. TaylorSeries uses an array for the underlying storage, whereas PowerSeries uses a type tower of immutable fixed-order types
  2. TaylorSeries multiplication takes series of order N and M to a series of order max(N, M) (edit: I originally said N + M; the libraries are more similar here than I thought), whereas PowerSeries only defines multiplication for series of the same order, and truncates the result to the order of the inputs
  3. TaylorSeries uses L'Hopital's rule to allow forming the quotient of series with the same number of lowest-order zero terms
  4. TaylorSeries uses different explicit loops to generate coefficients of different functions applied to series, whereas PowerSeries typically uses recursive application of the fundamental theorem of calculus, along with derivative definitions.
  5. TaylorSeries' constructor takes an array argument and an order argument, whereas PowerSeries' constructor takes varargs specifying coefficients (see the sketch just below).
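For concreteness, the two construction styles from point 5 look like this (a sketch; both forms appear in the benchmark sessions later in this thread):

using TaylorSeries, PowerSeries

t = Taylor([0.0, 1.0], 1)   # TaylorSeries: coefficient array plus explicit order
s = series(0.0, 1.0)        # PowerSeries: varargs coefficients, implicit order

Both construct the order-1 series x.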

These differences have various performance, API convenience, and implementation complexity implications. If we can come to a "best" choice along most of these axes, I think it will be best in the long term to combine these libraries, so that every user doesn't have to spend time researching the differences and relative advantages of the two libraries.

jwmerrill commented 10 years ago

From the mailing list:

On Wednesday, March 26, 2014 8:10:48 PM UTC-7, Luis Benet wrote: We use automatic differentiation to obtain f( p(x) ) as a polynomial series, where p(x) is a given polynomial. This means using certain recurrence relations to calculate the coefficients of the expansion.

PowerSeries is using automatic differentiation too. The motivation for me was to try to extend automatic differentiation to higher orders. I use the fundamental theorem of calculus to generate the relevant recurrence relations on the fly from the definition of first derivatives of functions.

The advantage of this approach is that it is very easy to add new functions, since it is straightforward to define the power series given the first derivative definition. A disadvantage is that there are special algorithms for many functions that allow faster generation of high order coefficients. See, e.g. Fast Algorithms for Manipulating Formal Power Series.
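To make that pattern concrete, here is a self-contained sketch over plain coefficient vectors (the helper names echo PowerSeries' primitives but this is not its internals) of how the sin and cos series fall out of their first-derivative definitions:

polyder(c) = [(k - 1) * c[k] for k = 2:length(c)]       # d/dx, drops order 0
polyint(c) = vcat(0.0, [c[k] / k for k = 1:length(c)])  # integral, constant 0

function mul(a, b)                 # Cauchy product, truncated
    n = min(length(a), length(b))
    c = zeros(n)
    for k = 1:n, j = 1:k
        c[k] += a[j] * b[k-j+1]
    end
    return c
end

# sin(u) = sin(u_0) + integral of cos(u)*u'; recursing on the restricted
# (one order lower) series makes the recursion bottom out at order 0
function sin_series(c)
    length(c) == 1 && return [sin(c[1])]
    tail = polyint(mul(cos_series(c[1:end-1]), polyder(c)))
    vcat(sin(c[1]), tail[2:end])
end

function cos_series(c)
    length(c) == 1 && return [cos(c[1])]
    tail = polyint(-mul(sin_series(c[1:end-1]), polyder(c)))
    vcat(cos(c[1]), tail[2:end])
end

sin_series([0.0, 1.0, 0.0, 0.0])   # [0.0, 1.0, 0.0, -0.1666...], i.e. x - x^3/6

The point is that only the derivative pair (sin, cos) is specific to the function; the integration scaffolding is shared.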

I made a decision to focus on implementation simplicity and fast computations with low-order series instead of focusing on asymptotically fast algorithms. My contention is that for each order you add to the derivative, there are an order of magnitude fewer relevant applications.

jwmerrill commented 10 years ago

Regarding benchmarks, I would like to produce tables of execution speed vs series order and nesting depth for different functions using PowerSeries, TaylorSeries, and the symbolic derivatives produced by Calculus.jl

Concretely, how long does it take to compute sin(series(0.0, 1.0, 0.0, 0.0, ...)) as the order increases, and how long does it take to compute sin(sin(sin(...sin(series(0.0, 1.0, 0.0))))) as the number of function applications increases.

jwmerrill commented 10 years ago

Re: point 2 above (the resulting order after multiplication of series): from my POV it is critical to control the order of a series over the course of a long computation, because multiplication is O(N*M) in the orders of the two series, and many functions of a series are O(N^3).
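To spell that out: the truncated product keeps c_k = a_0*b_k + a_1*b_{k-1} + ... + a_k*b_0 for k = 0..N, which is 1 + 2 + ... + (N+1) = (N+1)(N+2)/2 = O(N^2) multiplications when both inputs have order N.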

Floating point numbers are so useful because they truncate to a fixed (relative) precision after each computation, which means that operating on a value produced by a long computation takes exactly as much effort as the same operation on a value at the start of a computation. You want the same property when computing with truncated power series, especially in an automatic differentiation context, where you are plugging series into computations that were originally designed to work over floats in order to extract derivative information.

I chose to require both inputs to binary operations to have the same order because it's nice to know that when you get a series of order N out of a computation, all of the series coefficients are accurate. If you feed in inputs with different orders smaller than N, the accuracy of the high-order terms is hard to reason about, and depends on the order in which various mathematically associative and commutative reductions happen.

lbenet commented 10 years ago

I agree that we should join efforts, so I am cc @dpsanders who is also involved in the development of TaylorSeries, among other projects.

With respect to your observations on TaylorSeries.jl, I should add a minor correction: TaylorSeries multiplication indeed takes two series of order N and M, but the result is a truncated series of order max(N,M), not N+M. So we actually control the order of the polynomial resulting from the multiplication. The actual algorithm is not optimised for large polynomial orders; it is a simple convolution.

This is actually a design decision. One of the uses we have in mind for TaylorSeries is to integrate smooth enough (so non-stiff) ODEs using Taylor's method, which can already be done. The big advantage of this integration method is the precision you can reach (for large enough polynomial order, say 24); yet the jet routine which implements the equations of motion has to be "hand-crafted" for each different problem. (I would love to be able to create a "parser" of the equations of motion into a routine/function for the jet, but I am not an experienced enough Julia programmer to know whether this is simple to do.)
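To illustrate what the jet provides, here is a minimal sketch of one step of Taylor's method for the scalar test problem y' = y, where the jet recurrence is trivial (illustrative only, not TaylorSeries' API):

function taylor_step(y0, h, order)
    c = zeros(order + 1)    # c[k+1] holds the order-k Taylor coefficient
    c[1] = y0
    for k = 1:order
        c[k+1] = c[k] / k   # from y' = y: k*c_k = c_{k-1}
    end
    y = c[end]              # evaluate the truncated series at h (Horner)
    for k = order:-1:1
        y = c[k] + h*y
    end
    return y
end

taylor_step(1.0, 0.1, 24)   # ~ exp(0.1), accurate to machine precision

For a general right-hand side f, the loop body becomes c[k+1] = (order k-1 coefficient of f applied to the partial series)/k, which is exactly what a hand-crafted jet routine supplies.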

Regarding the benchmarks, my naïve guess is that TaylorSeries doesn't perform too well if the order of the polynomial changes often; in that sense, it is better to fix the order (to order 28 if you wish) from the beginning.

cc @dpsanders

lbenet commented 10 years ago

Is the following what you have in mind?

julia> a = Taylor([0.0, 1.0], 30); b = sin(a); @time begin 
              for i = 1:100
                for j= 1:30
                  a = Taylor([0.0, 1.0], j);
                  b = sin(a);
                end
              end
              b
            end
elapsed time: 0.016819646 seconds (7848000 bytes allocated)
Taylor{Float64}([0.0,1.0,0.0,-0.166667,0.0,0.00833333,0.0,-0.000198413,0.0,2.75573e-6  …  1.95729e-20,0.0,-3.86817e-23,0.0,6.44695e-26,0.0,-9.18369e-29,0.0,1.131e-31,0.0],30)

julia> a = Taylor([0.0, 1.0], 30); b = sin(a); @time begin 
       for i = 1:100
         b = sin(a);
         for j= 1:30
           b = sin(b);
         end
       end
       b
       end
elapsed time: 0.033350918 seconds (13044800 bytes allocated)
Taylor{Float64}([0.0,1.0,0.0,-5.16667,0.0,39.0083,0.0,-323.784,0.0,2802.41  …  1.55805e9,0.0,-1.43791e10,0.0,1.32974e11,0.0,-1.23166e12,0.0,1.1422e13,0.0],30)

I'm not an expert at benchmarking with Julia...

jwmerrill commented 10 years ago

I worked on the benchmarking a bit this afternoon. Here's a session looking at execution time vs. iteration depth for sin(sin(sin(...sin(x)))) where x is either a 1st order PowerSeries or TaylorSeries:

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+2224 (2014-03-30 00:26 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 32c072e (1 day old master)
|__/                   |  x86_64-apple-darwin13.1.0

julia> using Winston

julia> using PowerSeries

julia> using TaylorSeries

julia> function iterated_sin_taylor(depth)
         accum = 0.0
         for i = 1:1000
           x = Taylor([1.0, 1.0])
           for j = 1:depth
               x = sin(x)
           end
           accum += x.coeffs[2]
         end
         return accum
       end

julia> function iterated_sin_series(depth)
         accum = 0.0
         for i = 1:1000
           x = series(1.0, 1.0)
           for j = 1:depth
               x = sin(x)
           end
           accum += polyder(x)
         end
         return accum
       end

julia> iterated_sin_taylor(10)
79.2930521907721

julia> iterated_sin_series(10)
79.2930521907721

julia> @time iterated_sin_taylor(1)
elapsed time: 0.0019593 seconds (622716 bytes allocated)
540.3023058681374

julia> @time iterated_sin_series(1)
elapsed time: 8.3798e-5 seconds (64 bytes allocated)
540.3023058681374

julia> @time iterated_sin_taylor(30)
elapsed time: 0.053925713 seconds (12216064 bytes allocated)
19.894937325597006

julia> @time iterated_sin_series(30)
elapsed time: 0.001427789 seconds (64 bytes allocated)
19.894937325597006

julia> display(loglog(1:30, [@elapsed iterated_sin_series(i) for i = 1:30], "b", 1:30, [@elapsed iterated_sin_taylor(i) for i = 1:30], "g"))
FramedPlot(...)

julia> xlabel("Iteration Depth")
FramedPlot(...)

julia> p = ylabel("Execution Time")
FramedPlot(...)

julia> display(p)
FramedPlot(...)

[plot: log-log execution time vs. iteration depth; PowerSeries in blue, TaylorSeries in green]

PowerSeries is blue, and TaylorSeries is green, so in this case, it looks like PowerSeries is consistently something like a factor of 50 faster.

I think the spikes are probably garbage collection pauses. TaylorSeries does a lot more allocation because its data is stored in arrays, and at each step of the computation, a new array must be allocated.

As you look at the benchmark code, note that accum is just there to make sure the loop does work that the compiler won't optimize away.

jwmerrill commented 10 years ago

I'd still like to do a benchmark that varies series order instead of computation tree depth, but didn't get to that today. I'd also like to compare both of these to the symbolic derivatives from Calculus.jl.

It's actually a little bit hard to vary the order of a PowerSeries series programmatically in a loop in a way that is type-inference friendly. I'm still trying to figure out what to do about that.
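To illustrate the issue, a sketch (hypothetical benchmark shape, using the splatting constructor): the concrete series type depends on the runtime order n, so inference cannot type the loop body:

using PowerSeries
# assumes PowerSeries.generate(maxorder) has been called if maxorder > 7

function sweep_orders(maxorder, depth)
    for n = 2:maxorder
        x = series(ones(Float64, n)...)  # concrete type varies with n
        for j = 1:depth
            x = sin(x)                   # dispatched dynamically
        end
    end
end

The usual mitigation would be a function barrier: hand x off to an inner function so the hot loop compiles against a concrete series type, at the cost of one dynamic dispatch per order.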

lbenet commented 10 years ago

I agree that your benchmarks show that TaylorSeries spends too much time allocating memory. The advantage of using arrays is precisely that they make it easy to change the order of the polynomials. I'd love to optimise this aspect of TaylorSeries.

Are tuples the way to go?

jwmerrill commented 10 years ago

Perhaps. I haven't experimented with that, but it's certainly worth investigating.


vchuravy commented 10 years ago

With the new improvements from JuliaLang/julia#5355 and JuliaLang/julia#6271, tuples definitely seem to be the way to go.

jwmerrill commented 10 years ago

@lbenet

This is actually a design decision. One of the uses we have in mind for TaylorSeries is to integrate smooth enough (so non-stiff) ODEs using Taylor's method, which can already be done.

You should also take a look at ApproxFun.jl if you haven't already. This package implements ODE solving with polynomials, but uses the Chebyshev basis instead of the monomial basis. It is also capable of "spectral accuracy," and Chebyshev interpolants/polynomials have many practical advantages over monomial-basis polynomials.

So far, that package is pretty light on the docs, but it sounds like it's being actively developed, and its author is an expert in the field.

lbenet commented 10 years ago

I have been able to improve the performance of TaylorSeries by a factor of ~4, simply by changing Integer to Int in a bunch of declarations, along with some other minor things. I imagine there are other improvements like these, but the seemingly essential point, avoiding Arrays in favor of Tuples, is still something I haven't quite managed, nor have I applied the suggestions related to JuliaLang/julia#5355 and/or JuliaLang/julia#6271...
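For illustration, the kind of declaration change involved (a reconstructed example, not the actual diff):

immutable Slow
    order::Integer   # abstract field type: values are boxed, access is slow
end

immutable Fast
    order::Int       # concrete field type: stored inline, fully inferable
end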

Yet, while trying to understand the design of PowerSeries, I noticed that PowerSeries generates a bunch of immutable types of the form SeriesN, whose fields are numbers (of type T<:Real) named c0, c1, ..., cN (the default maximum N is 7). The nice thing about this, which somehow mimics the use of Tuples, is that for each type SeriesN the number of coefficients is fixed, which cannot be achieved (as efficiently) when using Array{T,1}.
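For concreteness, the shape described is roughly the following (order-2 case, reconstructed from the description above rather than copied from the source):

immutable Series2{T<:Real}
    c0::T
    c1::T
    c2::T
end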

Yet, increasing the order of the polynomials may lead to performance issues, simply because the amount of work (and generated code) grows like N^2. With the improvements described above, I repeated the same type of benchmarks, going up to order 29, and got results which marginally favor TaylorSeries:

julia> using PowerSeries

julia> PowerSeries.generate(30)  ## this allows creating series of order up to 30

julia> using TaylorSeries

julia> function iterated_sin_taylor(depth)
           accum = 0.0
           for i = 1:1000
               x = Taylor( ones(Float64,30) )   ## order 29 series!
               for j = 1:depth
                   x = sin(x)
               end
               accum += x.coeffs[2]
           end
           return accum
       end
iterated_sin_taylor (generic function with 1 method)

julia> function iterated_sin_series(depth)
           accum = 0.0
           for i = 1:1000
               x = series( ones(Float64,30)... ) ## order 29 series
               for j = 1:depth
                   x = sin(x)
               end
               accum += polyder(x)
           end
           return accum
       end
iterated_sin_series (generic function with 1 method)

julia> iterated_sin_taylor(30)
19.894937325597006

julia> iterated_sin_series(30)
19.894937325597006

julia> @time iterated_sin_taylor(30)
elapsed time: 0.127158669 seconds (36168064 bytes allocated)
19.894937325597006

julia> @time iterated_sin_series(30)
elapsed time: 0.250235761 seconds (12472064 bytes allocated)
19.894937325597006

Note the increase in PowerSeries' memory use, which is now comparable to the memory used by TaylorSeries.

jwmerrill commented 10 years ago

Thanks for looking into this. I'm not too surprised that PowerSeries behaves poorly at high order. For one thing, it will generate an awful lot of code. Multiplication of series is unrolled completely, so the length of the generated code is O(n^2). Most of the functions (like sin) have O(n^3) execution time.
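Spelling out the cubic bound: the recursive definition of a function like sin at order n performs one truncated multiplication, O(n^2), and recurses at order n-1, so T(n) = T(n-1) + O(n^2) = O(n^3).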

I've been reading up a bunch on NTuples in the last couple of days. If I could use them to collapse that hierarchy of named types back down to a single parametrized Series{T, N} without losing performance, that would be fantastic.

For the current implementation, I copied a lot of the code generation ideas from ImmutableArrays, which has a similar named type hierarchy for similar reasons. The generate_types code was pretty annoying to write, but I was happy that after defining the primitive operations that way (+, -, *, /, polyint, polyder, restrict), it was very easy to implement all the other functions.
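A speculative sketch of what that collapse might look like (a design sketch only, not PowerSeries' actual code, and it presumably depends on the tuple improvements referenced above):

import Base: +

immutable Series{T<:Real, N}
    coeffs::NTuple{N, T}
end

# map over a tuple returns a tuple, so elementwise operations need no
# heap allocation once tuples of this kind are stored unboxed
+(a::Series, b::Series) = Series(map(+, a.coeffs, b.coeffs))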

lbenet commented 10 years ago

I have played a bit with @simd and @inbounds; using the latter alone I get improvements of around a factor of 2.

Anyway, coming back to benchmarking the packages, these are my results (note the change in the functions iterated_sin_series and iterated_sin_taylor, which now take the polynomial order as a parameter):

julia> using PowerSeries
julia> PowerSeries.generate(30)
julia> using TaylorSeries
julia> function iterated_sin_taylor(depth, n)
           accum = 0.0
           for i = 1:1000
               #x = Taylor([1.0, 1.0])
               x = Taylor(ones(Float64,n))
               for j = 1:depth
                   x = sin(x)
               end
               accum += x.coeffs[2]
           end
           return accum
       end
iterated_sin_taylor (generic function with 1 method)

julia> function iterated_sin_series(depth, n)
           accum = 0.0
           for i = 1:1000
               #x = series(1.0, 1.0)
               x = series( ones(Float64,n)... )
               for j = 1:depth
                   x = sin(x)
               end
               accum += polyder(x)
           end
           return accum.c0
       end
iterated_sin_series (generic function with 1 method)

julia> iterated_sin_taylor(10,3)
79.2930521907721

julia> iterated_sin_series(10,3)
79.2930521907721

julia> iterated_sin_taylor(10,30)
79.2930521907721

julia> iterated_sin_series(10,30)
79.2930521907721

julia> res_ser1 = [@elapsed iterated_sin_series(i,3) for i = 1:30];

julia> res_taylor1 = [@elapsed iterated_sin_taylor(i,3) for i = 1:30];

julia> res_ser2 = [@elapsed iterated_sin_series(i,30) for i = 1:30];

julia> res_taylor2 = [@elapsed iterated_sin_taylor(i,30) for i = 1:30];

julia> function plot_comparison( rr, res_ser, res_tay, tit)
           figure();
           plot( log10(rr), log10(res_ser), color="blue"); # iterated_sin_series
           plot( log10(rr), log10(res_tay), color="green" ); #iterated_sin_taylor
           xlabel("Iteration depth");
           ylabel("Execution time")
           title(tit)
       end
plot_comparison (generic function with 1 method)

julia> fig1 = plot_comparison(1:30, res_ser1, res_taylor1, "res_ser1 (blue) vs res_taylor1 (green)");

julia> fig2 = plot_comparison(1:30, res_ser2, res_taylor2, "res_ser2 (blue) vs res_taylor2 (green)");

[fig1: execution time vs. iteration depth, order-2 series; res_ser1 (blue) vs res_taylor1 (green)]

[fig2: execution time vs. iteration depth, order-29 series; res_ser2 (blue) vs res_taylor2 (green)]

I think one can summarise these results as follows: