sagemath / sage

Main repository of SageMath
https://www.sagemath.org

implement van der Hoeven's algorithm for relaxed multiplication of power series #34616

Open mantepse opened 2 years ago

mantepse commented 2 years ago

This is an experimental implementation of the algorithm presented in section 4.2. of van der Hoeven's "Relax but don't be too lazy".

CC: @tscrim @fchapoton

Component: combinatorics

Keywords: LazyPowerSeries

Author: Martin Rubey

Branch/Commit: u/mantepse/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series @ 1de1e6d

Issue created by migration from https://trac.sagemath.org/ticket/34616
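
For orientation, here is a toy sketch (plain Python, hypothetical names, not the ticket's code) of the plain lazy Cauchy product that a relaxed algorithm is meant to speed up: computing h_0, ..., h_n coefficient by coefficient in this way costs O(n^2) coefficient multiplications, whereas van der Hoeven's relaxed divide-and-conquer product brings this down to roughly O(M(n) log n).

# Toy baseline (not Sage code): the naive lazy Cauchy product, computing
# coefficient n of h = f*g on demand as h_n = sum_{k=0}^{n} f_k * g_{n-k}.
class LazyProduct:
    def __init__(self, f, g):
        self._f = f              # f and g map k to the k-th coefficient
        self._g = g
        self._cache = []

    def __getitem__(self, n):
        while len(self._cache) <= n:
            m = len(self._cache)
            self._cache.append(sum(self._f(k) * self._g(m - k)
                                   for k in range(m + 1)))
        return self._cache[n]

h = LazyProduct(lambda k: k, lambda k: 1)   # same f and g as in the timings below
print([h[i] for i in range(6)])             # [0, 1, 3, 6, 10, 15]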

mantepse commented 2 years ago

Branch: u/mantepse/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series

mantepse commented 2 years ago
comment:2

Unfortunately, the code does not work yet, because I am having trouble with some details in the article.


Last 10 new commits:

52ac7af  pyflakes observations
06ec3d0  Merge branch 'develop' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367
4172688  Merge branch 'u/mantepse/implement_arithmetic_product_of_lazy_symmetric_functions' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367
41ca99b  Merge branch 'u/mantepse/replace_lazy_power_series-32367' of trac.sagemath.org:sage into t/34470/categories_lazy_series-34470
fb3a5cd  Merge branch 'u/mantepse/categories_lazy_series-34470' of trac.sagemath.org:sage into t/34552/lazy_series_test_suite-34552
855d2bf  remove unused variable
75c275c  add documentation and doctests for _approximate_order
5393242  fixes for pycodestyle and pyflakes
081d0e5  Merge branch 'u/mantepse/lazy_series_test_suite-34552' of trac.sagemath.org:sage into t/34616/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series
87d9db1  non-working first attempt
mantepse commented 2 years ago

Author: Martin Rubey

mantepse commented 2 years ago

Description changed:

--- 
+++ 
@@ -1 +1 @@
-
+This is an experimental implementation of the algorithm presented in section 4.2. of van der Hoeven's "Relax but don't be too lazy".
mantepse commented 2 years ago

Changed keywords from none to LazyPowerSeries

mantepse commented 2 years ago

Commit: 87d9db1

mantepse commented 2 years ago
comment:3

I think the mistake is that the caching is not implemented correctly. (References below are to van der Hoeven's paper)

Sec. 2.2, pg. 484 says that Series_Rep has an attribute n, which records up to which degree the values in the cache are already correct. However, the order of φ is allowed to exceed n in order to anticipate future computations. Moreover, the coefficients φ_0, …, φ_{k−1} must be computed before computing φ_k.

In Sec. 4.2.1, pg. 502, the definition of DAC_Rep does not specify n initially, so it is probably meant to be 0. However, in Sec. 4.2.3 the definition of DAC_Rep does specify n := N/2, which the current branch does not take into account.

The implementation of van der Hoeven's algorithm should really use the dense version of streams. It probably makes sense to adapt the framework slightly.
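
To pin down the invariant from Sec. 2.2 in code, here is a minimal toy sketch (hypothetical names, not Sage's or the paper's Series_Rep): n records how many cache entries are final, the cache may be longer than n in order to anticipate future computations, and coefficients are always produced in order.

class ToySeriesRep:
    def __init__(self, compute_coefficient, anticipate=0):
        self._compute = compute_coefficient  # function k -> k-th coefficient
        self._phi = [0] * anticipate         # cache; entries at index >= n are scratch
        self._n = 0                          # phi[0], ..., phi[n-1] are final

    def __getitem__(self, k):
        # produce phi_0, ..., phi_{k-1} before phi_k, as required in Sec. 2.2
        while self._n <= k:
            if self._n >= len(self._phi):
                self._phi.append(0)
            self._phi[self._n] = self._compute(self._n)
            self._n += 1
        return self._phi[k]

s = ToySeriesRep(lambda k: k * k, anticipate=8)
print(s[5], s._n, len(s._phi))  # 25 6 8: the cache is longer than n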

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1de1e6d  ugly, but working
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 87d9db1 to 1de1e6d

mantepse commented 2 years ago
comment:5

Although the implementation is quite ugly in some details, I think we can learn enough to make some decisions.

sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)

sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(270)]
CPU times: user 40.7 ms, sys: 0 ns, total: 40.7 ms
Wall time: 40.7 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(270)]
CPU times: user 43.1 ms, sys: 0 ns, total: 43.1 ms
Wall time: 43.1 ms

sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(2000)]
CPU times: user 712 ms, sys: 0 ns, total: 712 ms
Wall time: 712 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(2000)]
CPU times: user 1.85 s, sys: 0 ns, total: 1.85 s
Wall time: 1.85 s
sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = h1[1000]
CPU times: user 233 ms, sys: 0 ns, total: 233 ms
Wall time: 233 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = h2[1000]
CPU times: user 2.43 ms, sys: 0 ns, total: 2.43 ms
Wall time: 2.44 ms
sage: from sage.data_structures.stream import Stream_function, Stream_cauchy_compose
sage: f = Stream_function(lambda n: n, True, 1)
sage: g = Stream_function(lambda n: n^2, True, 1)
sage: h = Stream_cauchy_compose(f, g)
sage: h[20]
289074264180
sage: [(len(x._cache), min(x._cache), max(x._cache)) for x in h._pos_powers[1:]]
[(20, 1, 20),
 (19, 2, 20),
 (18, 3, 20),
 (17, 4, 20),
 (16, 5, 20),
 (15, 6, 20),
 (14, 7, 20),
 (13, 8, 20),
 (12, 9, 20),
 (11, 10, 20),
 (10, 11, 20),
 (9, 12, 20),
 (8, 13, 20),
 (7, 14, 20),
 (6, 15, 20),
 (5, 16, 20),
 (4, 17, 20),
 (3, 18, 20),
 (2, 19, 20),
 (1, 20, 20)]
sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)
sage: h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5)
sage: l1 = [h1[i] for i in range(10)]
sage: len(h1._h._phi)
64
sage: h1._h._phi[:15]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 0, 0, 0, 0, 0]
sage: h1._cache
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
mantepse commented 2 years ago
comment:6

Somewhat orthogonal to the ticket: I think the decision whether a Stream_XXX class uses a dense or a sparse cache should not depend on the input streams.

For example, for Stream_add it is actually irrelevant whether the input streams left and right are dense or sparse.

For Stream_zero and Stream_exact, a dense cache never makes sense.
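
To make the dense/sparse distinction concrete, here is a toy contrast (plain Python, hypothetical names, nothing to do with the actual Stream classes): a list-backed cache must fill every slot up to the requested index, while a dict-backed cache stores only what was actually asked for. Presumably this is also why a dense cache never helps Stream_zero or Stream_exact, whose coefficients are available immediately, and why the choice should reflect a stream's own access pattern rather than that of its inputs.

class DenseCached:
    """Asking for coefficient k fills every slot 0, ..., k."""
    def __init__(self, f):
        self._f = f
        self._cache = []

    def __getitem__(self, k):
        while len(self._cache) <= k:
            self._cache.append(self._f(len(self._cache)))
        return self._cache[k]

class SparseCached:
    """Only the coefficients that were actually requested are stored."""
    def __init__(self, f):
        self._f = f
        self._cache = {}

    def __getitem__(self, k):
        if k not in self._cache:
            self._cache[k] = self._f(k)
        return self._cache[k]

d, s = DenseCached(lambda k: k ** 2), SparseCached(lambda k: k ** 2)
d[1000]; s[1000]
print(len(d._cache), len(s._cache))  # 1001 vs. 1: only the dense cache computed everything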

mantepse commented 2 years ago
comment:7

See #34636

tscrim commented 2 years ago
comment:8

Thank you for doing this.

+1 on using van der Hoeven's algorithm for the dense case. We might want to add some more documentation explaining when one version is preferable to the other.

mantepse commented 2 years ago
comment:9

Thank you for looking at it :-) - and more generally, for all the reviews!

The thing I am somewhat stuck with here is the class hierarchy. Consider:

from sage.rings.integer_ring import ZZ

class Stream_cauchy_mul_DAC():
    def __init__(self, left, right, phi, N, threshold):
        self._left = [left[k] for k in range(N)]
        self._right = [right[k] for k in range(N)]
        if phi is None:
            self._phi = [0]*(2*N)
            self._lo = None
            self._n = ZZ.zero()
        else:
            # TODO: the first N/2 entries of self._phi are already
            # computed, the computation of the next N/2 is initiated
            # by the next line.  Could / should this be done lazily?
            self._phi = [phi[k] for k in range(N)] + [0]*N
            self._lo = phi
            self._n = ZZ(N / 2)

Currently, this class does not inherit from anything. However, it duplicates functionality of Stream_inexact by providing its own cache. Moreover, self._phi[:n] is essentially the same as that cache, except that we produce (recursively) many copies of it via get_coefficient below.

class Stream_cauchy_mul_fast(Stream_binary):
    def __init__(self, left, right, threshold=2 ** 1):
        super().__init__(left, right, False)
        self._threshold = threshold
        self._h = Stream_cauchy_mul_DAC(left, right, None,
                                        self._threshold, self._threshold)

    def get_coefficient(self, n):
        if n >= self._threshold and is_power_of_two(n):
            self._h = Stream_cauchy_mul_DAC(self._left, self._right, self._h,
                                            2*n, self._threshold)
        return self._h[n]

In summary: to my eyes, this is a complete mess. I think it would be more beautiful if the caching mechanism provided by Stream_inexact were reused, possibly slightly generalized.

I do not understand the algorithm well enough to see how much of self._phi could actually be shared. Ideally, I'd like a single cache (provided by Stream_inexact or Stream_cauchy_mul) which is manipulated by Stream_cauchy_mul_DAC.

tscrim commented 2 years ago
comment:10

Is there a reason why these need to be two separate classes? It seems like you really just want to have one class that inherits from Stream_binary. I don't quite understand the obstruction to this from a quick look.

mantepse commented 2 years ago
comment:11

Stream_cauchy_mul_fast._h has its own state (i.e., _phi). In fact, this is precisely the question: can the _phi attributes of the various Stream_cauchy_mul_DAC instances share memory?

(I should have documented this: DAC stands for divide-and-conquer.)

tscrim commented 2 years ago
comment:12

It is certainly possible by passing the same list around and mutating it. However, each instance would then have to know which block is its responsibility, which makes the code structure a bit more complicated and likely harder to debug. How often do the lists need to be (re)constructed, or are they short-lived transient objects?
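
A toy illustration of that bookkeeping (plain Python, hypothetical names, not the branch's code): one list is allocated by the front end, and each helper only ever touches the block [offset, offset + size) it is responsible for.

class BlockWriter:
    """A helper that owns one block of a shared coefficient buffer."""
    def __init__(self, buf, offset, size):
        self._buf = buf          # the single shared list
        self._offset = offset    # first index this helper owns
        self._size = size        # length of the owned block

    def add_contribution(self, k, value):
        # k is an index relative to this helper's block
        assert 0 <= k < self._size
        self._buf[self._offset + k] += value

class FrontEnd:
    """Owns the cache and hands out non-overlapping blocks to helpers."""
    def __init__(self, n, block_size=4):
        self._cache = [0] * n
        self._helpers = [BlockWriter(self._cache, off, block_size)
                         for off in range(0, n, block_size)]

fe = FrontEnd(16)
fe._helpers[2].add_contribution(1, 7)  # writes into the shared fe._cache[9]
print(fe._cache[9])                    # 7

Whether this is worthwhile probably depends on the question above, i.e., how long the _phi lists actually live.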