Open mantepse opened 2 years ago
Unfortunately, the code does not work yet, because I have some trouble with some details in the article.
Last 10 new commits:
52ac7af | pyflakes observations |
06ec3d0 | Merge branch 'develop' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367 |
4172688 | Merge branch 'u/mantepse/implement_arithmetic_product_of_lazy_symmetric_functions' of trac.sagemath.org:sage into t/32367/replace_lazy_power_series-32367 |
41ca99b | Merge branch 'u/mantepse/replace_lazy_power_series-32367' of trac.sagemath.org:sage into t/34470/categories_lazy_series-34470 |
fb3a5cd | Merge branch 'u/mantepse/categories_lazy_series-34470' of trac.sagemath.org:sage into t/34552/lazy_series_test_suite-34552 |
855d2bf | remove unused variable |
75c275c | add documentation and doctests for _approximate_order |
5393242 | fixes for pycodestyle and pyflakes |
081d0e5 | Merge branch 'u/mantepse/lazy_series_test_suite-34552' of trac.sagemath.org:sage into t/34616/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series |
87d9db1 | non-working first attempt |
Author: Martin Rubey
Description changed:
---
+++
@@ -1 +1 @@
-
+This is an experimental implementation of the algorithm presented in section 4.2. of van der Hoeven's "Relax but don't be too lazy".
Changed keywords from none to LazyPowerSeries
I think the mistake is that the caching is not implemented correctly. (References below are to van der Hoeven's paper.)
Sec. 2.2, pg. 484 says that Series_Rep has an attribute n, which records up to which degree the values in the cache are already correct: "The order of φ is allowed to exceed n in order to anticipate future computations. Moreover, the coefficients φ_0, ..., φ_{k−1} must be computed before computing φ_k."
In Sec. 4.2.1, pg. 502, the definition of DAC_Rep does not specify n initially, so it is probably meant to be 0. However, in Sec. 4.2.3 the definition of DAC_Rep does specify n := N/2, which the current branch does not take into account.
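To make the invariant from Sec. 2.2 concrete, here is a minimal sketch in plain Python (not the Sage Stream classes; SeriesRep and its attribute names are hypothetical): the cache phi may be longer than the validated part, and the counter n records how many entries are already correct, with coefficients validated strictly in order.

```python
# Hypothetical sketch of the cache invariant from Sec. 2.2:
# phi[:n] is guaranteed correct, entries beyond n may hold
# anticipated values, and coefficients are validated in order.

class SeriesRep:
    def __init__(self, coeff, N, n=0):
        self._coeff = coeff   # rule computing the k-th coefficient
        self._phi = [0] * N   # cache; may extend past the validated part
        self._n = n           # phi[:n] is known to be correct

    def __getitem__(self, k):
        # validate phi_n, ..., phi_k in order before returning phi_k
        while self._n <= k:
            self._phi[self._n] = self._coeff(self._n)
            self._n += 1
        return self._phi[k]

f = SeriesRep(lambda k: k * k, 8)
print(f[3], f._n)   # → 9 4
```

The ticket's question is then whether the DAC recursion should start such an object with n = 0 (Sec. 4.2.1) or with n = N/2 (Sec. 4.2.3).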
The implementation of van der Hoeven's algorithm should really use the dense version of streams. It probably makes sense to adapt the framework slightly.
Branch pushed to git repo; I updated commit sha1. New commits:
1de1e6d | ugly, but working |
Although the implementation is quite ugly in some details, I think we can learn enough for some decisions.
sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)
sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(270)]
CPU times: user 40.7 ms, sys: 0 ns, total: 40.7 ms
Wall time: 40.7 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(270)]
CPU times: user 43.1 ms, sys: 0 ns, total: 43.1 ms
Wall time: 43.1 ms
sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = [h1[i] for i in range(2000)]
CPU times: user 712 ms, sys: 0 ns, total: 712 ms
Wall time: 712 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = [h2[i] for i in range(2000)]
CPU times: user 1.85 s, sys: 0 ns, total: 1.85 s
Wall time: 1.85 s
sage: %time h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5); l1 = h1[1000]
CPU times: user 233 ms, sys: 0 ns, total: 233 ms
Wall time: 233 ms
sage: %time h2 = Stream_cauchy_mul(f, g); l2 = h2[1000]
CPU times: user 2.43 ms, sys: 0 ns, total: 2.43 ms
Wall time: 2.44 ms
sage: from sage.data_structures.stream import Stream_function, Stream_cauchy_compose
sage: f = Stream_function(lambda n: n, True, 1)
sage: g = Stream_function(lambda n: n^2, True, 1)
sage: h = Stream_cauchy_compose(f, g)
sage: h[20]
289074264180
sage: [(len(x._cache), min(x._cache), max(x._cache)) for x in h._pos_powers[1:]]
[(20, 1, 20),
(19, 2, 20),
(18, 3, 20),
(17, 4, 20),
(16, 5, 20),
(15, 6, 20),
(14, 7, 20),
(13, 8, 20),
(12, 9, 20),
(11, 10, 20),
(10, 11, 20),
(9, 12, 20),
(8, 13, 20),
(7, 14, 20),
(6, 15, 20),
(5, 16, 20),
(4, 17, 20),
(3, 18, 20),
(2, 19, 20),
(1, 20, 20)]
sage: from sage.data_structures.stream import (Stream_cauchy_mul, Stream_cauchy_mul_fast, Stream_function)
sage: f = Stream_function(lambda n: n, True, 0)
sage: g = Stream_function(lambda n: 1, True, 0)
sage: h1 = Stream_cauchy_mul_fast(f, g, threshold=2^5)
sage: l1 = [h1[i] for i in range(10)]
sage: len(h1._h._phi)
64
sage: h1._h._phi[:15]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 0, 0, 0, 0, 0]
sage: h1._cache
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
Somewhat orthogonal to the ticket: I think the decision whether a Stream_XXX class uses a dense or a sparse cache should not depend on the input streams. For example, for Stream_add it is actually irrelevant whether the input streams left and right are dense or sparse. For Stream_zero and Stream_exact, a dense cache never makes sense.
See #34636
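As an illustration of that separation (a sketch under assumed names, not the actual Stream API), each class could declare its preferred cache kind itself rather than inspecting its inputs:

```python
# Hypothetical sketch: each stream class declares its own cache
# policy, independent of how its input streams cache.

class Stream:
    dense_cache = True               # class-level default

    def __init__(self):
        # pick the cache structure from the class, not from the inputs
        self._cache = [] if self.dense_cache else {}

class StreamAdd(Stream):
    # addition works coefficient-wise, so the cache kind of the
    # inputs left and right is irrelevant here
    def __init__(self, left, right):
        super().__init__()
        self._left, self._right = left, right

class StreamExact(Stream):
    # eventually-constant coefficients: a dense cache never pays off
    dense_cache = False
```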
Thank you for doing this.
+1 on using van der Hoeven's algorithm for the dense. We might want to add some more documentation to this to explain when one version might be preferred to the other.
Thank you for looking at it :-) - and more generally, all the reviews!
The thing I am somewhat stuck with here is the class hierarchy. Consider:
class Stream_cauchy_mul_DAC():
def __init__(self, left, right, phi, N, threshold):
self._left = [left[k] for k in range(N)]
self._right = [right[k] for k in range(N)]
if phi is None:
self._phi = [0]*(2*N)
self._lo = None
self._n = ZZ.zero()
else:
# TODO: the first N/2 entries of self._phi are already
# computed, the computation of the next N/2 is initiated
# by the next line. Could / should this be done lazily?
self._phi = [phi[k] for k in range(N)] + [0]*N
self._lo = phi
self._n = ZZ(N / 2)
Currently, this class does not inherit from anything. However, it shares functionality with Stream_inexact by providing the cache. Moreover, self._phi[:n] is actually the same as the cache, except that we produce (recursively) many copies of this, via get_coefficient below.
class Stream_cauchy_mul_fast(Stream_binary):
def __init__(self, left, right, threshold=2 ** 1):
super().__init__(left, right, False)
self._threshold = threshold
self._h = Stream_cauchy_mul_DAC(left, right, None,
self._threshold, self._threshold)
def get_coefficient(self, n):
if n >= self._threshold and is_power_of_two(n):
self._h = Stream_cauchy_mul_DAC(self._left, self._right, self._h,
2*n, self._threshold)
return self._h[n]
In summary: to my eyes, this is a complete mess. I think it would be more beautiful if the caching mechanism provided by Stream_inexact would be reused, possibly slightly generalized.
I do not understand the algorithm well enough to see how much of self._phi could actually be shared. Ideally, I'd like a single cache (provided by Stream_inexact or Stream_cauchy_mul) which is manipulated by Stream_cauchy_mul_DAC.
Is there a reason why these need to be two separate classes? It seems like you really just want to have one class that inherits from Stream_binary. I don't quite understand the obstruction to this from a quick look.
Stream_cauchy_mul_fast._h has its own state (i.e., _phi). In fact, this is precisely the question: can the _phi attributes of the various Stream_cauchy_mul_DAC instances share memory?
(I should have documented: DAC is for divide and conquer.)
It is certainly possible by passing the same list around and mutating it. However, each instance would then have to know which block is its responsibility, which makes the code structure somewhat more complicated and likely harder to debug. How often do the lists need to be (re)constructed, or are they short-lived transient objects?
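A minimal sketch of that shared-buffer idea (plain Python with hypothetical names, not the actual Stream classes): every worker holds a reference to the same list together with the half-open block [lo, hi) it is responsible for, so no copies of _phi are made.

```python
# Hypothetical sketch of sharing one coefficient buffer among
# divide-and-conquer workers: all workers mutate the same list,
# and each one writes only into its own block [lo, hi).

class BlockWorker:
    def __init__(self, phi, lo, hi, rule):
        self._phi = phi              # shared buffer, not a copy
        self._lo, self._hi = lo, hi  # this worker's responsibility
        self._rule = rule

    def fill(self):
        for k in range(self._lo, self._hi):
            self._phi[k] = self._rule(k)

phi = [0] * 8                        # one buffer shared by both workers
low = BlockWorker(phi, 0, 4, lambda k: k)
high = BlockWorker(phi, 4, 8, lambda k: k * k)
low.fill(); high.fill()
print(phi)   # → [0, 1, 2, 3, 16, 25, 36, 49]
```

This avoids the repeated copying, at the cost of the extra per-instance bookkeeping mentioned above.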
This is an experimental implementation of the algorithm presented in section 4.2. of van der Hoeven's "Relax but don't be too lazy".
CC: @tscrim @fchapoton
Component: combinatorics
Keywords: LazyPowerSeries
Author: Martin Rubey
Branch/Commit: u/mantepse/implement_van_hoeven_s_algorithm_for_relaxed_multiplication_of_power_series @
1de1e6d
Issue created by migration from https://trac.sagemath.org/ticket/34616