
blog/czi/post2 #2

utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Towards a new SymPy: part 2 - Polynomials — blog documentation

https://oscarbenjamin.github.io/blog/czi/post2.html

VladimirJosephStephanOrlovsky commented 1 year ago

First: this implementation should have been done a long, long time ago; the 1980s would have been about the right time. So presenting it now is 40+ years too late! Nice, but far too late. Second: identity 'verification'... whom are you "afraid of"? Or for whom do you collect this information?

panizza2 commented 1 year ago

Hi

Firstly, I would like to say that doing a blog like this is fantastic! Now, is there any particular reason for choosing a dictionary representation for sparse polynomials over something like a binary heap?

oscarbenjamin commented 1 year ago

any particular reasons for choosing to use a dictionary representation

I wasn't around when that decision was made. I suspect that it is because dicts are a basic feature of Python that is already provided, easy to use, and fairly well optimised. For CPython in particular the overhead of using a dict (i.e. a hash table) rather than a list (an array) is a lot lower in relative terms than it would be if you were working in something like C. So I'm not sure that using a heap would give similar benefits here, but I also don't know that anyone has actually done any measurements to check that (the heapq module could be used to test this).
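For anyone curious how the two ideas differ in practice, here is a rough pure-Python sketch (illustrative only, not SymPy's actual code; the names mul_dict and mul_heap are made up) of multiplying sparse univariate polynomials stored as {exponent: coefficient} dicts. The first version accumulates products with hash lookups; the second uses heapq to merge the partial products in decreasing exponent order, which is the usual idea behind heap-based sparse multiplication.

import heapq

# Sparse univariate polynomial as {exponent: coefficient}, e.g. x**3 + 2 -> {3: 1, 0: 2}.

def mul_dict(p, q):
    """Schoolbook multiplication accumulating into a dict (hash lookups)."""
    result = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = e1 + e2
            c = result.get(e, 0) + c1 * c2
            if c:
                result[e] = c
            else:
                result.pop(e, None)
    return result

def mul_heap(p, q):
    """Multiplication that merges partial products via a heap in decreasing
    exponent order, combining equal exponents as they pop consecutively."""
    p_terms = sorted(p.items(), reverse=True)
    q_terms = sorted(q.items(), reverse=True)
    if not p_terms or not q_terms:
        return {}
    # One heap entry per term of p, initially paired with q's leading term.
    heap = [(-(p_terms[i][0] + q_terms[0][0]), i, 0) for i in range(len(p_terms))]
    heapq.heapify(heap)
    result = {}
    last_e, acc = None, 0
    while heap:
        neg_e, i, j = heapq.heappop(heap)
        e = -neg_e
        if e != last_e:
            if last_e is not None and acc:
                result[last_e] = acc
            last_e, acc = e, 0
        acc += p_terms[i][1] * q_terms[j][1]
        if j + 1 < len(q_terms):
            heapq.heappush(heap, (-(p_terms[i][0] + q_terms[j + 1][0]), i, j + 1))
    if last_e is not None and acc:
        result[last_e] = acc
    return result

# Quick check: (x + 1)*(x - 1) == x**2 - 1
p = {1: 1, 0: 1}
q = {1: 1, 0: -1}
assert mul_dict(p, q) == mul_heap(p, q) == {2: 1, 0: -1}

Timing both on large inputs with timeit would be one way to answer the question empirically.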

asmeurer commented 1 year ago

Thanks for the post Oscar. This one is a lot more technical than the previous one, but still useful information. I would suggest reaching out to @mattpap for questions on the initial design decisions of the polys. He hasn't contributed to SymPy in a while but I bet he still remembers a lot of these things. As far as I remember from my own discussions with him years ago, I'm not completely sure some of your assumptions are actually correct (e.g., I don't know if the lowest level of the polys was ever intended to be implemented in C; as far as I know, they are just implemented that way because that is the fastest possible in pure Python).

One thing I would add that isn't mentioned here is that the license of FLINT (and GMP for that matter) is the main reason why it is being considered as an optional dependency. If these libraries had BSD-compatible licenses, there would be no issue with making them hard dependencies of SymPy.

VladimirJosephStephanOrlovsky commented 1 year ago

Sorry, but I'm not sure what you are trying to say ..


mattpap commented 1 year ago

I would suggest reaching out to @mattpap for questions on initial design decisions of the polys. He hasn't contributed to SymPy in a while but I bet he still remembers a lot of these things.

I haven't contributed in years, but I still observe the project on a semi-regular basis and indeed I still remember quite well why things were done the way they were done. So if there are any questions regarding the design, I'm happy to give some feedback.

As my memory serves me from my own discussions with him from years ago, I'm not completely sure some of your assumptions are actually correct (e.g., I don't know if the lowest level of the polys was ever intended to be implemented in C; as far as I know, they are just implemented that way because that is the fastest possible in pure Python).

I didn't seriously intend to have it implemented in either C or C++, especially given that at the time I was very much in favour of SymPy being a pure Python library, but I did consider it and to some extent the design was influenced by that. However, it was mostly performance considerations in pure Python that led to this design. Polynomial manipulation is so ubiquitous in computer algebra that even the smallest optimizations can have far-reaching effects on everything else.

In retrospect I think this should have been implemented in a natively compiled programming language, because that is where true performance gains can be made. If I were to make this choice today, it would be an obvious one, though I would probably choose Rust for the implementation. However, back then, with no wheels and no conda, pure Python was a very enticing idea.

oscarbenjamin commented 1 year ago

Hi Mateusz,

I still remember quite well why things were done the way they were done. Thus if there are any questions regarding the design, I'm happy to give some feedback.

Excellent. If you have any thoughts on some of my suggestions above like MPoly etc then we could discuss them here or in a SymPy issue perhaps.

One question I have about the existing design is what the intended end-state was for the sparse polynomials. My experience with the implementation is that the dense polys are reasonable for univariate polynomials but scale badly for multivariate polynomials. In the polys domains everything was switched to sparse polynomials, but at the Poly level everything is still dense polynomials.

Was the intention that eventually Poly would be switched to use sparse polynomials at every level (only sometimes switching out to dense for a particular algorithm)?

I haven't systematically profiled precisely how the sparse polynomials compare against the dense univariate polynomials. I do know from timings with matrices, though, that the benefits of a dense implementation over a sparse one are almost zero in CPython, because the cost of dict operations is small compared to the general interpreter overhead.

This is a quick comparison where I think that the algorithm is essentially the same in both cases:

In [52]: p = ZZ.old_poly_ring(x).from_sympy(x + 1)  # dense

In [53]: %time ok = p**100
CPU times: user 3.84 ms, sys: 0 ns, total: 3.84 ms
Wall time: 4.05 ms

In [54]: %time ok = p**1000
CPU times: user 92.6 ms, sys: 0 ns, total: 92.6 ms
Wall time: 90.8 ms

In [55]: p = ZZ[x](x + 1)  # sparse

In [56]: %time ok = p**100
CPU times: user 1.09 ms, sys: 175 µs, total: 1.27 ms
Wall time: 1.28 ms

In [57]: %time ok = p**1000
CPU times: user 9.87 ms, sys: 289 µs, total: 10.2 ms
Wall time: 9.8 ms

In this example it looks as if the sparse representation is faster even when the polynomials are both dense and univariate, but I would have to look more closely to see why that is.
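To make this kind of comparison reproducible outside SymPy, a minimal sketch of the two representations (illustrative only, not SymPy's internals; the helper names mul_dense, mul_sparse and pow_by_squaring are made up) could look something like this, with dense coefficients in a list indexed by exponent and sparse ones in a dict:

def mul_dense(p, q):
    """Schoolbook multiplication of dense coefficient lists [c0, c1, ...]."""
    result = [0] * (len(p) + len(q) - 1)
    for i, ci in enumerate(p):
        for j, cj in enumerate(q):
            result[i + j] += ci * cj
    return result

def mul_sparse(p, q):
    """Schoolbook multiplication of sparse {exponent: coefficient} dicts."""
    result = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            result[e1 + e2] = result.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in result.items() if c}

def pow_by_squaring(p, n, mul, one):
    """Binary exponentiation, analogous to p**100 and p**1000 above."""
    result, base = one, p
    while n:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result

# (x + 1)**100 in both representations; %timeit on these gives a rough
# dense-vs-sparse comparison in pure Python.
dense = pow_by_squaring([1, 1], 100, mul_dense, [1])
sparse = pow_by_squaring({0: 1, 1: 1}, 100, mul_sparse, {0: 1})
assert dense == [sparse.get(e, 0) for e in range(len(dense))]

Of course the actual SymPy implementations do more than this, so the timings above are the more meaningful measurement.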

nsrinivasan80 commented 1 year ago

Why don't you write an ebook? I would buy it. These articles tend to go away.

KZiemian commented 7 months ago

Great post.