oscar-system / Oscar.jl

A comprehensive open source computer algebra system for computations in algebra, geometry, and number theory.
https://www.oscar-system.org

Possible performance issue in MPolyAnyMap / ring flattening #4112

Open fingolfin opened 2 months ago

fingolfin commented 2 months ago

During the "Julia performance tuning crash course" I gave at this week's OSCAR workshop, we looked at some of the "big" test suites in our tests, those which use 60 GB of RAM or more, and found various sources of the high RAM usage.

One that was prominent across the board was the evaluate function for MPolyAnyMap, which is used e.g. for ring flattening.

The problem: it seems that images are constructed using + and ^. So if you map e.g. x^2*y + 5*y^3 to an isomorphic polynomial ring, it gradually builds x^2, then x^2*y, y^3, 5*y^3 and finally x^2*y + 5*y^3, i.e. a ton of intermediate objects.
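Roughly speaking, a generic term-by-term evaluation amounts to something like the following (a simplified sketch for illustration, not the actual MPolyAnyMap code; it assumes the AbstractAlgebra interface functions coefficients and exponent_vectors, and that the coefficients of f are coercible into the codomain):

```julia
# Simplified sketch of a generic term-by-term evaluation, not the actual
# MPolyAnyMap code: every `^`, `*` and `+` below allocates a fresh
# polynomial in the codomain.
function naive_image(f, images::Vector)
    S = parent(first(images))
    result = zero(S)
    for (c, e) in zip(coefficients(f), exponent_vectors(f))
        t = one(S)
        for (g, ei) in zip(images, e)
            t *= g^ei        # intermediate products of powers of the images
        end
        result += c * t      # intermediate partial sums
    end
    return result
end
```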

Of course in general, this is the best we can do.

But here the codomain is a polynomial ring.

There are multiple ways I can think of to address this, not necessarily mutually exclusive, e.g.:

  1. We could perhaps add a special case to MPolyAnyMap for when the codomain is an MPolyRing? Ideally it would then use (the equivalent of) an MPolyBuildCtx.

  2. Perhaps we need additional kinds of maps that are optimized for certain setups?

  3. Perhaps the ring flattening code should simply not use MPolyAnyMap in the first place, but do its job differently?

  4. And perhaps we are not using MPolyAnyMap optimally in some cases?

  5. ... something else?

Here is a very simple way in which R = k[x...], S = R[y...] could be mapped more efficiently to U = k[x...,y...]: consider a term c * Y in S, where c is in R and Y is a monomial y^e with e = (e1,...,en). To map it, we can iterate over the terms of c, which have the form d * x^f for some d in k and exponent vector f. For each of these terms, we just need to push (d, vcat(f,e)) into the MPolyBuildCtx.

Of course this only works if we map variables to variables, and if the variables are arranged in just this way. But it is my understanding that this is what we have in ring flattening situations anyway.

The process can also be adjusted to more than two "levels" (i.e. k[x...][y...][z...]) if necessary.
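A minimal sketch of the two-level case, assuming the MPolyBuildCtx / push_term! / finish interface from AbstractAlgebra and assuming the generators of U are ordered as x..., y... (the helper name flatten_poly is made up for illustration):

```julia
# Hedged sketch: map f in S = R[y...] with R = k[x...] into U = k[x..., y...]
# by pushing one term per pair (term of f, term of its coefficient).
# No intermediate sums or powers are formed in U.
function flatten_poly(f, U)
    B = MPolyBuildCtx(U)
    for (c, e) in zip(coefficients(f), exponent_vectors(f))      # term c * y^e with c in R
        for (d, g) in zip(coefficients(c), exponent_vectors(c))  # term d * x^g with d in k
            push_term!(B, d, vcat(g, e))
        end
    end
    return finish(B)   # sorts and combines the pushed terms
end

# Usage:
R, (x1, x2) = polynomial_ring(QQ, ["x1", "x2"])
S, (y1, y2) = polynomial_ring(R, ["y1", "y2"])
U, _ = polynomial_ring(QQ, ["x1", "x2", "y1", "y2"])
flatten_poly(x1^2*y1 + 5*y2^3, U)   # the image of x1^2*y1 + 5*y2^3 in U
```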

Going beyond MPolyRing codomains requires yet more thought, but I am confident we can do something.

thofma commented 2 months ago

Just a few comments:

  1. Since MPolyAnyMap is just a fancy wrapper around evaluate, the problem should lie there. Indeed it seems to be the case that evaluate(f::MPolyRingElem, values::Vector) does not do clever things in case the target is a polynomial ring itself. Interestingly, partial evaluation evaluate(f::MPolyRingElem, variables::Vector{Int}, values::Vector) is optimized. It might be just a matter of fixing this.
  2. I agree that for the best performance, RingFlattening should roll its own evaluate.
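For reference, the two evaluate signatures meant in point 1 (a usage sketch; the rings here are made up):

```julia
R, (x, y) = polynomial_ring(QQ, ["x", "y"])
S, (u, v) = polynomial_ring(QQ, ["u", "v"])
f = x^2*y + 5*y^3

evaluate(f, [u, v])        # full evaluation into S, the slow generic path
evaluate(f, [2], [x + 1])  # partial evaluation of variable 2 inside R, the optimized path
```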

Can you provide us with a simple example? I wonder how much one gains by fixing item 1.

fingolfin commented 2 months ago

This cropped up on Wednesday morning when analyzing some of the big examples in our test suite. I will try to reproduce it and post details.

It may have been during one of the testsets of

test/AlgebraicGeometry/Schemes/elliptic_surface.jl (runs 480 seconds, allocates 122 GB)

By the way, a related issue affects the literature models (@HereAround is already looking into this, but is hampered by a broken laptop)

experimental/FTheoryTools/test/literature_models.jl (runs 400 seconds, allocates 80 GB)

experimental/FTheoryTools/test/tate_models.jl (runs 215 seconds, allocates 65 GB)

Here polynomials are loaded by parsing them via Meta.parse and then "evaluating" the resulting syntax tree. I think evaluate also shows up here, so this code would likely also benefit from an improved evaluate. But I think the real fix will be to not call evaluate at all. Perhaps it can switch to the serialization by @antonydellavecchia?
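Roughly, the pattern in question looks like this (a hedged sketch, not the actual FTheoryTools code):

```julia
# Parse-and-evaluate sketch: the string is parsed and the resulting expression
# is evaluated with the ring generators in scope, so the polynomial is again
# built up via repeated `*`, `^` and `+`.
R, (x, y) = polynomial_ring(QQ, ["x", "y"])
s = "x^2*y + 5*y^3"
f = eval(Meta.parse(s))   # works at top level, where x and y are bound
```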

fieker commented 2 weeks ago

@HechtiDerLachs can you comment on how bad this still is?

HechtiDerLachs commented 1 week ago

We have some timings in #4124, which show that it has improved.

HereAround commented 6 days ago

@HechtiDerLachs Are you happy with the performance improvements or is further work needed?

HechtiDerLachs commented 13 hours ago

I am happy with the improvements from my side. But I did not raise the issue originally, so it is probably not for me to decide about closing it.

fingolfin commented 12 hours ago

It should only be closed if someone uses the profiling tools I showed you to verify this is indeed no longer the bottleneck.

Last time I looked at it there was still serious room for improvement (in an ideal world, we would basically only be copying/concatenating exponent vectors and performing arithmetic in the "bottom most base ring"). But perhaps I missed some additional optimization PRs?

Ideally someone from the geometry group would set up some benchmarks for this, e.g. compare OSCAR 1.1 performance to master, and also check whether the number of allocations seems reasonable now.
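A possible starting point (a hypothetical skeleton; the concrete flattening workload would have to come from the geometry test sets, and the rings and sizes below are made up):

```julia
# Hypothetical benchmark skeleton: run the same call on OSCAR 1.1 and on
# master and compare elapsed time and allocated bytes.
using Oscar

R, x = polynomial_ring(QQ, ["x$i" for i in 1:4])
U, z = polynomial_ring(QQ, ["z$i" for i in 1:4])
f = (sum(x) + 1)^12

evaluate(f, z)                  # warm-up run to exclude compilation time
stats = @timed evaluate(f, z)
println("time = ", stats.time, " s, allocated = ", stats.bytes, " bytes")
```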