oscar-system / Oscar.jl

A comprehensive open source computer algebra system for computations in algebra, geometry, and number theory.
https://www.oscar-system.org

is_zero is slow for quotient rings #2099

Closed HechtiDerLachs closed 1 year ago

HechtiDerLachs commented 1 year ago

I was observing that is_zero is rather slow for quotient rings. For instance, try the following code:

R, (x,y,z) = QQ["x", "y", "z"]

I = ideal(R, [x^4 + y*z^7 - 35*x^3*z, x^29 - 15 + z- y*z^3])

Q, _ = quo(R, I)

a = x
b = Q(x)

@time for i in 1:1000;  iszero(a); end
@time for i in 1:1000;  iszero(b); end

I get 0.000083 seconds vs. 0.116045 seconds. Moreover, an analysis with @profview indicates that for every call to is_zero, a Gröbner basis for I is looked up in a dictionary. This lookup seems to first create an integer matrix for the given ordering, and then these matrices are compared. In various other instances (which I can unfortunately not reproduce here) this has led to significant time consumption in my computations (at least it showed up as consuming much time in the ProfileView analysis). My general feeling is that this comes to bear whenever we need to check that many polynomials are zero modulo an ideal, for instance when dealing with matrices over quotient rings or polynomial rings over quotient rings.

Any ideas on improvements for this?

thofma commented 1 year ago

Can you quantify what "much time" means for the lookup? 99 percent? 50 percent? 10 percent?

HechtiDerLachs commented 1 year ago

Often it is close to 95% of the time consumed by the is_zero test.

Edit: On my machine, this can even be observed for the above example when using @profview.

thofma commented 1 year ago

I see various (easy) ways to improve it. I will open a PR soonish.

thofma commented 1 year ago

Do you have a non-trivial example (but which does not run for hours)?

fingolfin commented 1 year ago

A side remark for @HechtiDerLachs: I recommend using the BenchmarkTools package for doing such timings. So instead of applying @time to a loop, you can do this:

julia> using BenchmarkTools

julia> @btime iszero(a);
  41.961 ns (0 allocations: 0 bytes)

julia> @btime iszero(b);
  40.708 μs (675 allocations: 20.06 KiB)

So that shows a factor 1000 in performance, and a lot of extra allocations.

BTW you can use the @benchmark macro to get a super nifty visualization of the distribution of the timings for several runs.
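For instance, along these lines (I am not showing the output here, since the timing distribution varies per machine):

```julia
julia> using BenchmarkTools

julia> @benchmark iszero($b)  # $b interpolates the global, avoiding benchmarking overhead from untyped globals
```

The `$` interpolation is the recommended BenchmarkTools idiom for benchmarking expressions involving global variables.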

Now, I thought that surely a in I (which should be equivalent to is_zero(b)) might be faster, but to my surprise it is even worse:

julia> @btime a in I
  53.792 μs (929 allocations: 28.02 KiB)
false

So what does is_zero(b) do? This:

iszero(x) = x == zero(x) # fallback method

Tracing down, a significant part of the time is spent in simplify:

julia> zz = zero(b)
0

julia> b == zz
false

julia> @btime simplify(b)
  22.667 μs (366 allocations: 10.55 KiB)
x

julia> @btime simplify(zz)
  17.667 μs (307 allocations: 9.41 KiB)
0

So if those elements simply stored the information that they are already simplified (or heck, if we just added a check that says "if the underlying element is zero, then it definitely is simplified"), this would probably become a lot faster. It doesn't help that the simplify method in this case is not type stable.
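A minimal sketch of what such caching could look like (the struct, field, and function names here are hypothetical stand-ins, not the actual Oscar internals):

```julia
# Hypothetical sketch: cache the simplification status on the element itself.
mutable struct MyQuoRingElem{T}
    f::T                 # representative in the underlying polynomial ring
    simplified::Bool     # true once f is known to be in normal form w.r.t. the GB
end

function my_simplify!(a::MyQuoRingElem)
    a.simplified && return a      # cached: skip the expensive GB reduction entirely
    if iszero(a.f)
        a.simplified = true       # the zero polynomial is trivially in normal form
        return a
    end
    # ... here one would reduce a.f modulo a Groebner basis of the ideal ...
    a.simplified = true
    return a
end
```

With this, iszero on an already-simplified element degenerates to a cheap iszero check on the representative, and repeated equality tests avoid redundant reductions.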

So back to the a in I test, which really invokes this function:

function ideal_membership(f::T, I::MPolyIdeal{T}; ordering::MonomialOrdering = default_ordering(base_ring(I))) where T
  GI = standard_basis(I, ordering=ordering, complete_reduction=false)
  singular_assure(GI)
  Sx = base_ring(GI.S)
  return Singular.iszero(Singular.reduce(Sx(f), GI.S))
end
Base.:in(f::MPolyRingElem, I::MPolyIdeal) = ideal_membership(f,I)

Why do we specify a term ordering for ideal_membership??? Let's just grab the first possible GB instead:

function my_ideal_membership(f::T, I::MPolyIdeal{T}) where T
  if isempty(I.gb)
    GI = standard_basis(I, ordering=default_ordering(base_ring(I)), complete_reduction=false)
  else
    GI = first(I.gb)[2]
  end
  singular_assure(GI)
  Sx = base_ring(GI.S)
  return Singular.iszero(Singular.reduce(Sx(f), GI.S))
end

then I get these numbers:

julia> @btime ideal_membership(a, I);
  54.000 μs (929 allocations: 28.02 KiB)

julia> @btime my_ideal_membership(a, I);
  2.157 μs (30 allocations: 688 bytes)

So that's a factor 25 faster, MUCH better. But we are still slower by a factor 40. I am sure it could be brought down more.

But I also worry that there is a lot more code with problems of this kind in there... Hrm

HechtiDerLachs commented 1 year ago

Do you have a non-trivial example (but which does not run for hours)?

I was trying to produce one. But I failed. I will let you know if this changes. Until then, I guess the best I could do is to use the above example with more iterations.

HechtiDerLachs commented 1 year ago

So that's a factor 25 faster, MUCH better.

That is more or less what I was expecting. Usually, a lot of the time seems to be spent on the lookup of the Gröbner basis in the dictionary.

fingolfin commented 1 year ago

Yeah, we spend a lot of effort to build a fresh ordering. Then the @assert is_global(ordering) check is actually slow. And then we need to hash that ordering and look it up in the table. This is BAD.

Hmm, function Base.getproperty(idealgens::IdealGens, name::Symbol) also seems a bit unholy to me.

Then this code from standard_basis performs two or even three lookups in I.gb in case of a match.

  if haskey(I.gb, ordering) && (complete_reduction == false || I.gb[ordering].isReduced == true)
    return I.gb[ordering]
  end

Since we have complete_reduction == false in the membership testing case, we "only" have two lookups, but those account for most of the total runtime (here ~51.042 out of 53.792 µs):

julia> @btime if haskey(I.gb, ord)
           return I.gb[ord]
         end;
  51.042 μs (896 allocations: 27.22 KiB)

Of course we can trivially optimize this to one lookup (following code would have to be adjusted):

julia> @btime get(I.gb, ord, nothing);
  25.583 μs (448 allocations: 13.61 KiB)

But that's still bad. A third of it is for hashing (which we could avoid if we stored the default ordering somewhere, cached its hash code, and while at it also the is_global information):

julia> @btime hash(ord);
  8.514 μs (150 allocations: 4.55 KiB)

But of course even better is to not use the default ordering as a lookup key at all, as I did with my hacked together code...
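One way the rehashing could be avoided is a lazily cached hash value, roughly like this (a sketch with a hypothetical wrapper type, not the actual MonomialOrdering layout):

```julia
# Hypothetical sketch: lazily cache the hash of an ordering-like object.
mutable struct CachedOrdering{O}
    ord::O
    hashval::UInt        # 0 means "not yet computed" (good enough for a sketch)
end

CachedOrdering(ord) = CachedOrdering(ord, UInt(0))

function Base.hash(c::CachedOrdering, h::UInt)
    if c.hashval == 0
        c.hashval = hash(c.ord, UInt(0))   # expensive hash computed once, reused afterwards
    end
    return hash(c.hashval, h)              # mix the cached value with the caller's seed
end
```

The same trick would work for caching the is_global flag: compute it once at construction (or on first query) and store it next to the hash.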

fingolfin commented 1 year ago

so I had this:

julia> @btime my_ideal_membership(a, I);
  2.157 μs (30 allocations: 688 bytes)

Looking closer, most of this comes from this:

julia> @btime Singular.reduce(Sx(a), GI.S);
  2.190 μs (30 allocations: 688 bytes)

and a huge chunk of this is from conversion:

julia>  @btime Sx(a)
  1.600 μs (29 allocations: 656 bytes)

julia> c = Sx(a);

julia> @btime Singular.reduce(c, GI.S);
  430.276 ns (1 allocation: 32 bytes)

I have no idea if we can make the conversion faster. But even if we eliminate it, that's only another factor 4-5, as reduce takes 430ns.

Then again, maybe "just" that initial factor 25 would help you?

Oh and if MPolyQuoRingElem could store the information whether its underlying element is already simplified, then that would probably also help quite a bit.

thofma commented 1 year ago

My plan for easy improvement was to store the simplification information on the element. I thought I already did this in the past, but obviously not.

ederc commented 1 year ago

@fingolfin Concerning your question about ideal_membership: you might have a GB cached, but w.r.t. an inefficient monomial ordering. So sometimes it might even be better to first compute a new GB w.r.t. a more efficient ordering and then decide ideal membership w.r.t. that more efficient ordering.

fingolfin commented 1 year ago

@ederc OK, but that's not what the code does in practice...

ederc commented 1 year ago

@ederc OK, but that's not what the code does in practice...

Why? It checks for a GB w.r.t. an efficient ordering (either the default or one given by the user), computes it if necessary, and then uses this GB to answer the question.

fingolfin commented 1 year ago

In practice, when used via a in I, there is no user-given ordering, just the default ordering (which is constructed from scratch each time, and validated, already causing overhead). But why should the default ordering be efficient, or at least more efficient, in this particular case? I can see that a user with extra information may control for this by specifying an ordering tailored to the case at hand, but the default doesn't do this, does it?

Of course first(I.gb) does not pick a GB for the default ordering; it's a quick & dirty proof-of-concept hack. But surely we could e.g. add an I.defaultGB field which, if bound, points to the "default GB" to be used for e.g. membership tests. Then we could access it without constructing an ordering object, and also without computing the hash of one and performing a hash table lookup.
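Sketch of that idea (the defaultGB field is hypothetical and would need to be added to the ideal type; the rest mirrors the existing ideal_membership body):

```julia
# Hypothetical sketch: bypass the ordering-keyed dictionary for the common case
# by caching one designated "default GB" directly on the ideal.
function fast_ideal_membership(f::T, I::MPolyIdeal{T}) where T
    if !isdefined(I, :defaultGB) || I.defaultGB === nothing
        # compute (or look up) a GB once and remember it for all later queries
        I.defaultGB = standard_basis(I, ordering = default_ordering(base_ring(I)),
                                     complete_reduction = false)
    end
    GI = I.defaultGB
    singular_assure(GI)
    Sx = base_ring(GI.S)
    return Singular.iszero(Singular.reduce(Sx(f), GI.S))
end
```

Every membership test after the first then pays only for the field access, the conversion, and the Singular reduction, with no ordering construction or hashing on the hot path.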

ederc commented 1 year ago

After discussion with @wdecker I propose that we go with first(I.gb) at the moment and do not implement something like I.defaultGB right now. I will try to provide a corresponding PR for this soon.

fingolfin commented 1 year ago

A proposal from discussing on #2133:

It seems that a lot of the problems could be addressed (or at least reduced) if MonomialOrdering cached its hash value and perhaps a few other things, and if the default ordering was actually stored in the ring, so that calling default_ordering(base_ring(I)) twice returns an identical ordering. Does that sound sensible?
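Storing the default ordering in the ring could look roughly like this (the cache field is hypothetical; degrevlex(gens(R)) is just one plausible way to build the default):

```julia
# Hypothetical sketch: memoize the default ordering on the ring, so that
# repeated calls return the identical object (===), making dictionary
# lookups cheap (identity check first) and hashing amortizable.
function my_default_ordering(R)
    if !isdefined(R, :default_ordering_cache)
        R.default_ordering_cache = degrevlex(gens(R))   # constructed exactly once
    end
    return R.default_ordering_cache
end
```

Since the same object is returned every time, a GB cache keyed by the ordering can short-circuit on object identity before ever computing a hash.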

I also wonder if perhaps the ideal should also have a default (?) ordering (which might be the same as that of the base ring by default?), as in: if I know this particular ideal has a structure that makes some order better than the default "degrevlex", then perhaps one should allow specifying that? (Maybe this is a bad idea.)

Also @afkafkafk13 pointed out that we should be careful to not accidentally uses non-global GBs for e.g. membership tests. So we could cache the information whether the ordering is global/local/mixed resp. even which variables are global (i.e. satisfy x > 1). For the "standard" ordering (such as lex, deglex, degrevlex) we know this for zero cost at creation time.