Can you quantify what "much time" means for the lookup? 99 percent? 50 percent? 10 percent?
It often looks like close to 95% of the time is consumed by the `is_zero` test.
Edit: On my machine, this can even be observed for the above example when using `@profview`.
I see various (easy) ways to improve it. I will open a PR soonish.
Do you have a non-trivial example (but which does not run for hours)?
A side remark for @HechtiDerLachs: I recommend using the BenchmarkTools package for doing such timings. So instead of applying `@time` to a loop, you can do this:
julia> using BenchmarkTools
julia> @btime iszero(a);
41.961 ns (0 allocations: 0 bytes)
julia> @btime iszero(b);
40.708 μs (675 allocations: 20.06 KiB)
So that shows a factor of 1000 in performance, and a lot of extra allocations.
BTW you can use the `@benchmark` macro to get a super nifty visualization of the distribution of the timings for several runs.
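For example (the histogram output depends on your machine, so it is omitted here):

julia> using BenchmarkTools

julia> @benchmark iszero(b)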
Now, I thought that `a in I` (surely equivalent to `is_zero(b)`) might be faster, but to my surprise it is even worse:
julia> @btime a in I
53.792 μs (929 allocations: 28.02 KiB)
false
So what does `is_zero(b)` do? This:
iszero(x) = x == zero(x) # fallback method
Tracing down, a significant part of the time is spent in `simplify`:
julia> zz = zero(b)
0
julia> b == zz
false
julia> @btime simplify(b)
22.667 μs (366 allocations: 10.55 KiB)
x
julia> @btime simplify(zz)
17.667 μs (307 allocations: 9.41 KiB)
0
So if those elements simply stored the information that they are already simplified (or heck, if we just added a check that says "if the underlying element is zero, then it definitely is simplified"), this would probably become a lot faster. It doesn't help that the `simplify` method in this case is not type stable.
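A minimal sketch of that idea (hypothetical type and field names; Oscar's actual `MPolyQuoRingElem` has a different layout):

# Hypothetical sketch: remember on the element whether it is already simplified.
mutable struct MyQuoElem{T}
    f::T                 # representative in the underlying polynomial ring
    simplified::Bool     # set once simplify has reduced f modulo the ideal
end

function my_simplify!(a::MyQuoElem)
    a.simplified && return a          # cached answer: skip the GB reduction
    if iszero(a.f)                    # the zero element is trivially simplified
        a.simplified = true
        return a
    end
    # ... otherwise reduce a.f modulo a Groebner basis of the ideal ...
    a.simplified = true
    return a
end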
So back to the `a in I` test, which really invokes this function:
function ideal_membership(f::T, I::MPolyIdeal{T}; ordering::MonomialOrdering = default_ordering(base_ring(I))) where T
GI = standard_basis(I, ordering=ordering, complete_reduction=false)
singular_assure(GI)
Sx = base_ring(GI.S)
return Singular.iszero(Singular.reduce(Sx(f), GI.S))
end
Base.:in(f::MPolyRingElem, I::MPolyIdeal) = ideal_membership(f,I)
Why do we specify a term ordering for `ideal_membership`??? Let's just grab the first possible GB instead:
function my_ideal_membership(f::T, I::MPolyIdeal{T}) where T
if isempty(I.gb)
GI = standard_basis(I, ordering=default_ordering(base_ring(I)), complete_reduction=false)
else
GI = first(I.gb)[2]
end
singular_assure(GI)
Sx = base_ring(GI.S)
return Singular.iszero(Singular.reduce(Sx(f), GI.S))
end
then I get these numbers:
julia> @btime ideal_membership(a, I);
54.000 μs (929 allocations: 28.02 KiB)
julia> @btime my_ideal_membership(a, I);
2.157 μs (30 allocations: 688 bytes)
So that's a factor of 25 faster, MUCH better. But we are still slower than the plain `iszero(a)` by a factor of about 40. I am sure it could be brought down more.
But I also worry that there is a lot more code with problems of this kind in there... Hrm
> Do you have a non-trivial example (but which does not run for hours)?
I was trying to produce one. But I failed. I will let you know if this changes. Until then, I guess the best I could do is to use the above example with more iterations.
> So that's a factor of 25 faster, MUCH better.
That is more or less what I was expecting. Usually, a lot of the time seems to be spent on the lookup of the Groebner basis in the dictionary.
Yeah, we spend a lot of effort to build a fresh ordering. Then `@assert is_global(ordering)` is actually slow. And then we need to hash that ordering and look it up in the table. This is BAD.
Hmm, `function Base.getproperty(idealgens::IdealGens, name::Symbol)` also seems a bit unholy to me.
Then this code from `standard_basis` performs two or even three lookups in `I.gb` in case of a match:
if haskey(I.gb, ordering) && (complete_reduction == false || I.gb[ordering].isReduced == true)
return I.gb[ordering]
end
Since we have `complete_reduction == false` in the membership-testing case, we "only" have two lookups, but those account for most of the total runtime (here ~51.042 out of 53.792 µs):
julia> @btime if haskey(I.gb, ord)
return I.gb[ord]
end;
51.042 μs (896 allocations: 27.22 KiB)
Of course we can trivially optimize this to one lookup (the code following the lookup would have to be adjusted):
julia> @btime get(I.gb, ord, nothing);
25.583 μs (448 allocations: 13.61 KiB)
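Adjusted, the lookup in `standard_basis` could look roughly like this (a sketch, not the actual Oscar code):

# Sketch: a single dictionary lookup instead of haskey + getindex.
GB = get(I.gb, ordering, nothing)
if GB !== nothing && (complete_reduction == false || GB.isReduced == true)
    return GB
end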
But that's still bad. A third of it is for hashing (which we could avoid if we stored the default ordering somewhere, cached its hash code, and, while at it, also the `is_global` information):
julia> @btime hash(ord);
8.514 μs (150 allocations: 4.55 KiB)
But of course even better is to not use the default ordering as a lookup key at all, as I did with my hacked-together code... so I had this:
julia> @btime my_ideal_membership(a, I);
2.157 μs (30 allocations: 688 bytes)
Looking closer, most of this comes from this:
julia> @btime Singular.reduce(Sx(a), GI.S);
2.190 μs (30 allocations: 688 bytes)
and a huge chunk of this is from conversion:
julia> @btime Sx(a)
1.600 μs (29 allocations: 656 bytes)
julia> c = Sx(a);
julia> @btime Singular.reduce(c, GI.S);
430.276 ns (1 allocation: 32 bytes)
I have no idea if we can make the conversion faster. But even if we eliminate it, that's only another factor of 4-5, as `reduce` takes 430 ns.
Then again, maybe "just" that initial factor 25 would help you?
Oh, and if `MPolyQuoRingElem` could store the information whether the underlying element is already simplified, then that would probably also help quite a bit.
My plan for easy improvement was to store the simplification information on the element. I thought I already did this in the past, but obviously not.
@fingolfin Concerning your question about `ideal_membership`: You might have a GB cached, but w.r.t. an inefficient monomial ordering. So sometimes it might even be better to first compute a new GB w.r.t. a more efficient ordering and then decide ideal membership w.r.t. the more efficient ordering.
@ederc OK, but that's not what the code does in practice...
Why? It checks for a GB w.r.t. an efficient ordering (either default or given by the user), computes it if necessary and then uses this GB to answer the question.
In practice, when used via `a in I`, there is no user-given ordering, just the default ordering (which is constructed from scratch and validated each time, already causing overhead). But why should the default ordering be efficient, or at least more efficient in this particular case? I can see that a user with extra information may control for this by specifying an ordering tailored to the case at hand, but the default doesn't do this, does it?
Of course `first(I.gb)` does not pick a GB for the default ordering; it's a quick & dirty proof-of-concept hack. But surely we could e.g. add an `I.defaultGB` field which, if bound, points to the "default GB" to be used for e.g. membership tests. Then we could access it without constructing an `ordering` object, and also without computing the hash of one of these and performing a hash table lookup.
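For illustration, such an accessor could look like this (hypothetical; the field name and fallback are made up):

# Hypothetical sketch: fetch a cached "default" GB without any ordering lookup.
function default_gb(I)
    if isdefined(I, :defaultGB)    # the field may be left unassigned (#undef)
        return I.defaultGB
    end
    GB = first(values(I.gb))       # fall back to any cached GB (or compute one)
    I.defaultGB = GB
    return GB
end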
After discussion with @wdecker I propose that we go with `first(I.gb)` for the moment and do not implement something like `I.defaultGB` right now. I will try to provide a corresponding PR for this soon.
A proposal from the discussion on #2133:
It seems that a lot of the problems could be addressed (or at least reduced) if `MonomialOrdering` cached its hash value and perhaps a few other things, and if the default ordering was actually stored in the ring, so that calling `default_ordering(base_ring(I))` twice returns an identical ordering. Does that sound sensible?
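For illustration, the hash caching could look roughly like this (hypothetical struct; Oscar's actual `MonomialOrdering` differs):

# Hypothetical sketch: lazily cache the hash value of a monomial ordering.
mutable struct CachedOrdering{O}
    ord::O
    hashval::Union{UInt, Nothing}    # computed once, reused for every lookup
end

CachedOrdering(ord) = CachedOrdering(ord, nothing)

function Base.hash(c::CachedOrdering, h::UInt)
    if c.hashval === nothing
        c.hashval = hash(c.ord, zero(UInt))   # expensive hash, done only once
    end
    return hash(c.hashval, h)
end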
I also wonder if perhaps the ideal should also have a default (?) ordering (which might be the same as that of the base ring by default?), as in: if I know this particular ideal has a structure that makes some order better than the default "degrevlex", then perhaps one should allow specifying that? (Maybe this is a bad idea.)
Also @afkafkafk13 pointed out that we should be careful not to accidentally use non-global GBs for e.g. membership tests. So we could cache the information whether the ordering is global/local/mixed, resp. even which variables are global (i.e. satisfy `x > 1`). For the "standard" orderings (such as lex, deglex, degrevlex) we know this at zero cost at creation time.
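A minimal sketch of caching that classification at creation time (hypothetical names, standalone code):

# Hypothetical sketch: record global/local/mixed when the ordering is built.
@enum OrderingKind GLOBAL LOCAL MIXED

struct ClassifiedOrdering{O}
    ord::O
    kind::OrderingKind    # known for free for lex, deglex, degrevlex, ...
end

is_global(c::ClassifiedOrdering) = c.kind == GLOBAL   # O(1), no recomputation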
I was observing that `is_zero` is rather slow for quotient rings. For instance, try the following code:

I get `0.000083 seconds` vs. `0.116045 seconds`. Moreover, an analysis with `@profview` indicates that for every call to `is_zero`, a Groebner basis for `I` is looked up in a dictionary. This lookup seems to first create an integer matrix for the given ordering, and then these matrices are compared. In various other instances (which I can unfortunately not reproduce here) this has led to significant time consumption in my computations (at least it showed up as consuming much time in the `ProfileView` analysis). My general feeling is that this comes to bear when we need to check that many polynomials are zero modulo an ideal, for instance when dealing with matrices over quotient rings or polynomial rings over quotient rings. Any ideas on improvements for this?