Closed jjgarzella closed 3 days ago
This issue is caused by fmpz and Integer hashing being different. This has performance reasons...
And that's also why fmpz isn't subtyping Integer: to do that, hashing must match.
That said, I am unhappy about this as well.
On Wed, Jul 28, 2021 at 09:03:09AM -0700, Max Horn wrote:
> This issue is caused by fmpz and Integer hashing being different. This has performance reasons...
> And that's also why fmpz isn't subtyping Integer: to do that, hashing must match.
> That said, I am unhappy about this as well.
see also https://github.com/Nemocas/Nemo.jl/pull/700
The story is more complicated, and will pop up at least once every year. Numbers in Julia are (to me) kind of an implementation of the complex numbers (or maybe only the reals): objects used heavily in computational maths and numerics. So objects, i.e. 1, that can be represented as Int(1), UInt(1), Float64(1), BigInt(1), Rational{Int}(1), ... all behave "exactly" the same way; in particular, they hash identically. (You can see the limits of this in Int(2)^100 vs Float64(2)^100.)
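The uniform behaviour described above, and its limits, can be checked directly in Base Julia (a quick illustration, no Oscar/Nemo code involved):

```julia
# All representations of the real number 1 compare equal and hash identically.
@assert 1 == 1.0 == big(1) == 1//1
@assert hash(1) == hash(1.0) == hash(big(1)) == hash(1//1)

# The limits of the abstraction: Int64 arithmetic silently wraps around,
# while Float64 powers of two stay exact.
@assert 2^100 == 0                    # Int64 overflow wraps to 0
@assert Float64(2)^100 == big(2)^100  # exact: a power of two fits in a Float64
```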
fmpz is not part of the "real" world, it is part of the symbolic world. Here, mostly, different "rings" can be linked by injective maps (even implicit ones), but since they are not identical, there is no need to have them hash the same.
For me, fmpz = ZZ injects into fmpq = QQ, which injects into all number fields, into polynomial rings, .... While quadratic fields (QQ[sqrt 2], QQ[i]) can be used as complex numbers, more complicated fields do not have any canonical real version, hence there is no way to extend Julia's ideas here.
As a type can only be in one place in the type graph, fmpz is in the symbolic, exact branch, while Int is on the numerical side...
If anything, due to Int(2)^100 == 0, I'd argue more in favour of removing Int from Numbers, as Int is more an implementation of ZZ/2^64 ZZ... However, choices were made for valid pragmatic reasons.
The problem with "isequal" is distinct; this could be considered an error. Different types should never be "isequal" (in our world). Maybe we need a fallback isequal(::Number, ::RingElem) = false or similar.
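A minimal sketch of what such a fallback could look like, using invented stand-in types (MyRingElem and MyZZElem are hypothetical names, not the actual Oscar/AbstractAlgebra hierarchy):

```julia
abstract type MyRingElem end       # stand-in for RingElem
struct MyZZElem <: MyRingElem      # stand-in for fmpz
    d::BigInt
end

# The proposed fallback: elements of the symbolic world are never
# isequal to Julia Numbers, in either argument order.
Base.isequal(::Number, ::MyRingElem) = false
Base.isequal(::MyRingElem, ::Number) = false

@assert !isequal(1, MyZZElem(big(1)))
@assert !isequal(MyZZElem(big(1)), 1.0)
```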
The hashing speed is also a consequence of this: Floats are always normalised, i.e. the mantissa has no leading/trailing zeros; they get absorbed into the exponent. Thus, essentially, to hash a BigInt, a copy is produced where the trailing zeros are removed. This copy is then treated as a mantissa and the hash of that mantissa is computed. The copy and the normalisation kill the speed, as they involve memory management. This hashing should be a constant-time operation, but Julia's BigInt hashing is not.
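The constraint driving this normalisation can be seen in plain Base Julia: isequal values must hash equally even across types, and a BigInt can be exactly equal to a Float64 (a quick illustration):

```julia
x = big(2)^100
y = 2.0^100      # exactly the same value, represented as a Float64

@assert isequal(x, y)
# Julia's hash contract (isequal(a, b) implies hash(a) == hash(b)) forces
# BigInt to be hashed in a mantissa/exponent-compatible form, which is why
# trailing zero bits have to be stripped before hashing.
@assert hash(x) == hash(y)
```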
The Dict problem actually has an easier, more Julia-like solution:
julia> a = Dict{Int, Int}()

julia> a[1] = 2
2

julia> a[fmpz(1)] = 3
3

julia> a
Dict{Int64, Int64} with 1 entry:
  1 => 3
Declaring the dictionary as Dict{fmpz, Int}() would also be consistent.
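The mechanism at work here is that a concrete keytype converts keys on insertion. The same thing can be seen with Base types only, with BigInt standing in for fmpz (a quick illustration, no Nemo required):

```julia
a = Dict{Int, Int}()
a[1] = 2
a[big(1)] = 3   # the BigInt key is converted to Int, updating the existing entry

@assert length(a) == 1
@assert a[1] == 3
```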
-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/oscar-system/Oscar.jl/issues/591#issuecomment-888430604
> Thus essentially to hash a BigInt a copy is produced

This is supposed to be fixed; I think BigInt hashing doesn't allocate anymore (though it's still slower than for fmpz, as mentioned at https://github.com/Nemocas/Nemo.jl/pull/700#issue-339031489). But the perf got broken again by accident, on Julia 1.6 I think, and I don't remember if the fix was backported to the latest 1.6.x.
For the Dict thing, there is still unfortunate behavior with a concrete keytype, e.g.:
julia> x = big(2)^100
1267650600228229401496703205376
julia> y = fmpz(2)^100
1267650600228229401496703205376
julia> d = Dict(x => x)
Dict{BigInt, BigInt} with 1 entry:
1267650600228229401496703205376 => 1267650600228229401496703205376
julia> d[y]
ERROR: KeyError: key 1267650600228229401496703205376 not found
Stacktrace:
[...]
julia> d[y] = 9
9
julia> d
Dict{BigInt, BigInt} with 1 entry:
1267650600228229401496703205376 => 9
I agree with all points @fieker brings up.
I am still unhappy about all this, because it is a source of confusion and it comes up again and again for understandable reasons. That said, I see no good solution. In the end, embedding in a "general purpose language" like Julia always has these kinds of drawbacks for a computer algebra system. The original sin here is that 2 is an Int64 and so 2^100 = 0; I understand why, but this is not something you'd ever see in a traditional CAS. For better and for worse.
What I still don't fully appreciate is how making fmpz <: Signed has more practical drawbacks than the status quo (besides hashing speed, which can have workarounds, as highlighted in the PR mentioned above).
All the existing code accepting ring elements would have to be changed: the containers would have to also allow Signed, and then be fixed to not allow Int, e.g. because it will overflow. Julia Ints are considered real numbers and thus have characteristic 0; clearly they do not. So the code needs to check this. In uses for indices/sets/... this is harmless; in code that computes, it is not. A hypothetical function pow(a::RingElem, d::Int) = a^d will produce wrong results on all but one type of Signed: it is correct for BigInt, would be fine for fmpz, but is wrong for Int. This needs to be checked everywhere: if the input is an Int, convert to fmpz first.
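The hazard with the hypothetical pow above can be reproduced with Base types alone (pow is the invented function from the comment, not an Oscar API):

```julia
pow(a, d::Int) = a^d   # naive power; silently assumes `a` cannot overflow

@assert pow(big(2), 100) == big(2)^100  # correct for BigInt (exact arithmetic)
@assert pow(2, 100) == 0                # silently wrong for Int: 2^100 wraps to 0
```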
Our fault: isequal needs to be changed to return false.
Julia: a rather strong assumption that different types should produce identical hashes.
Allocation: in current 1.6.0:
julia> a = big(2)^1000;
julia> @time hash(a, UInt(0))
0.000018 seconds (4 allocations: 64 bytes)
0xc6037f799d90b0de
julia> a = a^2;
julia> @time hash(a, UInt(0))
0.000028 seconds (6 allocations: 104 bytes)
0x5ddec8a7d68b1d14
So hashing is allocating.
Is there some clever programming that can be done to still hash mpz's the way julia wants without allocating?
On Fri, Jul 30, 2021 at 04:34:15AM -0700, dan wrote:
> Is there some clever programming that can be done to still hash mpz's the way julia wants without allocating?

Of course, but will it be maintainable? In the end, it is about an iteration over the limbs of the mpz with the trailing (binary) zeros removed. I can fake that using bit-twiddling. Do I want to? For what gain?
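A rough sketch of what that bit-twiddling could look like, with a plain Vector{UInt64} standing in for the mpz limb array (hash_limbs_shifted is an invented name; real code would read the limbs via the GMP C API):

```julia
# Hash the limbs of a magnitude as if trailing zero bits were stripped,
# without allocating a shifted copy of the number.
function hash_limbs_shifted(limbs::Vector{UInt64}, h::UInt64 = UInt64(0))
    k = findfirst(!iszero, limbs)          # first nonzero limb (little-endian)
    k === nothing && return h              # the number is zero
    s = trailing_zeros(limbs[k])           # bit offset inside that limb
    n = length(limbs)
    for i in k:n
        lo = limbs[i] >> s                 # low part of the shifted word
        hi = i < n ? limbs[i+1] << (64 - s) : UInt64(0)  # borrowed high bits
        w = lo | hi
        # The topmost output word can become zero after shifting; skip it
        # so that x and x << t hash identically.
        (i == n && w == 0) && continue
        h = hash(w, h)                     # fold one shifted limb into the hash
    end
    return h
end

# Shifting the whole number left (by bits or by whole limbs) only adds
# trailing zeros, so it must not change the result.
@assert hash_limbs_shifted([UInt64(5)]) == hash_limbs_shifted([UInt64(10)])
@assert hash_limbs_shifted([UInt64(5)]) == hash_limbs_shifted([UInt64(0), UInt64(5)])
@assert hash_limbs_shifted([UInt64(5)]) ==
        hash_limbs_shifted([UInt64(5) << 63, UInt64(5) >> 1])
```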
Yes, hashing of BigInt allocates on 1.6.0, but I don't know if the fix was backported to a later version 1.6.x.
> Is there some clever programming that can be done to still hash mpz's the way julia wants without allocating?
The same non-allocating algo as the one used for BigInt should easily be translatable for fmpz. I did the PR against Julia for non-allocating hashing specifically as a POC to show that hashing of BigInt and fmpz could be made to agree without the allocation overhead that was occurring before.
Regarding allocations: I am a bit confused; I am seeing the same number of allocations when hashing fmpz and BigInt, with Nemo v0.26.1 -- and in both Julia 1.3.1 and 1.6.2. Here are my numbers (on a MacBook); all tests were run several times to get rid of compilation overhead. I tested using both @time and @btime from BenchmarkTools (they show different allocation stats, though... huh?). And I used this highly sophisticated test function:
function testit(x)
    h = UInt(0)
    for i = 1:1000
        h = hash(x, h)
    end
    return h
end
With Julia 1.3.1:
julia> x=big(5)^10000; @time testit(x)
0.002368 seconds (5 allocations: 176 bytes)
0xb0abfd32eafe04e8
julia> x=big(5)^10000; @btime testit(x)
1.719 ms (1 allocation: 16 bytes)
0xb0abfd32eafe04e8
julia> x=fmpz(5)^10000; @time testit(x)
0.002268 seconds (5 allocations: 176 bytes)
0x86bdbb29a1df503d
julia> x=fmpz(5)^10000; @btime testit(x)
1.658 ms (1 allocation: 16 bytes)
0x86bdbb29a1df503d
In Julia 1.6.2:
julia> x=big(5)^10000; @time testit(x)
0.002169 seconds (1 allocation: 16 bytes)
0xb0abfd32eafe04e8
julia> x=big(5)^10000; @btime testit(x)
1.717 ms (1 allocation: 16 bytes)
0xb0abfd32eafe04e8
julia> x=fmpz(5)^10000; @time testit(x)
0.002289 seconds (1 allocation: 16 bytes)
0x86bdbb29a1df503d
julia> x=fmpz(5)^10000; @btime testit(x)
1.659 ms (1 allocation: 16 bytes)
0x86bdbb29a1df503d
I can also confirm that this was broken in Julia 1.6.0 and in Julia 1.6.1, the numbers below are for 1.6.1 (in 1.6.0 I get similar numbers); but clearly the fix was backported to 1.6.2:
julia> x=big(5)^10000; @time testit(x)
0.249412 seconds (1.09 M allocations: 515.121 MiB, 9.20% gc time)
0xb0abfd32eafe04e8
julia> x=big(5)^10000; @btime testit(x)
218.912 ms (1088001 allocations: 515.12 MiB)
0xb0abfd32eafe04e8
julia> x=fmpz(5)^10000; @time testit(x)
0.002213 seconds (1 allocation: 16 bytes)
0x86bdbb29a1df503d
julia> x=fmpz(5)^10000; @btime testit(x)
1.659 ms (1 allocation: 16 bytes)
0x86bdbb29a1df503d
On Wed, Aug 18, 2021 at 01:55:45AM -0700, Max Horn wrote:
Using @time rather than @btime I see the allocations.
julia> x=big(5)^10000; @time testit(x)
0.455273 seconds (1.21 M allocations: 521.903 MiB, 39.84% compilation time)
0xb0abfd32eafe04e8

julia> x=big(5)^10000; @time testit(x)
0.655503 seconds (1.10 M allocations: 515.121 MiB, 32.17% gc time)
0xb0abfd32eafe04e8

julia> x=fmpz(5)^10000; @time testit(x)
0.019273 seconds (21.01 k allocations: 1.427 MiB, 76.97% compilation time)
0x86bdbb29a1df503d

julia> x=fmpz(5)^10000; @time testit(x)
0.006985 seconds (1 allocation: 16 bytes)
0x86bdbb29a1df503d
Looking at the Julia code, the allocation is clearly there: the BigInt is shifted to be odd. That requires the alloc.
My guess is that @btime is stupid?
On Wed, Aug 18, 2021 at 01:55:45AM -0700, Max Horn wrote:

Could this be a mac feature?
> I am seeing the same number of allocations when hashing fmpz and BigInt
On 1.6.2+, this is expected given the optimization made to BigInt hashing. On 1.3, I think there are no allocations (with @btime testit($x)) because x is odd, so no shifting is necessary: with $(x-1) it allocates.
I guess that on 1.6.0 the result is catastrophic because the introduced bug made hash(::BigInt) use the most generic implementation of hash, discarding a partial optimization which was available in 1.3.
> the allocation is clearly there, the BigInt is shifted to be odd. That requires the alloc.

This was fixed, then the fix was broken (on 1.6.0 IIRC), then re-fixed (1.7), and backported (1.6.2, according to Max's post above).
@fingolfin Also, the hashing performance depends on the input, cf. https://github.com/Nemocas/Nemo.jl/pull/700#issue-339031489 for a couple of examples.
This hashing is probably a moot point, because fmpz still lives in the mathematical world and Base.Integer lives in the numerical world. The first behaviour below would be catastrophic for a computer algebra system:
julia> typeof(exp(BigInt(0)))
BigFloat
julia> typeof(exp(fmpz(0)))
fmpz
Really? What's catastrophic about that? The latter seems useless?
First, exact input yielding inexact output of some unspecified precision is already difficult to work with. Second, it sets a dangerous precedent. The exponential function is a perfectly meaningful thing to calculate in some rings (e.g. R, x = PowerSeriesRing(QQ, 9, "x"), as long as you can calculate the exponential of the constant term), and we surely wouldn't want approximate answers for exp(x) here.
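The point that exp can be exact in the symbolic world can be seen with nothing but rationals: the coefficients of exp(x) in QQ[[x]] are exactly 1/k! (plain Base Julia here, not the Oscar power-series API):

```julia
# Coefficients of exp(x) truncated at degree 8, as exact rationals.
coeffs = [1 // factorial(big(k)) for k in 0:8]

@assert coeffs[1] == 1        # constant term: 1/0!
@assert coeffs[4] == 1 // 6   # x^3 coefficient: 1/3!
# Truncated evaluation stays exact; no floating-point approximation appears.
@assert sum(c * (1 // 2)^k for (k, c) in zip(0:8, coeffs)) isa Rational
```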
Currently in Oscar, if one uses an Oscar integer as a dictionary key, it registers as a separate entry from a standard integer. This behavior has caused subtle bugs for me, and I think it would be pretty confusing for beginners.
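That behaviour can be mimicked with a mock type whose hash deliberately differs from Int's (FakeOscarInt is invented purely for illustration; the real culprit is fmpz's custom hash):

```julia
struct FakeOscarInt   # stand-in for an Oscar integer with its own hash
    d::Int
end
# Hashes differently from hash(::Int), even though == agrees with Int,
# mirroring the fmpz situation described above.
Base.hash(x::FakeOscarInt, h::UInt) = hash((:fake, x.d), h)
Base.:(==)(x::FakeOscarInt, y::Integer) = x.d == y
Base.:(==)(x::Integer, y::FakeOscarInt) = x == y.d

d = Dict{Any, String}()
d[1] = "standard"
d[FakeOscarInt(1)] = "oscar"

@assert length(d) == 2   # the "same" integer appears as two separate keys
@assert d[1] == "standard"
```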
It seems like the crux of this issue comes down to either a) changing the indexing behavior for ring elements or b) the fact that fmpz is not a Julia Integer type.

After scrolling through the docs, they seem to say that fmpz isn't a subtype of Base.Integer because its behavior differs from Base.Int and Base.BigInt for performance reasons. I don't quite understand why this precludes fmpz from being a Base.Integer, at least in the context of Julia's type system. Perhaps there is a ring-functionality-related reason that fmpz can't be a Julia integer?