Incorrect computation with large integers since SymPy v2.1.0

JoshuaLampert commented 3 months ago

Describe the bug In the code block below it can analytically be shown that the results that are printed after "1st equation:" and "2nd equation:" are both 0. SymPy.jl v2.0.1 also exactly returns 0. However, with SymPy.jl v2.1.0 it returns something else (see below). Using rational = true prints the non-zero result with both versions.

To Reproduce Execute

import SymPy; sp = SymPy

function etasol_soliton(t, x, D, g)
    rho = 18/5
    c = 5/2*sqrt(g*D)
    xi = x - c*t
    theta = 1/2*sqrt(rho)*xi/D
    return 15/4*D*(2*sech(theta)^2 - 3*sech(theta)^4)
end
function vsol_soliton(t, x, D, g)
    rho = 18/5
    c = 5/2*sqrt(g*D)
    xi = x - c*t
    theta = 1/2*sqrt(rho)*xi/D
    return 15/2*sqrt(g*D)*sech(theta)^2
end

sp.@syms g
sp.@syms D

(t, x) = sp.symbols("t, x", real = true)
eta = etasol_soliton(t, x, D, g)
v = vsol_soliton(t, x, D, g)

rational = false
println("1st equation:")
sp.simplify(sp.diff(eta, t) + sp.diff(D*v, x) + sp.diff(eta*v, x) - 1/6*sp.diff(D^2*sp.diff(eta, x, t), x), rational = rational) |>
    sp.string |> println

println("2nd equation:")
sp.simplify(sp.diff(v, t) + g*sp.diff(eta, x) + sp.diff(v^2/2, x) - 1/6*sp.diff(D^2*sp.diff(v, t), x, x), rational = rational) |>
    sp.string |> println

With SymPy.jl v2.0.1 this prints:

1st equation:
0
2nd equation:
0

and with SymPy.jl v2.1.0:

1st equation:
-sqrt(D*g)*(192108367855228810*tanh((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2*sech((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2 - 25614449047363837*tanh((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2 + 192108367855229430*sech((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^4 - 217722816902592807*sech((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2 + 25614449047363859)*tanh((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))*sech((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2/400000000000000
2nd equation:
g*(7684334714209160*tanh((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2 + 7684334714209170*sech((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2 - 7684334714209169)*tanh((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))*sech((1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D))^2/120000000000000

(Sorry for the not quite minimal MWE, but this is hard to reduce further)

Expected behavior The code should return 0 for both equations as it does with SymPy.jl v2.0.1.

Desktop (please complete the following information):

OS: Ubuntu
Version 22

Additional context I had a closer look at the non-zero functions returned from SymPy.jl v2.1.0. Let's look at the second equation because it is simpler. This has the form

$$ a_1\cdot(a_2tanh^2(a_3) + \tilde{a_2}sech^2(a_3) - \hat{a_2})tanh(a_3)\cdot sech^2(a_3) $$

for $a_1 = g/120000000000000$, $a_2 = 7684334714209160$, $\tilde{a_2} = 7684334714209170$, $\hat{a_2} = 7684334714209169$ and a_3 = (1185854122563140*t*sqrt(D*g) - 474341649025257*x)/(500000000000000*D). Using $sech^2(x) = 1 - tanh^2(x)$ this expression vanishes exactly if $a_2 = \tilde{a_2} = \hat{a_2}$, which is not exactly the case. This let's me conclude that the computation seems to be inexact for these large integer values in SymPy.jl v2.1.0. Something similar is happening for the first equation.

jverzani commented 3 months ago

I think you are bumping into the differences with a symbolic value in SymPy and the Julia value. For your last case, if you use

u(x) = sech(Sym(x))^2 - (1 - tanh(Sym(x))^2)

You won't get the round off that makes the value zero for each of the 3 values.

Similarly, if instead of passing a "rational" value to simplify you instead ensured no floating point conversions in your function, you get the 0 value you expect. In particular using these definitions:

function etasol_soliton(t, x, D, g)
    rho = Sym(18)/5
    c = (5//2)*sqrt(g*D)
    xi = x - c*t
    theta = (1//2)*sqrt(rho)*xi/D
    return (15//4)*D*(2*sech(theta)^2 - 3*sech(theta)^4)
end
function vsol_soliton(t, x, D, g)
    rho = Sym(18)/5
    c = 5//2*sqrt(g*D)
    xi = x - c*t
    theta = 1//2*sqrt(rho)*xi/D
    return 15//2*sqrt(g*D)*sech(theta)^2
end

Now, why is there a difference in the version? If I recall, I had been using a conversion to the python value which was then made symbolic and now am using sympify (https://github.com/jverzani/SymPyCore.jl/blob/main/src/SymPy/constructors_sympy.jl#L4) The old approach was causing issues with booleans. I don't really understand why that would make a difference here, but perhaps the python conversion was doing something with the larger integer values and sympify is not.

JoshuaLampert commented 3 months ago

Thanks a lot @jverzani for the detailed and very helpful answer! That makes sense. Indeed with the definitions you gave it works as I would expect.

jverzani / SymPyCore.jl

Incorrect computation with large integers since SymPy v2.1.0 #66