trixi-framework / Trixi.jl

Trixi.jl: Adaptive high-order numerical simulations of conservation laws in Julia
https://trixi-framework.github.io/Trixi.jl
MIT License

Add tests using other real types #591

Open ranocha opened 3 years ago

ranocha commented 3 years ago

We should check whether everything works using

It would also be nice to see the impact on performance.

Tasks

sloede commented 3 years ago

As far as I can tell, MultiFloats.jl does not support transcendental functions, so maybe Quadmath.jl makes more sense for a general-purpose high-precision float. OTOH, MultiFloats.jl seems to be much better supported.

Float32 would definitely be interesting as well!
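
For reference, the gap mentioned above looks roughly like this (assuming package versions from around the time of this comment; whether MultiFloats.jl has gained these functions in the meantime would need checking):

using Quadmath, MultiFloats

sin(Float128(1))   # works: Quadmath.jl wraps libquadmath's math functions
sin(Float64x2(1))  # MethodError at the time of writing: MultiFloats.jl
                   # omits most transcendental functions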

stillyslalom commented 3 years ago

This will require some changes because of Julia's automatic type promotion for mixed-precision floating-point arithmetic. For example, this flux calculation includes multiplication with Float64 numeric literals, which will coerce any lower-precision arguments to Float64. For this to work seamlessly, I'd recommend switching floating-point literals to integer literals or Rationals, or wrapping them in a parameterized convert(T, x), whenever possible. At the very least, the returned SVector should be parameterized to match the floating-point type of the arguments u_ll and u_rr to ensure the returned type does not differ from the input type.

For example,

julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> f(2.0)
1.0

julia> f(2.0f0)
1.0f0

julia> f(Float16(2.0))
Float16(1.0)

The machine code for, e.g., the Float32 method is identical to what's generated for g(x) = x * 0.5f0:

julia> @code_native f(2.0f0)
        .text
; ┌ @ REPL[3]:1 within `f'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $.rodata.cst4, %rax
; │┌ @ promotion.jl:322 within `*' @ float.jl:331
        vmulss  (%rax), %xmm0, %xmm0
; │└
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
; └

julia> @code_native g(2.0f0)
        .text
; ┌ @ REPL[21]:1 within `g'
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $.rodata.cst4, %rax
; │┌ @ float.jl:331 within `*'
        vmulss  (%rax), %xmm0, %xmm0
; │└
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
; └
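
To make the SVector point above concrete, here is a minimal made-up example (an illustrative two-component "flux", not an actual Trixi.jl flux):

julia> using StaticArrays

julia> flux_bad(u1, u2) = SVector(0.5 * (u1 + u2), 0.5 * (u1 - u2))
flux_bad (generic function with 1 method)

julia> eltype(flux_bad(1.0f0, 2.0f0))  # promoted by the Float64 literals
Float64

julia> flux_good(u1, u2) = SVector(1//2 * (u1 + u2), 1//2 * (u1 - u2))
flux_good (generic function with 1 method)

julia> eltype(flux_good(1.0f0, 2.0f0))  # input precision preserved
Float32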
ranocha commented 3 years ago

Thanks for your input, @stillyslalom! Sure, that's part of the process. The first step will be to fix everything such that we can run simulations using plain Float32 everywhere without internal promotion to Float64. Do you happen to know whether there is some tool to check whether Float64s appear in intermediate calculations that should mostly use Float32s (without inspecting the source or assembly code manually, but automatically, say via a hypothetical @ensure_float32_math_only)?

ranocha commented 3 years ago

Of course, it would be a great PR to

ranocha commented 3 years ago

Right now, I hesitate a bit to use rational constants. Unless (something like) https://github.com/JuliaLang/julia/pull/41258 is merged, there can and will be surprising problems. For example, the case of one half reported by @stillyslalom above is fine

julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop

but other rational constants will be problematic, e.g.

julia> f(x) = x * 1//3
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        sub     rsp, 40
        vmovsd  qword ptr [rsp + 8], xmm0
        movabs  rax, offset divgcd
        lea     rdi, [rsp + 16]
        mov     esi, 1
        mov     edx, 3
        call    rax
        mov     rax, qword ptr [rsp + 24]
        test    rax, rax
        js      L54
        mov     rcx, qword ptr [rsp + 16]
        jmp     L66
L54:
        neg     rax
        js      L91
        xor     ecx, ecx
        sub     rcx, qword ptr [rsp + 16]
L66:
        vcvtsi2sd       xmm0, xmm1, rcx
        vcvtsi2sd       xmm1, xmm1, rax
        vdivsd  xmm0, xmm0, xmm1
        vmulsd  xmm0, xmm0, qword ptr [rsp + 8]
        add     rsp, 40
        ret
L91:
        movabs  rax, offset jl_system_image_data
        mov     qword ptr [rsp + 32], rax
        movabs  rax, offset jl_invoke
        movabs  rdi, offset jl_system_image_data
        lea     rsi, [rsp + 32]
        movabs  rcx, 140365791294448
        mov     edx, 1
        call    rax
        ud2
        nop     word ptr cs:[rax + rax]

I don't really like the (only?) ways to get good code, which use something like

julia> f(x::T) where {T} = x * (T(1) / T(3))
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop

julia> f(x) = x * Base.unsafe_rational(Int, 1, 3)
f (generic function with 1 method)

julia> @code_native debuginfo=:none syntax=:intel f(1.5)
        .text
        movabs  rax, offset .rodata.cst8
        vmulsd  xmm0, xmm0, qword ptr [rax]
        ret
        nop

since that's considerably less readable, I think.

sloede commented 3 years ago

I agree. I could live with writing fractions using the // syntax, but if it goes beyond that in terms of complexity, we'd need a very convincing use case first. In either case, we'd have to have checks in place that ensure that the other real type does not accidentally get clobbered to Float64 by a new function implemented by an unsuspecting user.

ranocha commented 3 years ago

Now that https://github.com/JuliaLang/julia/pull/41258 is merged, we might be able to use this form of providing constants in Julia v1.8 (or whenever the change is included in a new release).

Edit: I guess it's v1.8 - it's already available in the nightly builds

julia> f(x) = x * 1//2
f (generic function with 1 method)

julia> @code_native debuginfo=:none f(1.5)
        .text
        .file   "f"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3                               # -- Begin function julia_f_54
.LCPI0_0:
        .quad   0x3fe0000000000000              # double 0.5
        .text
        .globl  julia_f_54
        .p2align        4, 0x90
        .type   julia_f_54,@function
julia_f_54:                             # @julia_f_54
        .cfi_startproc
# %bb.0:                                # %top
        movabsq $.LCPI0_0, %rax
        vmulsd  (%rax), %xmm0, %xmm0
        retq
.Lfunc_end0:
        .size   julia_f_54, .Lfunc_end0-julia_f_54
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

julia> f(x) = x * 2//3
f (generic function with 1 method)

julia> @code_native debuginfo=:none f(1.5)
        .text
        .file   "f"
        .section        .rodata.cst8,"aM",@progbits,8
        .p2align        3                               # -- Begin function julia_f_109
.LCPI0_0:
        .quad   0x3fe5555555555555              # double 0.66666666666666663
        .text
        .globl  julia_f_109
        .p2align        4, 0x90
        .type   julia_f_109,@function
julia_f_109:                            # @julia_f_109
        .cfi_startproc
# %bb.0:                                # %top
        movabsq $.LCPI0_0, %rax
        vmulsd  (%rax), %xmm0, %xmm0
        retq
.Lfunc_end0:
        .size   julia_f_109, .Lfunc_end0-julia_f_109
        .cfi_endproc
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

julia> versioninfo()
Julia Version 1.8.0-DEV.1125
Commit fa8be6fda5 (2021-12-12 19:26 UTC)
ranocha commented 1 year ago

Our current code doesn't work well with Float32 on GPUs, see https://github.com/huiyuxie/trixi_cuda/issues/3. I tried using rational constants such as (1//2) instead, but ran into problems like

julia> surface_flux_kernel1 = @cuda launch = false surface_flux_kernel1!(surface_flux_arr, interfaces_u, orientations, equations, surface_flux)
ERROR: InvalidIRError: compiling MethodInstance for surface_flux_kernel1!(::CuDeviceArray{Float32, 5, 1}, ::CuDeviceArray{Float32, 5, 1}, ::CuDeviceVector{Int32, 1}, ::CompressibleEulerEquations3D{Float32}, ::FluxLaxFriedrichs{typeof(max_abs_speed_naive)}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to __throw_rational_argerror_typemin(T) @ Base rational.jl:20)
Stacktrace:
 [1] checked_den
   @ ./rational.jl:24
 [2] Rational
   @ ./rational.jl:36
 [3] Rational
   @ ./rational.jl:39
 [4] //
   @ ./rational.jl:62
 [5] flux
   @ /tmp/trixi_cuda/trixi/src/equations/compressible_euler_3d.jl:379
 [6] flux_central
   @ /tmp/trixi_cuda/trixi/src/equations/numerical_fluxes.jl:20
...

Thus, it seems better to use 0.5f0 instead of 0.5 or (1//2). However, that also makes the code a bit uglier to read. If we really want to support GPUs and/or clean Float32 operations on CPUs, we have to address this problem. The only options I see are

  1. Write 0.5f0 etc.
  2. Let a macro do the job and wrap files in @muladd @trixi_special_macro begin ... end blocks - or just @trixi_special_macro begin ... end blocks adding @muladd automatically? (A rough sketch of such a macro is below.)
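
For option 2, a rough sketch of what such a literal-rewriting macro could look like (hypothetical name and implementation, in the spirit of ChangePrecision.jl - not actual Trixi.jl code):

# walk the AST and demote Float64 literals to Float32
demote_literals(x::Float64) = Float32(x)
demote_literals(x) = x  # leave symbols, line numbers, etc. alone
demote_literals(expr::Expr) = Expr(expr.head, map(demote_literals, expr.args)...)

macro trixi_special_macro(expr)
    return esc(demote_literals(expr))
end

# usage:
@trixi_special_macro function foo(x)
    return 0.5 * x + 0.25
end

foo(1.0f0)  # 0.75f0 - stays Float32 since the literals became 0.5f0 and 0.25f0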

Any thoughts, @trixi-framework/developers?

ranocha commented 1 year ago

Result from a local discussion: Maybe a macro? And maybe with the option to be disabled via Preferences.jl?
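
A rough sketch of the Preferences.jl part (the preference name is made up, and @load_preference has to be called inside a package module, e.g., Trixi itself):

using Preferences

# read once at precompile time; defaults to enabled
const CONVERT_FLOAT_LITERALS = @load_preference("convert_float_literals", true)

macro trixi_special_macro(expr)
    # become a no-op when disabled via LocalPreferences.toml
    CONVERT_FLOAT_LITERALS || return esc(expr)
    return esc(demote_literals(expr))  # reusing the sketch from above
end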

ranocha commented 1 year ago

Xref https://github.com/omlins/ParallelStencil.jl/blob/8530bfd12d99c12da27ffee58d95307a78f59348/src/ParallelKernel/parallel.jl#L250-L291 and https://github.com/JuliaMath/ChangePrecision.jl

ranocha commented 1 year ago

After thinking more about it, I tend to prefer an explicit solution where we write 0.5f0 instead of 0.5 etc.
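
For what it's worth, 0.5f0 behaves correctly for both precisions, since mixed Float32/Float64 arithmetic promotes to Float64 and 0.5 is exactly representable in Float32 (literals like 0.1 need more care, because Float64(0.1f0) != 0.1):

julia> 0.5f0 * 2.0    # Float64 input stays Float64
1.0

julia> 0.5f0 * 2.0f0  # Float32 input stays Float32
1.0f0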

ranocha commented 4 months ago

We have discussed this again and agreed to proceed as follows:

https://github.com/trixi-framework/Trixi.jl/blob/b1a84a63298966b67491dacac78e785e0dbfb36c/src/solvers/dgsem_tree/indicators.jl#L116-L118

JoshuaLampert commented 2 months ago

I just came across DispatchDoctor.jl, which provides tools to guarantee type stability without any overhead. Is it worth thinking about using this package?
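
A minimal example along the lines of the package README (hand-written here, so details may differ):

julia> using DispatchDoctor: @stable

julia> @stable f(x) = x > 0 ? x : 0.0
f (generic function with 1 method)

julia> f(1.0)  # fine: both branches return Float64
1.0

julia> f(1)    # Union{Float64, Int64} is inferred
ERROR: TypeInstabilityError: Instability detected in f ...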

ranocha commented 2 months ago

Could be worth looking at. I'm just a bit concerned about "should" in

Code which is type stable should safely compile away the check

from the README - and about the warning that it will increase precompile time quite a bit.

JoshuaLampert commented 2 months ago

There is some discussion on the "should" part in this discourse thread.

ranocha commented 2 months ago

I would see it as mostly useful for testing/CI. But that would require wrapping basically everything in @stable and enabling it only in our tests. I haven't followed the discussion completely - is it possible to use @stable like we use @muladd to wrap complete files? It looks like it may be, but there were some caveats.

Another aspect to consider is that we will still need tests ensuring that we return the correct type, e.g., for Float64 inputs (after converting 0.5 to 0.5f0 etc.). Thus, I am not sure how much we gain, since we have to call things explicitly in tests anyway. And we have checks for allocations that would warn us about type instabilities, too.
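
For concreteness, a rough sketch of such a return-type test (a hand-written illustration, not taken from our test suite):

julia> using Trixi, Test, StaticArrays

julia> for RealT in (Float32, Float64)
           equations = CompressibleEulerEquations2D(RealT(1.4))
           u = SVector(one(RealT), RealT(0.1), RealT(0.1), RealT(2.5))
           @test eltype(flux(u, 1, equations)) == RealT
       end

This would catch, e.g., the 0.5-literal promotion for the Float32 case.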

I don't want to be pessimistic - just discuss and list pro/contra arguments. It's definitely an interesting package and has been on my TODO list of things to look into more deeply.

huiyuxie commented 2 weeks ago

Please check the task box once you see something is already finished :) @ranocha

Also, please change the task to improving math.jl (not just ln_mean) for Float32, as basically all the functions from that file need Float32-specific implementations. Thanks!