ziotom78 / Healpix.jl

Healpix library written in Julia
GNU General Public License v2.0

Tests fail on Windows x86_64 with Julia 1.6.3 #67

Status: closed by ziotom78 2 years ago

ziotom78 commented 2 years ago

Our tests fail reproducibly with Julia 1.6.3 on Windows:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x23305066 -- mypow at C:\Users\runneradmin\.julia\artifacts\77e2bedf7d1a0f2dbfc400063c490af5406e482c\bin\libsharp2-0.dll (unknown line)
in expression starting at D:\a\Healpix.jl\Healpix.jl\test\test_sphtfunc.jl:8

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x23305066 -- mypow at C:\Users\runneradmin\.julia\artifacts\77e2bedf7d1a0f2dbfc400063c490af5406e482c\bin\libsharp2-0.dll (unknown line)
mypow at C:\Users\runneradmin\.julia\artifacts\77e2bedf7d1a0f2dbfc400063c490af5406e482c\bin\libsharp2-0.dll (unknown line)
in expression starting at D:\a\Healpix.jl\Healpix.jl\test\test_sphtfunc.jl:8
iter_to_ieee at C:\Users\runneradmin\.julia\artifacts\77e2bedf7d1a0f2dbfc400063c490af5406e482c\bin\libsharp2-0.dll (unknown line)
calc_map2alm at C:\Users\runneradmin\.julia\artifacts\77e2bedf7d1a0f2dbfc400063c490af5406e482c\bin\libsharp2-0.dll (unknown line)
ERROR: Package Healpix errored during testing (exit code: 3221225477)
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Pkg\src\Types.jl:55
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing)
   @ Pkg.Operations C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Pkg\src\Operations.jl:1693
 [3] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, kwargs::Base.Iterators.Pairs{Symbol, IOContext{Base.PipeEndpoint}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{Base.PipeEndpoint}}}})
   @ Pkg.API C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Pkg\src\API.jl:343
 [4] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{Base.PipeEndpoint}, kwargs::Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:coverage,), Tuple{Bool}}})
   @ Pkg.API C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Pkg\src\API.jl:80
 [5] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:coverage,), Tuple{Bool}}})
   @ Pkg.API C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Pkg\src\API.jl:96
 [6] top-level scope
   @ none:1
Error: Process completed with exit code 1.

As far as I know, this error did not happen with Julia 1.6.2.

The crash appears to occur in mypow, a C function in libsharp2. I will investigate this once I have access to a Windows machine.
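For reference, the exit code printed by Pkg in the log above (3221225477) is a Windows NTSTATUS value written in decimal. A minimal sketch of how to decode it follows; the helper name `describe_exit_code` and the small lookup table are illustrative assumptions, not part of Healpix.jl or Julia:

```python
# Windows reports a crashed process's exit code as an unsigned 32-bit
# NTSTATUS value. The lookup table below is deliberately tiny and only
# lists a few well-known codes; it is an illustration, not a full list.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code as hex and attach its NTSTATUS name if known."""
    code &= 0xFFFFFFFF  # normalise to unsigned 32 bits
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


# Exit code taken from "Package Healpix errored during testing" in the log
print(describe_exit_code(3221225477))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

This agrees with the `EXCEPTION_ACCESS_VIOLATION` lines in the log: the test process was terminated by an invalid memory access rather than by a failing test assertion.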

mreineck commented 2 years ago

One possibility is that Julia screws up the alignment of the stack before calling into the C library; that could explain the symptoms. If that is the case, I expect that there will be some regression reports against Julia 1.6.3 fairly soon. At the moment, the release is still so new that it might not have been noticed yet.
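To illustrate why alignment could matter here: aligned vector loads and stores generally require the address to be a multiple of the vector width in bytes. The sketch below uses a made-up address and a hypothetical helper `is_aligned`; it only shows the arithmetic and is not a diagnosis of this particular crash:

```python
# Byte alignment required by aligned loads/stores at each vector width.
VECTOR_ALIGNMENT = {"sse2": 16, "avx": 32, "avx512": 64}


def is_aligned(address: int, isa: str) -> bool:
    """Return True if `address` satisfies the alignment required by `isa`."""
    return address % VECTOR_ALIGNMENT[isa] == 0


# Hypothetical stack address: a multiple of 16 but not of 32.
addr = 0x7FFE0010

print(is_aligned(addr, "sse2"))  # True
print(is_aligned(addr, "avx"))   # False
```

A stack that is only 16-byte aligned would therefore be fine for SSE2 code but could fault in code that assumes 32-byte alignment for wider vectors.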

ziotom78 commented 2 years ago

As I feared, it took longer than I hoped. Unfortunately, the patch does not work; here is the new error:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x56b55090 -- mypow at ***\.julia\artifacts\331e8a6093782bd87dec34d4983f81904b11c7be\bin\libsharp2-0.dll (unknown line)
in expression starting at ***\Libsharp.jl\test\runtests.jl:40

I checked that the code is compiled as a 64-bit binary, and sharp_architecture() returns "fma".

I am thinking of testing Julia 1.6.0, 1.6.1, and 1.6.2 on Windows and then reporting my findings to the Julia developers, mentioning your suggestion of a misaligned stack frame.

Thanks for the help!

mreineck commented 2 years ago

Thanks for testing! BTW, how do you compile under Windows? If sharp_architecture returns "fma", that sounds as if you manage to use the "configure" machinery; in that case your compiler is most likely not MSVC (as I assumed) but the MinGW gcc or similar.

It might be interesting to try and switch on debugging information with "-g", to get (hopefully) some meaningful line numbers in the error report.

In any case, even if we don't manage to find the root of the problem, we should at least be able to work around it by not using the -DMULTIARCH flag and not using -march=native either. This should reduce vectorization to SSE2, and that will most likely work.

ziotom78 commented 2 years ago

Ah, sorry, you asked me this morning about the compiler, but I forgot to answer you. I am using BinaryBuilder.jl (https://github.com/JuliaPackaging/BinaryBuilder.jl), which is a Julia package that provides cross-compilers based on GCC and Clang. This is the script I am currently using to build the binaries for Linux, Mac OS X, and Windows:

https://github.com/JuliaPackaging/Yggdrasil/blob/dc6899748915b1cb39317356dcbc4e198c427d7a/L/libsharp2/build_tarballs.jl#L16-L30

Thanks for the hint about -g and the other flags, I'll test this immediately!

ziotom78 commented 2 years ago

Solved by #68.