mmp / pbrt-v4

Source code to pbrt, the ray tracer described in the forthcoming 4th edition of the "Physically Based Rendering: From Theory to Implementation" book.
https://pbrt.org
Apache License 2.0
2.9k stars 461 forks source link

Tests failing on Apple Silicon (Ventura OS) #375

Closed winterismute closed 1 year ago

winterismute commented 1 year ago

I was trying pbrt on macos on Apple Silicon, the project builds ok with eg.

mkdir build
cd build
cmake ..
make -j8

However, if I run pbrt_test I see:

[==========] Running 444 tests from 94 test cases.
[----------] Global test environment set-up.
[...]
[ RUN      ] AnalyticTestScenes/RenderTest.RadianceMatches/54
[ tid 145013888 @     0.000s film.cpp:500 ] FATAL Check failed: !L.HasNaNs()
Rendering failed. Debug with --debugstart 1,0.27526295,0,0,0.11870797,0,0,0.071955875,0,0,0.7138521,0,0,0.3922317,0,0,0.6114458,0,0,0.39523995,0,0,0.3837549,0,0,0.10117897,0,0,0.8851951,0"

zsh: abort      ./build/pbrt_test

I see you have an ARM osx machine on the gihub CI, but maybe you only check compilation there? My clang --version:

Apple clang version 14.0.3 (clang-1403.0.22.14.1)

I could replicate the error on another M2 machine (this is an M1 one).

It does unfortunately seem to be an AS-related issue: if I first do /usr/bin/arch -arch x86_64 /bin/zsh in the terminal, triggering Rosetta, and repeat all of the above, I don't see the failure.

Any chance you might look at this problem?

pbrt4bounty commented 1 year ago

I compiled without problems on a Mac mini M2, arm64 with clang 14.0.3 under Ventura 13 but did not run any tests because the binaries work fine here. I can try later, to confirm

winterismute commented 1 year ago

Actually, it was rather easy to trigger the same assert while rendering, for example rendering a scene such as villa-lights-on with the MLT integrator, even at 1 spp should trigger it. I can post full repro instructions if you don't hit it.

mmp commented 1 year ago

Unfortunately I can't reproduce it here on an M1 mac running Monterey. Could you do a debug build of pbrt and then post the output when it crashes (for both cases.) Hopefully an earlier assertion will hit, which would make it easier to track down the problem.

winterismute commented 1 year ago

@mmp could you share the clang version you use and for which everything works on Apple Silicon?

mmp commented 1 year ago
% g++ -v   
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 13.0.0 (clang-1300.0.29.3)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
winterismute commented 1 year ago

Thank you. So, I think the code might be affected by this clang change: https://releases.llvm.org/14.0.0/tools/clang/docs/ReleaseNotes.html#floating-point-support-in-clang From 14.0.0, ffp-contract is set to on, while before it was effectively off. This change affects ARM platforms more than x86 because, I think, x86 produces the fused ops only when some of the architectural-related optimizations are on - which I am not sure if you turn on by default either. The situation is complicated even more by the fact that GCC still, I believe, sets this to off by default for standard C++ compilation. In any case, I added to the CMakeList

list (APPEND PBRT_CXX_FLAGS "-ffp-contract=off")

then I rebuilt and all tests now pass and I can render the scene that was crashing before (essentially, villa-lights-on with mlt set as integrator). This being said, it is odd that the introduction of eg. fmads leads to asserts while rendering, since I believe their floating-point "accuracy" should be even better? I wonder if you can reproduce the scenario by forcing that same option to on or fast on either x86 or Apple Silicon, and if you are interested in debugging it. Hope this helps!

fhools commented 1 year ago

Thank you. So, I think the code might be affected by this clang change: https://releases.llvm.org/14.0.0/tools/clang/docs/ReleaseNotes.html#floating-point-support-in-clang From 14.0.0, ffp-contract is set to on, while before it was effectively off. This change affects ARM platforms more than x86 because, I think, x86 produces the fused ops only when some of the architectural-related optimizations are on - which I am not sure if you turn on by default either. The situation is complicated even more by the fact that GCC still, I believe, sets this to off by default for standard C++ compilation. In any case, I added to the CMakeList

list (APPEND PBRT_CXX_FLAGS "-ffp-contract=off")

then I rebuilt and all tests now pass and I can render the scene that was crashing before (essentially, villa-lights-on with mlt set as integrator). This being said, it is odd that the introduction of eg. fmads leads to asserts while rendering, since I believe their floating-point "accuracy" should be even better? I wonder if you can reproduce the scenario by forcing that same option to on or fast on either x86 or Apple Silicon, and if you are interested in debugging it. Hope this helps!

Thank you that fixed it for me. I'm on a 2022 Macbook Pro M1 Max and had the same error. The error I had was on film.cpp:500 RGBFilm::addSplat()

 CHECK(!L.HasNaNs());
mmp commented 1 year ago

I'm glad that fixed it but there still may be a bug in pbrt; changing that setting might just be masking the actual bug.

Could you please do a debug build and post pbrt's full output on one of those crashes, so that I have a chance of digging in to whether there's a pbrt bug?

trevordblack commented 1 year ago

I haven't dug in yet but I can confirm the problem is reproducible on my machine.

I'm running a 12th gen intel part running ubuntu linux (a 14 inch framework laptop)

trevor@tdb:~/pbrt-v4/build$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            12th Gen Intel(R) Core(TM) i7-1260P
    CPU family:          6
    Model:               154
    Thread(s) per core:  2
    Core(s) per socket:  12
    ...

Running through the above, I'll try the fix given above, and see if I can't run a debug build to root-cause.

Where the error I run into is:

[ RUN      ] AnalyticTestScenes/RenderTest.RadianceMatches/54
[ tid 37991 @     0.000s film.cpp:500 ] FATAL Check failed: !L.HasNaNs()
(./pbrt_test                             )  0x0x555a70fde92a - pbrt::PrintStackTrace() + 0x2a
(./pbrt_test                             )  0x0x555a70fdeb22 - pbrt::CheckCallbackScope::Fail() + 0x12
(./pbrt_test                             )  0x0x555a71032f07 - pbrt::LogFatal(pbrt::LogLevel, char const*, int, char const*) + 0x1a7
(./pbrt_test                             )  0x0x555a70e23e5c - void pbrt::LogFatal<char const (&) [13]>(pbrt::LogLevel, char const*, int, char const*, char const (&) [13]) + 0x4c
(./pbrt_test                             )  0x0x555a70e1cadb - (unknown) + 0x33aadb
(./pbrt_test                             )  0x0x555a70fb24d5 - (unknown) + 0x4d04d5
(./pbrt_test                             )  0x0x555a70ce7874 - std::_Function_handler<void (long, long), pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}>::_M_invoke(std::_Any_data const&, long&&, long&&) + 0x34
(./pbrt_test                             )  0x0x555a71063352 - pbrt::ParallelForLoop1D::RunStep(std::unique_lock<std::mutex>*) + 0x92
(./pbrt_test                             )  0x0x555a71062a47 - pbrt::ThreadPool::WorkOrWait(std::unique_lock<std::mutex>*, bool) + 0x67
(./pbrt_test                             )  0x0x555a71062973 - pbrt::ThreadPool::Worker() + 0x63
(/lib/x86_64-linux-gnu/libstdc++.so.6    )  0x0x7f0d4f0dc253 - (unknown) + 0xdc253
(/lib/x86_64-linux-gnu/libc.so.6         )  0x0x7f0d4ec94b43 - (unknown) + 0x94b43
(/lib/x86_64-linux-gnu/libc.so.6         )  0x0x7f0d4ed26a00 - (unknown) + 0x126a00
Rendering failed. Debug with --debugstart 0,0.94294167,0,0,0.7749876,0,0,0.16090861,0,0,0.076150656,0,0,0.5543,0,0,0.35725263,0"

Aborted (core dumped)
trevordblack commented 1 year ago

Ran with a debug build:

[ RUN      ] AnalyticTestScenes/RenderTest.RadianceMatches/54
[ tid 44208 @     0.000s util/sampling.cpp:626 ] FATAL Check failed: bins[offset].p > 0 with bins[offset].p = 0, 0 = 0
(./pbrt_test                             )  0x0x5560fe0575ac - pbrt::PrintStackTrace() + 0x1c
(./pbrt_test                             )  0x0x5560fe057990 - pbrt::CheckCallbackScope::Fail() + 0x10
(./pbrt_test                             )  0x0x5560fe0be98e - pbrt::LogFatal(pbrt::LogLevel, char const*, int, char const*) + 0x17e
(./pbrt_test                             )  0x0x5560fe0f7ed2 - void pbrt::LogFatal<char const (&) [15], char const (&) [2], char const (&) [15], float&, char const (&) [2], int&>(pbrt::LogLevel, char const*, int, char const*, char const (&) [15], char const (&) [2], char const (&) [15], float&, char const (&) [2], int&) + 0x102
(./pbrt_test                             )  0x0x5560fe1126c5 - pbrt::AliasTable::Sample(float, float*, float*) const + 0x1f5
(./pbrt_test                             )  0x0x5560fe00fc2e - (unknown) + 0x914c2e
(./pbrt_test                             )  0x0x5560fe00fb14 - (unknown) + 0x914b14
(./pbrt_test                             )  0x0x5560fe00fab2 - (unknown) + 0x914ab2
(./pbrt_test                             )  0x0x5560fe00f972 - (unknown) + 0x914972
(./pbrt_test                             )  0x0x5560fdc42bc5 - std::function<void (long)>::operator()(long) const + 0x55
(./pbrt_test                             )  0x0x5560fdc42b52 - pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}::operator()(long, long) const + 0x42
(./pbrt_test                             )  0x0x5560fdc42afa - void std::__invoke_impl<void, pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}&, long, long>(std::__invoke_other, pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}&, long&&, long&&) + 0x4a
(./pbrt_test                             )  0x0x5560fdc42a67 - std::enable_if<is_invocable_r_v<void, pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}&, long, long>, void>::type std::__invoke_r<void, pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}&, long, long>(pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}&, long&&, long&&) + 0x47
(./pbrt_test                             )  0x0x5560fdc42947 - std::_Function_handler<void (long, long), pbrt::ParallelFor(long, long, std::function<void (long)>)::{lambda(long, long)#1}>::_M_invoke(std::_Any_data const&, long&&, long&&) + 0x47
(./pbrt_test                             )  0x0x5560fe100f6a - std::function<void (long, long)>::operator()(long, long) const + 0x6a
(./pbrt_test                             )  0x0x5560fe0ffb3b - pbrt::ParallelForLoop1D::RunStep(std::unique_lock<std::mutex>*) + 0x8b
(./pbrt_test                             )  0x0x5560fe0ff104 - pbrt::ThreadPool::WorkOrWait(std::unique_lock<std::mutex>*, bool) + 0xf4
(./pbrt_test                             )  0x0x5560fe0fef95 - pbrt::ThreadPool::Worker() + 0x75
(./pbrt_test                             )  0x0x5560fe102cee - void std::__invoke_impl<void, void (pbrt::ThreadPool::*)(), pbrt::ThreadPool*>(std::__invoke_memfun_deref, void (pbrt::ThreadPool::*&&)(), pbrt::ThreadPool*&&) + 0x6e
(./pbrt_test                             )  0x0x5560fe102c02 - std::__invoke_result<void (pbrt::ThreadPool::*)(), pbrt::ThreadPool*>::type std::__invoke<void (pbrt::ThreadPool::*)(), pbrt::ThreadPool*>(void (pbrt::ThreadPool::*&&)(), pbrt::ThreadPool*&&) + 0x32
(./pbrt_test                             )  0x0x5560fe102bc2 - void std::thread::_Invoker<std::tuple<void (pbrt::ThreadPool::*)(), pbrt::ThreadPool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) + 0x42
(./pbrt_test                             )  0x0x5560fe102b75 - std::thread::_Invoker<std::tuple<void (pbrt::ThreadPool::*)(), pbrt::ThreadPool*> >::operator()() + 0x15
(./pbrt_test                             )  0x0x5560fe1029d9 - std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (pbrt::ThreadPool::*)(), pbrt::ThreadPool*> > >::_M_run() + 0x19
(/lib/x86_64-linux-gnu/libstdc++.so.6    )  0x0x7fa39eedc253 - (unknown) + 0xdc253
(/lib/x86_64-linux-gnu/libc.so.6         )  0x0x7fa39ea94b43 - (unknown) + 0x94b43
(/lib/x86_64-linux-gnu/libc.so.6         )  0x0x7fa39eb26a00 - (unknown) + 0x126a00
Rendering failed. Debug with --debugstart 1,0"
trevordblack commented 1 year ago

I got as far as MLTIntegrator::Render() calling

        int bootstrapIndex = bootstrapTable.Sample(rng.Uniform<Float>());

to

int AliasTable::Sample(Float u, Float *pmf, Float *uRemapped) const {
    // Compute alias table _offset_ and remapped random sample _up_
    int offset = std::min<int>(u * bins.size(), bins.size() - 1);
    Float up = std::min<Float>(u * bins.size() - offset, OneMinusEpsilon);

    if (up < bins[offset].q) {
        // Return sample for alias table at _offset_
        DCHECK_GT(bins[offset].p, 0);

good luck with the debugging

mmp commented 1 year ago

@winterismute you were right about the ffp-contract stuff, though I'm glad I took the time to chase down the exact problem. In short, given these lines of code:

    int offset = std::min<int>(u * bins.size(), bins.size() - 1);
    Float up = std::min<Float>(u * bins.size() - offset, OneMinusEpsilon);

clang would use an unasked-for FMA instruction that led to up being negative in some cases, even if u was between 0 and 1. That in turn led to the assertion failures. That general form of computation is used a lot in pbrt's sampling code so indeed, the fix was to disable that silly new ffp-contract default...