status-im / nimbus-eth2

Nim implementation of the Ethereum Beacon Chain
https://nimbus.guide

Cannot compile nimbus (devel), hitting internal gcc compiler error #1970

Closed protolambda closed 3 years ago

protolambda commented 4 years ago

Describe the bug

Trying to build nimbus in preparation for the Toledo testnet.

To Reproduce

Steps to reproduce the behavior:

Host machine:

OS: Arch Linux 
Kernel: x86_64 Linux 5.8.10-arch1-1

But gcc runs inside Docker:

commit: 11ab3a2f4fc381092a2f7a54b4d1c400b6a2cd14 branch: devel

RUN cd /root/nimbus-eth2 \
 && make -j$(nproc) update \
 && make -j$(nproc) LOG_LEVEL="TRACE" NIMFLAGS="-d:insecure" beacon_node \
 && make -j$(nproc) LOG_LEVEL="TRACE" NIMFLAGS="-d:insecure" validator_client

From docker hack here: https://github.com/protolambda/nimbus-docker

Building: build/beacon_node
/root/nimbus-eth2/beacon_chain/eth1_monitor.nim(534, 61) template/generic instantiation of `async` from here
/root/nimbus-eth2/vendor/nim-chronos/chronos/asyncmacro2.nim(240, 37) Warning: Cannot prove that 'result' is initialized. This will become a compile time error in the future. [ProveInit]
during RTL pass: final
???: In function 'route__4JhWafNTfSkgRjXJ7U9bzpw':
???:4:16: internal compiler error: in notice_source_line, at final.c:3237
0x6408a3 notice_source_line
        ../../src/gcc/final.c:3237
0xc667cd final_scan_insn_1
        ../../src/gcc/final.c:2420
0xc67b4b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
        ../../src/gcc/final.c:3152
0xc67c56 final_1
        ../../src/gcc/final.c:2020
0xc689c6 rest_of_handle_final
        ../../src/gcc/final.c:4658
0xc689c6 execute
        ../../src/gcc/final.c:4736
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
Error: execution of an external compiler program 'gcc -c  -w -pthread -I/root/nimbus-eth2/vendor/nim-libbacktrace -I/root/nimbus-eth2/vendor/nim-libbacktrace/install/usr/include -I/root/nimbus-eth2/vendor/nim-bearssl/bearssl/csources/src -I/root/nimbus-eth2/vendor/nim-bearssl/bearssl/csources/inc -I/root/nimbus-eth2/vendor/nim-bearssl/bearssl/csources/tools -DBR_USE_UNIX_TIME=1 -DBR_USE_URANDOM=1 -DBR_LE_UNALIGNED=1 -DBR_64=1  -DBR_amd64=1 -DBR_INT128=1 -fno-tree-vectorize -I/root/nimbus-eth2/vendor/nim-secp256k1/secp256k1_wrapper -I/root/nimbus-eth2/vendor/nim-secp256k1/secp256k1_wrapper/secp256k1 -I/root/nimbus-eth2/vendor/nim-secp256k1/secp256k1_wrapper/secp256k1/src -DHAVE_CONFIG_H -DHAVE_BUILTIN_EXPECT -std=gnu99 -I/root/nimbus-eth2/vendor/nim-nat-traversal/vendor/miniupnp/miniupnpc -I/root/nimbus-eth2/vendor/nim-nat-traversal/vendor/libnatpmp-upstream -DENABLE_STRNATPMPERR -I/root/nimbus-eth2/vendor/nim-bearssl/bearssl/certs -flto=auto -march=native -g3 -Og -O3 -fno-strict-aliasing -fno-ident -fno-lto  -I/root/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib -I/root/nimbus-eth2/beacon_chain -o /root/nimbus-eth2/nimcache/release/beacon_node/@m..@svendor@snim-json-rpc@sjson_rpc@sserver.nim.c.o /root/nimbus-eth2/nimcache/release/beacon_node/@m..@svendor@snim-json-rpc@sjson_rpc@sserver.nim.c' failed with exit code: 1

make: *** [Makefile:150: beacon_node] Error 1
The command '/bin/bash -c cd /root/nimbus-eth2  && make -j$(nproc) update  && make -j$(nproc) LOG_LEVEL="TRACE" NIMFLAGS="-d:insecure" beacon_node  && make -j$(nproc) LOG_LEVEL="TRACE" NIMFLAGS="-d:insecure" validator_client' returned a non-zero code: 2

Additional context

Let me know if I should be targeting another branch for the Toledo testnet.

tersec commented 4 years ago

It's because of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97714, which is triggered by Nim incorrectly generating #line 0 preprocessor directives. The two general workarounds right now are using a slightly older gcc version (in Debian, gcc 10.2.0-15 works, but gcc 10.2.0-16 triggers this ICE) and compiling Nimbus without line-level debugging information.
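To check whether a given build is exposed to this, one option is to search the generated C files for `#line 0` directives. The following is a minimal sketch, not part of the Nimbus build system: the function name `find_line_zero` is hypothetical, and the exact spelling of the directive in Nim's output is an assumption, so the pattern allows optional whitespace.

```shell
# List every '#line 0' directive under a directory of generated C files.
# Prints grep-style "file:line:text" matches; prints nothing if none are found.
# Assumption: the directive appears at the start of a line, possibly indented.
find_line_zero() {
    grep -rnE '^[[:space:]]*#[[:space:]]*line[[:space:]]+0([[:space:]]|$)' "$1" || true
}

# Example, using the nimcache path from the failing command in the log above:
# find_line_zero /root/nimbus-eth2/nimcache/release/beacon_node
```

The trailing `|| true` only keeps the function from returning a non-zero status when nothing matches, which is convenient in scripts run with `set -e`.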

tersec commented 4 years ago

I see https://github.com/protolambda/nimbus-docker/blob/master/Dockerfile bases itself on

FROM debian:bullseye-slim AS build

Which is where I first encountered this issue a few days ago: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973708

The most expedient approach for now is to avoid Debian bullseye, and any other Debian-based distro that ships https://packages.debian.org/bullseye/gcc-10 at 10.2.0-16. https://metadata.ftp-master.debian.org/changelogs//main/g/gcc-10/gcc-10_10.2.0-16_changelog states that this regression was introduced on Thu, 29 Oct 2020 16:36:48 +0100, so anything before then is fine in the Debian lineage. For example, Ubuntu 20.04 LTS or 20.10 should work fine.

Ideally, gcc upstream fixes this quickly enough so it's largely a non-issue. Otherwise, we'll have to adapt our build process.

tersec commented 3 years ago

https://github.com/nim-lang/Nim/issues/15942 fixes this, so the next Nim update will pick this up.

tersec commented 3 years ago

Fixed at some point in Debian sid, despite no visible progress upstream at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97714. I tested 10.2.0-19 from https://packages.debian.org/sid/gcc-10 and it seems to work. Debian testing (bullseye) might or might not work yet, but it should within a couple of weeks, as unstable (sid) migrates to testing (bullseye).

Will still be good to also have the fix from Nim's side, but this resolves the issue as initially opened.

tersec commented 3 years ago

https://metadata.ftp-master.debian.org/changelogs//main/g/gcc-10/gcc-10_10.2.0-19_changelog explains this:

gcc-10 (10.2.0-19) unstable; urgency=medium

  * Update to git 20201125 from the gcc-10 branch.
    - Fix PR target/97730 (AArch64), PR target/97887 (x86), PR d/97889,
      PR d/97843, PR d/97842, PR libstdc++/92546, PR libstdc++/97876,
      PR libstdc++/95989, PR libstdc++/97869, PR c++/97918, PR debug/97060,
      PR target/97534 (ARM), PR c++/96805, PR c++/96199.
  * Configure again with --enable-checking=release.
  * Enable again pgo and lto builds for 64bit architectures.

 -- Matthias Klose <doko@debian.org>  Wed, 25 Nov 2020 09:53:22 +0100

The important part is "Configure again with --enable-checking=release", as from the gcc issue, "it doesn't fail with --enable-checking=release".
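Since the behavior depends on how gcc itself was configured, it can help to check which checking mode an installed gcc was built with. This is a rough sketch: `checking_mode` is a hypothetical helper name, and it assumes the `Configured with:` line printed by `gcc -v` includes an `--enable-checking=...` option, which may not hold for every distribution's build.

```shell
# Extract the --enable-checking setting from `gcc -v` style text on stdin.
# Prints e.g. "release"; prints nothing if the option is absent.
checking_mode() {
    grep -oE -- '--enable-checking=[^ ]+' | head -n 1 | cut -d= -f2
}

# Example: gcc -v writes its "Configured with:" line to stderr,
# so redirect stderr into the pipe.
# gcc -v 2>&1 | checking_mode
```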

stefantalpalaru commented 3 years ago

https://gcc.gnu.org/install/configure.html :

--enable-checking=list ... The categories of checks available in list are ‘yes’ (most common checks ‘assert,misc,gc,gimple,rtlflag,runtime,tree,types’), ‘no’ (no checks at all), ‘all’ (all but ‘valgrind’), ‘release’ (cheapest checks ‘assert,runtime’) or ‘none’ (same as ‘no’). ‘release’ checks are always on and to disable them ‘--disable-checking’ or ‘--enable-checking=no[,]’ must be explicitly requested. Disabling assertions makes the compiler and runtime slightly faster but increases the risk of undetected internal errors causing wrong code to be generated.


That package maintainer was playing with fire.

tersec commented 3 years ago

Sure, there's some hidden gcc bug that --enable-checking=yes,extra,rtl is revealing. At least the maintainer could have minimized the difference by finding which of the check categories misc,gc,gimple,rtlflag,tree,types,extra,rtl trigger this, and disabling only those.

It'd also narrow things down for the gcc people and anyone else looking at that bug.
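The candidate list above can be derived mechanically: expand 'yes' as in the quoted GCC documentation, add 'extra' and 'rtl', and drop the categories that 'release' keeps enabled. A small sketch of that set difference follows; the helper name `extra_categories` and the variable names are made up for illustration.

```shell
# Categories enabled by --enable-checking=yes,extra,rtl, with 'yes' expanded
# as in the quoted GCC documentation.
full="assert,misc,gc,gimple,rtlflag,runtime,tree,types,extra,rtl"
# Categories that 'release' keeps enabled.
release="assert,runtime"

# Print the comma-separated categories in $1 that are not in $2,
# one per line, in their original order.
extra_categories() {
    exclude=$(printf '%s\n' "$2" | tr ',' '\n')
    printf '%s\n' "$1" | tr ',' '\n' | grep -vxF -e "$exclude"
}

extra_categories "$full" "$release"
```

Each printed category is one candidate to re-enable on its own when narrowing down which check trips the internal compiler error.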