ldc-developers / ldc

The LLVM-based D Compiler.
http://wiki.dlang.org/LDC

FTRBFS on ppc64el #1977

Closed ximion closed 7 years ago

ximion commented 7 years ago

The LDC 1.1.0~beta6 compiler fails to rebuild itself from source on the ppc64el architecture. The initial build of Beta6 with LDC 1.1.0~beta3 worked, but apparently Beta6 segfaults when trying to build any binary on ppc64el.

See https://buildd.debian.org/status/fetch.php?pkg=ldc&arch=ppc64el&ver=1%3A1.1.0%2Bb6-2&stamp=1482261468&raw=0 or https://buildd.debian.org/status/fetch.php?pkg=terminix&arch=ppc64el&ver=1.4.0-1&stamp=1482777536&raw=0

This is currently preventing all packages built with LDC from reaching the Debian release, which is bad, since Debian will be frozen at the beginning of February and it is by no means certain that we will get an exception for D stuff, even a short one.

(Unfortunately I can't help much with debugging here :-/)

kinke commented 7 years ago

Beta6 has been out for 1.5 months and we were just about to release 1.1 final (this month) ;)... Anyway, from the first log:

-- Found host D compiler /usr/bin/ldmd2, with default flags ''
-- Host D compiler version:

This indicates that not even /usr/bin/ldmd2 --version succeeded, no compilation involved. How was that /usr/bin/ldmd2 generated? Any chance you can run that binary? Or perhaps just some missing .so?
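The check described above can be reproduced outside of CMake. The following is a minimal sketch, not the logic LDC's build system actually uses: it runs the compiler with `--version` and reports whether anything came back. The function name `host_compiler_version` and the choice to treat a non-zero exit status as a failure are assumptions made for illustration.

```python
import subprocess


def host_compiler_version(compiler="/usr/bin/ldmd2", timeout=30):
    """Return the first line of `<compiler> --version`, or None if it cannot run.

    Hypothetical helper: a missing binary, a missing shared library, a crash,
    or a timeout all end up as None.
    """
    try:
        result = subprocess.run(
            [compiler, "--version"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary does not exist, is not executable, or hung.
        return None
    if result.returncode != 0:
        # Assumption: any non-zero status (including death by signal) is a failure.
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

An empty result here would match the empty "Host D compiler version:" line in the build log.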

kinke commented 7 years ago

Oh well, beta3 didn't have the PPC ABI fixes (https://github.com/ldc-developers/ldc/pull/1905); they were introduced in beta6. The release notes mention that branch ltsmaster or 0.17.2 must be used for bootstrapping on Power.

ximion commented 7 years ago

@markos looks like re-bootstrapping will do the job then...

I wonder why Beta3 was able to (re)build itself properly though... Anyway, that information on when the final 1.1 release will be made is very valuable, we might have a chance to get LDC 1.1 into Debian 9 at the last minute.

kinke commented 7 years ago

Related issues: #1904 and #1909.

The bad thing about ABI issues is that they may only show up when testing after another self-bootstrapping build step.

Fine host D compiler + buggy LDC source => working LDC (but generating incorrect ABI code) + 'arbitrary' source => successful compilation and linking, but code (incl. self-bootstrapped LDC) crashes at runtime.

I'm pretty sure you guys used 0.17.* or ltsmaster for bootstrapping beta3, resulting in a non-crashing executable producing incorrect ABI code. It was able to rebuild itself, but that one then crashes.
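To make the chain above easier to follow, here is a toy model of it. It is only an illustration under strong assumptions: every name in it (`Compiler`, `build`, the two boolean fields) is hypothetical, and it pretends that code with an incorrect ABI always crashes, whereas in reality such code may fail only in some situations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    name: str
    runs: bool            # does this binary itself execute without crashing?
    emits_good_abi: bool  # does the code it generates follow the platform ABI?


def build(host: Compiler, source_has_abi_fix: bool, name: str) -> Compiler:
    """Model one bootstrap step: `host` compiles LDC source into a new compiler."""
    if not host.runs:
        raise RuntimeError(f"{host.name} crashes; cannot build {name}")
    return Compiler(
        name=name,
        # The new binary is machine code produced by the host, so whether it
        # runs depends on the host's code generation (simplifying assumption).
        runs=host.emits_good_abi,
        # What the new binary generates depends on the source it was built from.
        emits_good_abi=source_has_abi_fix,
    )


# A fine host compiler builds source that lacks the ABI fixes.
lts = Compiler("0.17.x/ltsmaster", runs=True, emits_good_abi=True)
stage1 = build(lts, source_has_abi_fix=False, name="beta3 (stage 1)")

# Stage 1 works, but everything it compiles, including itself, is broken.
stage2 = build(stage1, source_has_abi_fix=False, name="beta3 (self-built)")
```

In this model `stage1` runs but emits bad code, and `stage2` is the first binary that visibly fails, which is why the problem only surfaces one bootstrapping step later.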

ximion commented 7 years ago

@kinke I am wondering whether it makes sense to collaborate with the GDC and maybe DMD developers to establish a system which continuously builds a set of complex applications (e.g. Vibe.d, Terminix/GtkD, AppStream-Generator, the compilers themselves, ...) with the current Git master version of the D compilers on multiple architectures.

I run into bugs incredibly often. Just now I've hit another GDC bug (still needs to be reported, can be observed at https://travis-ci.org/ximion/appstream-generator/jobs/193238117 ), and it's not great to catch these bugs only when the compilers hit the distributions (Debian having a Beta release of LDC at this time is of course an error on our side). Maybe it would make sense to ask the D Foundation for some funding (hardware!) and then maybe have a GSoC or CodeIn student set up a Jenkins instance for building (should not really be a hard task, doesn't warrant a SoC project). That way, we would also know for sure which commit triggered an issue.

As for this bug (which likely can be closed, sorry for the noise): I'll need to nag markos a bit so he can re-bootstrap LDC on ppc64el.

PetarKirov commented 7 years ago

http://dlang.org/blog/2017/01/20/testing-in-the-d-standard-library/ @MartinNowak has set up a project tester CI that builds a couple of popular DUB projects and we use this for dmd, druntime and phobos pull requests. I guess that shouldn't be too hard to extend to LDC and GDC, though the main obstacle for addressing the current issue is that it's x86 only.

ximion commented 7 years ago

@ZombineDev I found most issues when not building with dub, but with Ninja, which separates the compile and link steps and makes multiple calls to the compiler. But yeah, I think having this kind of CI for LDC/GDC too and extending it to be available on more architectures would be incredibly helpful.

I can grant people access to arch porterboxes at Debian to fix specific bugs, but unfortunately we can't continuously build other projects on them ^^

Btw, sorry for abusing this bugreport, but is there any known issue that makes x86 compilation segfault with Beta6? I am asking because this just came in: https://buildd.debian.org/status/fetch.php?pkg=terminix&arch=i386&ver=1.4.2-1&stamp=1485300812&file=log

JohanEngelen commented 7 years ago

> is there any known issue that makes x86 compilation segfault with Beta6

Stacktrace looks like infinite recursion? Can you spend some time to dustmite it?

ximion commented 7 years ago

> Stacktrace looks like infinite recursion? Can you spend some time to dustmite it?

Honest answer is: I don't know... I am very busy with a project at work at the moment and I barely have time until next week (Friday). So yeah, I might find time to dustmite this, but I can't promise to have it done soon. The software in question (Terminix) is open-source though.

kinke commented 7 years ago

Compiling the Terminix modules with -mtriple=i686-pc-linux-gnu -c using an LDC with enabled assertions and debug info should hopefully be enough to reproduce and ideally identify the issue. I might be able to try that this evening (CET) if no one has managed to do it by then.
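A rough sketch of how that reproduction attempt could be scripted follows. It compiles each module separately with the two flags mentioned above and lists the modules whose compiler process was killed by a signal (a segfault, or an abort from a failed assertion). The helper names, the default compiler name `ldc2` on the PATH, and the per-module invocation are assumptions for illustration; the real Terminix build may need additional import paths or flags.

```python
import subprocess


def compile_command(module_path, compiler="ldc2"):
    """Build the compile-only command line for one module (hypothetical helper)."""
    return [compiler, "-mtriple=i686-pc-linux-gnu", "-c", module_path]


def crashed(returncode):
    """On POSIX, subprocess reports death by signal N as return code -N."""
    return returncode < 0


def find_crashing_modules(module_paths, compiler="ldc2"):
    """Return the modules for which the compiler itself died from a signal."""
    crashing = []
    for path in module_paths:
        result = subprocess.run(compile_command(path, compiler), capture_output=True)
        if crashed(result.returncode):
            crashing.append(path)
    return crashing
```

Ordinary compile errors give a positive exit status and are deliberately ignored here, since only compiler crashes are of interest.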

kinke commented 7 years ago

I can compile Terminix 1.4.2 fine with beta 6 (LLVM 3.9) on 64-bit Ubuntu 16.04 using dub (the version shipped with beta 6). Linking fails as I don't have/want the X11 libs. I tried 64-bit ldc2 cross-compilation via dub --arch=x86, 32-bit ldc2 and 32-bit ldmd2. The Terminix readme mentions that the autotools support is experimental, and I'm not too keen on testing that as well (but it's apparently used by the Debian job).

ximion commented 7 years ago

@kinke The issue likely only happens when Autotools is used. We can't use dub in Debian (or Arch, or Fedora, ...) for building (see https://gist.github.com/ximion/fe6264481319dd94c8308b1ea4e8207a ).

Building with Automake is a matter of ./autogen.sh && make.

ximion commented 7 years ago

Looks like that crash on i386 was a fluke... A subsequent rebuild worked - weird.

dnadlinger commented 7 years ago

Could that crash have been due to out-of-memory?

ximion commented 7 years ago

Could be the cause, but it's incredibly unlikely - the machine ( https://db.debian.org/machines.cgi?host=x86-grnet-01 ) is well-equipped, and there were no issues reported. Of course, an OOM error could still have happened, but I would rule it out as a cause, since it is really very unlikely.

kinke commented 7 years ago

@ximion @markos: 1.1 has been tagged. Temporary link to the full source incl. submodules: here.

ximion commented 7 years ago

@kinke Nice, but your link doesn't work ^^ If we ship this in Debian for testing, and you change something, we would need a 1.1.1 release afterwards though - so, do you think it's safe to ship?

kinke commented 7 years ago

@ximion: The link worked when I posted it ;) - here's an updated one. 1.1.0 is set in stone with the tag; most packages have already been uploaded, and we're about to announce any day now. I just wanted to give you a heads-up in case the Debian freeze is before we make it to the announcement...

ximion commented 7 years ago

@kinke The last chance for an official migration already passed on 25 Jan, so we will need to ask for an exception to still add LDC - the chances to get one are quite good though. Unfortunately all depending D stuff needs to be rebuilt and we still need to fix the ppc64el issue, which is quite a pain.

kinke commented 7 years ago

Oh, I hope you guys manage, thanks for your efforts anyway!

> we still need to fix the ppc64el issue

The bootstrapping issue, right?

ximion commented 7 years ago

> The bootstrapping issue, right?

Yes, I hope it works well after re-bootstrapping.

There will be a bug tracking the freeze-exception once we have LDC 1.1 in unstable.

kalev commented 7 years ago

Would it be possible to have a new 0.17.x source release for bootstrapping as well, please? Looking at the ltsmaster git commit log, the ABI fixes might be good to have in a release.

ximion commented 7 years ago

@kinke There must be something weird... I still can't access the LDC tarball... Maybe I'm just not fast enough and Github changes the URL quickly? :P

kalev commented 7 years ago

OK, new LDC successfully bootstrapped in Fedora rawhide for armv7hl, x86_64, ppc64le, ppc64, i686 architectures. Thanks for the release! (And sorry for hijacking an unrelated ticket.)

kinke commented 7 years ago

@kalev: There'll very soon be a 0.17.3 release. Thanks for the packages and feedback!

ximion commented 7 years ago

Debian has the new LDC release too now, and we are trying to get it into the next release. I hope @markos can look into re-bootstrapping ppc64el soon (I don't know how to properly inject packages on our porterboxes, they are very limited in what commands you can execute on them).

@kalev: I am even more guilty of hijacking bugtracker issues for communication purposes :P LDC fortunately has a forum/mailing list where this stuff can be directed too: http://forum.dlang.org/group/ldc

Next, I will compile vibe.d again using the Debian packages and see if all the issues are resolved :)

JohanEngelen commented 7 years ago

@kalev : 0.17.3 source release is out.

kalev commented 7 years ago

Excellent, thanks!

kalev commented 7 years ago

@JohanEngelen, the 0.17.3 tarball seems to be missing submodules, any chance you uploaded a wrong file?

JohanEngelen commented 7 years ago

:( @kalev Should be fixed now.

kalev commented 7 years ago

@JohanEngelen Excellent, thanks, that works now.