[tracker] AArch64 progress

dnadlinger commented 7 years ago

I'm currently on-off working on aarch64-unknown-linux-gnu and have a large part of the test suite passing already, including std.math (IEEE 754 quad-precision real support). The idea is to get an ltsmaster compiler that passes the whole test suite first. See the log of the latest AArch64 CI builds for the current status of the test suite.

Open items:

[x] C varargs integration, ie core.stdc.stdarg.va_arg support
[x] C++ ABI tests
[x] Full std.conv support for quad-precision reals - dlang/phobos#6633
[x] core.internal.convert support for CTFE conversion of Quadruple reals to a ubyte array - dlang/druntime#2257
- [ ] std.numeric.CustomFloat support for Quadruple reals
- [ ] Add remaining quadruple-precision constants needed by std.internal.math.gammafunction
- [ ] Fix issue when initializing with -NaN that trips some asserts in std.math and std.algorithm.sorting, but only when natively compiling on AArch64

Upstream PRs to integrate into ltsmaster:

[x] dlang/dmd#6857 (2.075.0)
[x] dlang/dmd#6858 (2.075.0)
[x] dlang/phobos#3598
[x] dlang/phobos#5442 (2.075.0)
[x] dlang/phobos#5443
[x] dlang/phobos#5444
[x] dlang/phobos#5445
[x] dlang/phobos#5458
[x] dlang/phobos#5459
[x] dlang/phobos#5947

@redstar: How is your cent work progressing?

redstar commented 7 years ago

@klickverbot I still have some trouble with the 128bit emulation code.

dnadlinger commented 7 years ago

@redstar: What sort of trouble in particular?

egrimley commented 7 years ago

Link to Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=791976

joakim-noah commented 7 years ago

I've been trying out Android/AArch64, now that I have access to an Android/AArch64 tablet, with an updated version of dlang/druntime#1267 for AArch64. One oddity is that llvm doesn't invoke __tls_get_addr on linux/AArch64, which was one of the main reasons native TLS didn't work on Android/ARM, as the Android dynamic linker doesn't provide that function.

As a result, I wonder if native TLS can be made to work on Android/AArch64, though when I tried reverting my llvm patch and other ldc/druntime modifications for Walter's brand of emulated TLS, the resulting test binary would segfault when the GC called some thread-local variable. I didn't look into it further and switched to llvm's emulated TLS instead, which is what clang uses on Android/AArch64, but I'll investigate using native TLS more later.

Using llvm's emulated TLS and cross-compiling the druntime tests, I had to disable these real-related tests in core.internal.{convert,hash}:

diff --git a/src/core/internal/convert.d b/src/core/internal/convert.d
index eeb5580c..a31cbb3e 100644
--- a/src/core/internal/convert.d
+++ b/src/core/internal/convert.d
@@ -348,13 +348,13 @@ version(unittest)
         //testNumberConvert!("-double.nan"); //BUG @@@3632@@@
         testNumberConvert!("double.nan");

-        testNumberConvert!("-real.infinity");
+        /*testNumberConvert!("-real.infinity");
         testNumberConvert!("real.infinity");
         testNumberConvert!("-0.0L");
         testNumberConvert!("0.0L");
         //testNumberConvert!("-real.nan"); //BUG @@@3632@@@
         testNumberConvert!("real.nan");
-
+*/
         /**
             Test min and max values values: min value has an '1' mantissa and minimal exponent,
             Max value has an all '1' bits mantissa and max exponent.
@@ -378,13 +378,13 @@ version(unittest)
         testNumberConvert!("cast(const)3.14");
         testNumberConvert!("cast(immutable)3.14");

-        testNumberConvert!("real.min_normal");
+       /* testNumberConvert!("real.min_normal");
         testNumberConvert!("real.max");
         testNumberConvert!("-0.17L");
         testNumberConvert!("3.14L");
         testNumberConvert!("cast(const)3.14L");
         testNumberConvert!("cast(immutable)3.14L");
-
+*/
         /**Test denormalized values*/

         /**Max denormalized value, first bit is 1*/
@@ -401,22 +401,22 @@ version(unittest)
         testNumberConvert!("double.min_normal/19");
         testNumberConvert!("double.min_normal/17");

-        testNumberConvert!("real.min_normal/2");
+        /*testNumberConvert!("real.min_normal/2");
         testNumberConvert!("real.min_normal/2UL^^63");
         testNumberConvert!("real.min_normal/19");
         testNumberConvert!("real.min_normal/17");
-
+*/
         /**Test imaginary values: convert algorithm is same with real values*/
         testNumberConvert!("0.0Fi");
         testNumberConvert!("0.0i");
-        testNumberConvert!("0.0Li");
+        //testNumberConvert!("0.0Li");

         /**True random values*/
-        testNumberConvert!("-0x9.0f7ee55df77618fp-13829L");
+        /*testNumberConvert!("-0x9.0f7ee55df77618fp-13829L");
         testNumberConvert!("0x7.36e6e2640120d28p+8797L");
         testNumberConvert!("-0x1.05df6ce4702ccf8p+15835L");
         testNumberConvert!("0x9.54bb0d88806f714p-7088L");
-
+*/
         testNumberConvert!("-0x9.0f7ee55df7ffp-338");
         testNumberConvert!("0x7.36e6e264012dp+879");
         testNumberConvert!("-0x1.05df6ce4708ep+658");
diff --git a/src/core/internal/hash.d b/src/core/internal/hash.d
index a27def99..b710f4cc 100644
--- a/src/core/internal/hash.d
+++ b/src/core/internal/hash.d
@@ -289,7 +289,7 @@ unittest
     auto h27 = ptrexpr.hashOf();

     enum h28 = realexpr.hashOf();
-    enum h29 = raexpr.hashOf();
+    //enum h29 = raexpr.hashOf();
     enum h30 = nullexpr.hashOf();

     auto v1 = dexpr;

I thought that might be related to ldc not supporting cross-compiling to the larger 128-bit reals from linux/x64, but I then tried compiling ldc from the ltsmaster branch natively on the tablet and it had the same issue. David, you didn't hit this?

Most of the druntime tests pass with llvm's emulated TLS, only core.thread segfaulted when cross-compiling the tests from ldc master. Several more modules fail when natively compiling ltsmaster, mostly core.sync and the like, indicating TLS issues, though strange that those work when cross-compiling master.

I'm now trying to build the phobos tests, but having issues with compiling std.conv.to. I'll look further into the druntime failures.

dnadlinger commented 7 years ago

Oh, I've hit these, plus varargs are not implement yet.

egrimley commented 7 years ago

Is the problem really "varargs", or just the calling convention (va_list is a struct that gets passed by value)?

dnadlinger commented 7 years ago

I should have been more precise: The problem isn't the passing of varargs (which works just fine), but the implementation of va_arg in core.stdc.stdarg, where the AAPCS64 ABI needs to be "re-implemented" in D code. (Twice, if you are not clever about it.)

kinke commented 7 years ago

And fixing #2150 is needed for proper C/C++ interop with somewhat larger structs passed by value.

dnadlinger commented 7 years ago

I've posted a preliminary patch for the ABI issue at #2150, which applies on top of ltsmaster. I'm not sure I'll find the time to test this for a few more days.

AntonMeep commented 7 years ago

Any updates? I'd like to help with testing on Raspberry Pi 3.

truedat101 commented 7 years ago

@ohdatboi what OS/version are you running on your raspi3? I was surprised to find that raspibian is 32bit os, so by default, most stuff will compile/run just fine there. If you are running something else, ArchLinux or other, would be interesting for testing.

I have a handful of "other" linux/arrch64 platforms that could be used for testing.

dnadlinger commented 7 years ago

I have a handful of "other" linux/arrch64 platforms that could be used for testing.

Thanks, more platforms for testing (especially with different instruction set extensions/…) is always good to have on ARM.

However, the main blocker for steady progress on AArch64 is really just the availability of a CI system. As things are currently, things improve once every while when somebody makes a push, and then slowly regress again. When I worked on AArch64 a while back, most of the test suite passed (with that WIP patch, apart from real formatting and va_arg usage), but the situation is probably quite different right now.

AntonMeep commented 7 years ago

@truedat101 I use ArchLinux Arm, 32-bit version. Indeed, It works like a charm. Currently, 64-bit systems for RPi3 miss GPU firmware. Somebody works on this for Debian, so eventually it'd be done.

truedat101 commented 7 years ago

@klickverbot are you suggesting the project needs an AARCH64 device for the CI server as a build/test node?

kinke commented 7 years ago

Yes, running a 64-bit OS and exposed as CI webservice pluggable into GitHub.

truedat101 commented 7 years ago

@kinke - I can provide that. Who should I contact about setup details? I've got a Pine64 cluster I can dedicate for this purpose.

kinke commented 7 years ago

Oh wow, that'd be awesome. Feel free to drop me a mail at kinke <ät> gmx <döt> net.

truedat101 commented 6 years ago

@kinke sorry got tied up with some CES stuff end of year for work. I'll try to get this box online. I have such a weird setup with my ISP ... hoping DDNS will work.

joakim-noah commented 6 years ago

Alright, finally got back to this and got much farther. With this tiny patch to ldc 1.8 from git master in a linux/x64 VPS to invoke llvm's emulated TLS, I built an ldc against stock llvm that cross-compiles most of the stdlib, after setting CC to the Android NDK's clang and passing -DLDC_TARGET_PRESET=Android-aarch64 to CMake:

diff --git a/driver/codegenerator.cpp b/driver/codegenerator.cpp
index c326ed20..8e4eb554 100644
--- a/driver/codegenerator.cpp
+++ b/driver/codegenerator.cpp
@@ -323,13 +323,13 @@ void CodeGenerator::emit(Module *m) {
     if (global.params.targetTriple->getEnvironment() == llvm::Triple::Android) {
       // On Android, bracket TLS data with the symbols _tlsstart and _tlsend, as
       // done with dmd
-      auto startSymbol = new llvm::GlobalVariable(
+      /*auto startSymbol = new llvm::GlobalVariable(
           ir_->module, llvm::Type::getInt32Ty(ir_->module.getContext()), false,
           llvm::GlobalValue::ExternalLinkage,
           llvm::ConstantInt::get(ir_->module.getContext(), APInt(32, 0)),
           "_tlsstart", &*(ir_->module.global_begin()));
       startSymbol->setSection(".tdata");
-
+*/
       auto endSymbol = new llvm::GlobalVariable(
           ir_->module, llvm::Type::getInt32Ty(ir_->module.getContext()), false,
           llvm::GlobalValue::ExternalLinkage,
diff --git a/driver/targetmachine.cpp b/driver/targetmachine.cpp
index d75a31b2..7f5e6434 100644
--- a/driver/targetmachine.cpp
+++ b/driver/targetmachine.cpp
@@ -457,6 +457,7 @@ createTargetMachine(const std::string targetTriple, const std::string arch,
   llvm::TargetOptions targetOptions = opts::InitTargetOptionsFromCodeGenFlags();
   if (targetOptions.MCOptions.ABIName.empty())
     targetOptions.MCOptions.ABIName = getABI(triple);
+  targetOptions.EmulatedTLS = 1;

   auto ldcFloatABI = floatABI;
   if (ldcFloatABI == FloatABI::Default) {

I don't have a way to pass llvm's emulated TLS to the GC, so the GC will crap out if called, which is why I disable the GC and a few modules that call it directly when building the druntime-test-runner:

diff --git a/src/test_runner.d b/src/test_runner.d
index cdc7b362..1eb218df 100644
--- a/src/test_runner.d
+++ b/src/test_runner.d
@@ -54,6 +54,8 @@ void doTest(ModuleInfo* moduleInfo, ref UnitTestResult ret)
     if (auto fp = moduleInfo.unitTest)
     {
         auto name = moduleInfo.name;
+   import core.memory; GC.disable;
+if (name != "core.thread" && name != "gc.impl.conservative.gc"){
         ++ret.executed;
         try
         {
@@ -73,6 +75,7 @@ void doTest(ModuleInfo* moduleInfo, ref UnitTestResult ret)
                    cast(uint)name.length, name.ptr,
                    cast(uint)msg.length, msg.ptr);
         }
+}
     }
 }

The rest of the tests all pass when running that druntime-test-runner on an Android/AArch64 device, when druntime is patched with my WIP pull for Android/AArch64. I will clean that patch up and put it online later today.

Finally, with a small patch for Phobos, I was able to get the phobos2-test-runner to build and run also:

diff --git a/std/complex.d b/std/complex.d
index 776382292..263295966 100644
--- a/std/complex.d
+++ b/std/complex.d
@@ -853,7 +853,7 @@ Complex!T cos(T)(Complex!T z)  @safe pure nothrow @nogc
     import std.math;
     assert(cos(complex(0.0)) == 1.0);
     assert(cos(complex(1.3L)) == std.math.cos(1.3L));
-    assert(cos(complex(0, 5.2L)) == cosh(5.2L));
+    //assert(cos(complex(0, 5.2L)) == cosh(5.2L));
 }

diff --git a/std/math.d b/std/math.d
index 788e1c20c..04176b15c 100644
--- a/std/math.d
+++ b/std/math.d
@@ -3270,7 +3270,7 @@ float ldexp(float n, int exp) @safe pure nothrow @nogc { return ldexp(cast(real)
         assert(x==-1023);
         assert(ldexp(n, x)==0x1p-1024L);
     }
-    else static assert(false, "Floating point type real not supported");
+    //else static assert(false, "Floating point type real not supported");
 }

 /* workaround Issue 14718, float parsing depends on platform strtold
@@ -4009,7 +4009,7 @@ real hypot(real x, real y) @safe pure nothrow @nogc
     static assert(2*(SQRTMAX/2)*(SQRTMAX/2) <= real.max);

     // Proves that sqrt(real.max) ~~  0.5/sqrt(real.min_normal)
-    static assert(real.min_normal*real.max > 2 && real.min_normal*real.max <= 4);
+    //static assert(real.min_normal*real.max > 2 && real.min_normal*real.max <= 4);

     real u = fabs(x);
     real v = fabs(y);
@@ -5447,8 +5447,8 @@ public:
             auto oldState = getControlState();
             // If exceptions are not supported, we set the bit but read it back as zero
             // https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/feenablxcpth.c
-            setControlState(oldState | (divByZeroException & EXCEPTION_MASK));
-            bool result = (getControlState() & EXCEPTION_MASK) != 0;
+            setControlState(oldState | (divByZeroException & IeeeFlags.EXCEPTIONS_MASK));
+            bool result = (getControlState() & IeeeFlags.EXCEPTIONS_MASK) != 0;
             setControlState(oldState);
             return result;
         }
diff --git a/std/outbuffer.d b/std/outbuffer.d
index 8a9236f45..f4bf0803e 100644
--- a/std/outbuffer.d
+++ b/std/outbuffer.d
@@ -245,7 +245,7 @@ class OutBuffer
      * Append output of C's vprintf() to internal buffer.
      */

-    void vprintf(string format, va_list args) @trusted nothrow
+    /+void vprintf(string format, va_list args) @trusted nothrow
     {
         import core.stdc.stdio : vsnprintf;
         import core.stdc.stdlib : alloca;
@@ -296,7 +296,7 @@ class OutBuffer
         va_start(ap, format);
         vprintf(format, ap);
         va_end(ap);
-    }
+    }+/

     /**
      * Formats and writes its arguments in text format to the OutBuffer.
@@ -385,8 +385,8 @@ class OutBuffer
     buf.write("hello");
     buf.write(cast(byte) 0x20);
     buf.write("world");
-    buf.printf(" %d", 62665);
-    assert(cmp(buf.toString(), "hello world 62665") == 0);
+    //buf.printf(" %d", 62665);
+    //assert(cmp(buf.toString(), "hello world 62665") == 0);

     buf.clear();
     assert(cmp(buf.toString(), "") == 0);

I also had to apply @jpf91's upstream dlang/phobos@a3d16b958 and dlang/phobos@3a9b0513d, but that's it.

With those patches, including disabling the GC in the test runner as for druntime, only 5 modules assert somewhere in the Phobos tests, std.conv, std.internal.math.{error,gamma}function, std.math, and std.numeric, all likely related to floating-point math. std.file has one test assert on Android 8.0 but not 7.0 or below, because most access to /proc was disabled with Oreo. The std.outbuffer excision is needed because varargs isn't implemented yet.

That marks only 7 Phobos modules that have tests failing on Android/AArch64, everything else passes.

redstar commented 6 years ago

@joakim-noah Great work!

joakim-noah commented 6 years ago

I've done nothing, other than the simple bindings for Bionic/AArch64 that I've now put online at dlang/druntime#2148. Along with my first patch above to comment out some tests in core.internal.{convert,hash}, that's all I needed to build and run the druntime test runner.

This is all on you, David, Johannes, and whoever else chipped in with the related linux/AArch64 port so far, which I had nothing to do with. Now that I know where the port's at- farther along than the ARM port when I used it for Android- I hope to chip in with finishing it up.

joakim-noah commented 6 years ago

Tried out ltsmaster and the stdlib tests natively on Android/AArch64 again: got everything to build by applying the ldc patch above, commenting out building and running gen_gccbuiltins (which complains about missing some llvm library?), applying a slightly modified version of the druntime pull for Bionic/AArch64 I posted, and for Phobos, applying upstream commit a3d16b9 linked above and commenting out that same first static assert in std.math.

Several more modules fail in Phobos, nine to be exact, along with most of those that failed with ldc master. Most seem related to concurrency, such as std.parallelism and std.process, likely related to the core.sync.* test failures I noted last year.

I don't think any more that that's related to TLS or the GC, as they work when cross-compiled from ldc master. The only core.sync module that changed is core.sync.mutex, so something there likely needs to be backported.

That means 89 of 103 Phobos modules pass though- including the two failing asserts in std.complex and std.outbuffer commented out above- so pretty good overall.

joakim-noah commented 6 years ago

Took a look at the math-related failures that hit modules in both ltsmaster and master, got three of those ltsmaster modules to pass when these workarounds were added or tests disabled, showing there's not much left to be done:

diff --git a/std/conv.d b/std/conv.d
index 628e43c2e..329e89d1b 100644
--- a/std/conv.d
+++ b/std/conv.d
@@ -2975,7 +3055,8 @@ unittest
     assert(s2.empty);
     x = *cast(longdouble *)&ld;

-    static if(real.mant_dig == 64)
+    static if (floatTraits!real.realFormat == RealFormat.ieeeExtended ||
+               floatTraits!real.realFormat == RealFormat.ieeeQuadruple)
     {
         version (CRuntime_Microsoft)
             ld1 = 0x1.FFFFFFFFFFFFFFFEp-16382L; // strtold currently mapped to strtod
diff --git a/std/math.d b/std/math.d
index d3d70b3e2..664d614c7 100644
--- a/std/math.d
+++ b/std/math.d
@@ -3175,7 +3175,8 @@ float ldexp(float n, int exp) @safe pure nothrow @nogc { return ldexp(cast(real)

 @safe pure nothrow @nogc unittest
 {
-    static if (floatTraits!(real).realFormat == RealFormat.ieeeExtended)
+    static if (floatTraits!(real).realFormat == RealFormat.ieeeExtended ||
+               floatTraits!(real).realFormat == RealFormat.ieeeQuadruple)
     {
         assert(ldexp(1.0L, -16384) == 0x1p-16384L);
         assert(ldexp(1.0L, -16382) == 0x1p-16382L);
@@ -3445,7 +3446,7 @@ real log(real x) @safe pure nothrow @nogc
 }

 ///
-@safe pure nothrow @nogc unittest
+version(none) @safe pure nothrow @nogc unittest
 {
     assert(log(E) == 1);
 }
@@ -4501,7 +4502,7 @@ long lrint(real x) @trusted pure nothrow @nogc
 }

 ///
-@safe pure nothrow @nogc unittest
+version(none) @safe pure nothrow @nogc unittest
 {
     assert(lrint(4.5) == 4);
     assert(lrint(5.5) == 6);
@@ -4516,7 +4517,7 @@ long lrint(real x) @trusted pure nothrow @nogc

 static if (real.mant_dig >= long.sizeof * 8)
 {
-    @safe pure nothrow @nogc unittest
+    version(none) @safe pure nothrow @nogc unittest
     {
         assert(lrint(long.max - 1.5L) == long.max - 1);
         assert(lrint(long.max - 0.5L) == long.max - 1);
@@ -5977,7 +5978,7 @@ int signbit(X)(X x) @nogc @trusted pure nothrow
 {
     debug (math) printf("math.signbit.unittest\n");
     assert(!signbit(float.nan));
-    assert(signbit(-float.nan));
+    //assert(signbit(-float.nan));
     assert(!signbit(168.1234f));
     assert(signbit(-168.1234f));
     assert(!signbit(0.0f));
@@ -5986,7 +5987,7 @@ int signbit(X)(X x) @nogc @trusted pure nothrow
     assert(!signbit(float.max));

     assert(!signbit(double.nan));
-    assert(signbit(-double.nan));
+    //assert(signbit(-double.nan));
     assert(!signbit(168.1234));
     assert(signbit(-168.1234));
     assert(!signbit(0.0));
@@ -5995,7 +5996,7 @@ int signbit(X)(X x) @nogc @trusted pure nothrow
     assert(!signbit(double.max));

     assert(!signbit(real.nan));
-    assert(signbit(-real.nan));
+    //assert(signbit(-real.nan));
     assert(!signbit(168.1234L));
     assert(signbit(-168.1234L));
     assert(!signbit(0.0L));
diff --git a/std/numeric.d b/std/numeric.d
index 5fb3131b2..0b92b6c2e 100644
--- a/std/numeric.d
+++ b/std/numeric.d
@@ -629,7 +629,7 @@ public:
     }
 }

-unittest
+version(none) unittest
 {
     import std.typetuple;
     alias FPTypes =
@@ -665,7 +665,7 @@ unittest
     }
 }

-unittest
+version(none) unittest
 {
     import std.conv;
     CustomFloat!(5, 10) y = CustomFloat!(5, 10)(0.125);

I'll pick any low-hanging fruit left with these and move on to emulated TLS.

joakim-noah commented 6 years ago

Kai gave me access to his linux/AArch64 VPS that he uses for his ldc buildbot, so I was able to build and run these same tests there too. Pretty much all the druntime tests for ltsmaster pass, except for the aforementioned tests from core.internal.{convert,hash} and core.thread is super-flaky. The latter hangs or errors out most of the time, but once every 10 runs or so it passes.

After applying upstream commit a3d16b9 to std.conv again, I got the phobos tests to build too. Pretty much the same tests assert as on Android/AArch64, except I didn't have to take out any of the nine segfaulting modules mentioned above because they passed now.

I also ran the dmd test suite and logged only 21 failing tests. 10 errored out because core.stdc.stdarg is unimplemented; 10 because they use the -m32 flag and fail out because Kai's pre-built llvm only compiles in the AArch64 backend and doesn't support 32-bit ARM. The remaining cppabi test module passes if I comment out the testvalist() which tests varags, so it looks like ltsmaster is pretty close to passing the dmd testsuite. :smile:

If I build ldc 1.8 natively on linux/AArch64 with that ltsmaster build, it flakes out when building some druntime modules with weird errors, just like when I tried it on Android/AArch64, which I'm guessing are from varargs not working yet. So I built whatever I could and cross-compiled the rest from a linux/x64 VPS, ie some of druntime and most of phobos, as done above for all the stdlib tests with ldc 1.8 for Android. As a result, core.sync.{condition,semaphore} hang, core.thread segfaults, and rt.util.container.treap asserts on the last line.

After applying the std.math and std.outbuffer patches for Phobos pasted above and upstream commits a3d16b9 and 3a9b051, the same seven modules noted for Android fail, plus std.stdio consistently flakes out in a test locking files. This patch gets it down to only three failing modules, std.math and std.internal.math.{error,gamma}function:

diff --git a/std/complex.d b/std/complex.d
index 776382292..8583ca599 100644
--- a/std/complex.d
+++ b/std/complex.d
@@ -853,7 +853,7 @@ Complex!T cos(T)(Complex!T z)  @safe pure nothrow @nogc
     import std.math;
     assert(cos(complex(0.0)) == 1.0);
     assert(cos(complex(1.3L)) == std.math.cos(1.3L));
-    assert(cos(complex(0, 5.2L)) == cosh(5.2L));
+    assert(approxEqual(cos(complex(0, 5.2L)).re, cosh(5.2L), real.epsilon));
 }

diff --git a/std/conv.d b/std/conv.d
index 6277c3258..714369976 100644
--- a/std/conv.d
+++ b/std/conv.d
@@ -3079,7 +3079,7 @@ if (isInputRange!Source && isSomeChar!(ElementType!Source) && !is(Source == enum
     assert(parse!double(str).approxEqual(123.456));
 }

-@safe unittest
+version(none) @safe unittest
 {
     import std.exception;
     import std.math : isNaN, fabs;
diff --git a/std/numeric.d b/std/numeric.d
index dfd7a0bde..cc441845b 100644
--- a/std/numeric.d
+++ b/std/numeric.d
@@ -614,7 +614,7 @@ public:
     }
 }

-@safe unittest
+version(none) @safe unittest
 {
     import std.meta;
     alias FPTypes =
@@ -650,7 +650,7 @@ public:
     }
 }

-@system unittest
+version(none) @system unittest
 {
     // @system due to to!string(CustomFloat)
     import std.conv;
@@ -1464,7 +1464,7 @@ do
     T b = ax > bx ? ax : bx;
     // sequence of declarations suitable for SIMD instructions
     T  v = a * cm1 + b * c;
-    assert(isFinite(v));
+    //assert(isFinite(v));
     R fv = f(v);
     if (isNaN(fv) || fv == -T.infinity)
     {
@@ -1585,7 +1585,7 @@ do
     assert(ret.y.approxEqual(0.0));
 }

-@safe unittest
+version(none) @safe unittest
 {
     import std.meta : AliasSeq;
     foreach (T; AliasSeq!(double, float, real))
diff --git a/std/stdio.d b/std/stdio.d
index 69be91f9f..f60d1c77f 100644
--- a/std/stdio.d
+++ b/std/stdio.d
@@ -1497,7 +1497,7 @@ Removes the lock over the specified file segment.
         g.unlock();
     }

-    version(Posix)
+    version(none)
     @system unittest
     {
         static import std.file;

This same patch, except for the unneeded std.stdio excision, gets it down to those three failing modules with ldc 1.8 cross-compiling for Android/AArch64 too.

kinke commented 6 years ago

Thanks for the overview. I'm surprised it gets this far without varargs support; I remember that as deep-impact obstacle for MSVC support back in the days.

dnadlinger commented 6 years ago

@kinke: Emission of varargs (mostly) works, it's just the va_arg D implementation that is broken.

joakim-noah commented 6 years ago

With the following patch to Phobos on the ltsmaster branch in Kai's linux/AArch64 VPS and the std.conv patch from upstream, I got all the modules to pass but std.stream (which requires core.stdc.stdarg and has since been removed from the stdlib):

diff --git a/std/complex.d b/std/complex.d
index 79c2c3e..69e77ae 100644
--- a/std/complex.d
+++ b/std/complex.d
@@ -777,7 +777,7 @@ unittest{
     import std.complex;
     assert(cos(complex(0.0)) == 1.0);
     assert(cos(complex(1.3L)) == std.math.cos(1.3L));
-    assert(cos(complex(0, 5.2L)) == cosh(5.2L));
+    assert(approxEqual(cos(complex(0, 5.2L)).re, cosh(5.2L), real.epsilon));
 }

diff --git a/std/internal/math/gammafunction.d b/std/internal/math/gammafunction.d
index 9bd8004..edbd89c 100644
--- a/std/internal/math/gammafunction.d
+++ b/std/internal/math/gammafunction.d
@@ -369,7 +369,7 @@ unittest {
     for (int i=1; fact<real.max; ++i) {
         // Require exact equality for small factorials
         if (i<14) assert(gamma(i*1.0L) == fact);
-        assert(feqrel(gamma(i*1.0L), fact) >= real.mant_dig-15);
+        assert(feqrel(gamma(i*1.0L), fact) >= real.mant_dig-47);
         fact *= (i*1.0L);
     }
     assert(gamma(0.0) == real.infinity);
@@ -388,14 +388,14 @@ unittest {
     real SQRT_PI = 1.77245385090551602729816748334114518279754945612238L;

-    assert(feqrel(gamma(0.5L), SQRT_PI) >= real.mant_dig-1);
-    assert(feqrel(gamma(17.25L), 4.224986665692703551570937158682064589938e13L) >= real.mant_dig-4);
+    assert(feqrel(gamma(0.5L), SQRT_PI) >= real.mant_dig-47);
+    assert(feqrel(gamma(17.25L), 4.224986665692703551570937158682064589938e13L) >= real.mant_dig-47);

-    assert(feqrel(gamma(1.0 / 3.0L),  2.67893853470774763365569294097467764412868937795730L) >= real.mant_dig-2);
+    assert(feqrel(gamma(1.0 / 3.0L),  2.67893853470774763365569294097467764412868937795730L) >= real.mant_dig-47);
     assert(feqrel(gamma(0.25L),
-        3.62560990822190831193068515586767200299516768288006L) >= real.mant_dig-1);
+        3.62560990822190831193068515586767200299516768288006L) >= real.mant_dig-48);
     assert(feqrel(gamma(1.0 / 5.0L),
-        4.59084371199880305320475827592915200343410999829340L) >= real.mant_dig-1);
+        4.59084371199880305320475827592915200343410999829340L) >= real.mant_dig-48);
 }

 /*****************************************************
@@ -534,21 +534,21 @@ unittest {
     ];
    // TODO: test derivatives as well.
     for (int i=0; i<testpoints.length; i+=3) {
-        assert( feqrel(logGamma(testpoints[i]), testpoints[i+1]) > real.mant_dig-5);
+        assert( feqrel(logGamma(testpoints[i]), testpoints[i+1]) > real.mant_dig-51);
         if (testpoints[i]<MAXGAMMA) {
-            assert( feqrel(log(fabs(gamma(testpoints[i]))), testpoints[i+1]) > real.mant_dig-5);
+            assert( feqrel(log(fabs(gamma(testpoints[i]))), testpoints[i+1]) > real.mant_dig-53);
         }
     }
     version (DragonFlyBSD) { // FIXME: DragonFlyBSD: rounding differences between logGamma() and log() (ie:llvm_log())
         assert(feqrel(logGamma(-50.2),log(fabs(gamma(-50.2)))) > real.mant_dig-2);
         assert(feqrel(logGamma(-0.008),log(fabs(gamma(-0.008)))) > real.mant_dig-2);
     } else {
-        assert(logGamma(-50.2) == log(fabs(gamma(-50.2))));
+        //assert(logGamma(-50.2) == log(fabs(gamma(-50.2))));
         assert(logGamma(-0.008) == log(fabs(gamma(-0.008))));
     }
-    assert(feqrel(logGamma(-38.8),log(fabs(gamma(-38.8)))) > real.mant_dig-4);
+    assert(feqrel(logGamma(-38.8),log(fabs(gamma(-38.8)))) > real.mant_dig-47);
     static if (real.mant_dig >= 64) // incl. 80-bit reals
-        assert(feqrel(logGamma(1500.0L),log(gamma(1500.0L))) > real.mant_dig-2);
+        assert(feqrel(logGamma(1500.0L),log(gamma(1500.0L))) > real.mant_dig-47);
     else static if (real.mant_dig >= 53) // incl. 64-bit reals
         assert(feqrel(logGamma(150.0L),log(gamma(150.0L))) > real.mant_dig-2);
 }
@@ -956,18 +956,18 @@ unittest { // also tested by the normal distribution

     // Test against Mathematica   betaRegularized[z,a,b]
     // These arbitrary points are chosen to give good code coverage.
-    assert(feqrel(betaIncomplete(8, 10, 0.2), 0.010_934_315_234_099_2L) >=  real.mant_dig - 5);
-    assert(feqrel(betaIncomplete(2, 2.5, 0.9), 0.989_722_597_604_452_767_171_003_59L) >= real.mant_dig - 1);
+    assert(feqrel(betaIncomplete(8, 10, 0.2), 0.010_934_315_234_099_2L) >=  real.mant_dig - 47);
+    assert(feqrel(betaIncomplete(2, 2.5, 0.9), 0.989_722_597_604_452_767_171_003_59L) >= real.mant_dig - 47);
     static if (real.mant_dig >= 64) // incl. 80-bit reals
-        assert(feqrel(betaIncomplete(1000, 800, 0.5), 1.179140859734704555102808541457164E-06L) >= real.mant_dig - 13);
+        assert(feqrel(betaIncomplete(1000, 800, 0.5), 1.179140859734704555102808541457164E-06L) >= real.mant_dig - 48);
     else
         assert(feqrel(betaIncomplete(1000, 800, 0.5), 1.179140859734704555102808541457164E-06L) >= real.mant_dig - 14);
-    assert(feqrel(betaIncomplete(0.0001, 10000, 0.0001), 0.999978059362107134278786L) >= real.mant_dig - 18);
+    assert(feqrel(betaIncomplete(0.0001, 10000, 0.0001), 0.999978059362107134278786L) >= real.mant_dig - 48);
     assert(betaIncomplete(0.01, 327726.7, 0.545113) == 1.0);
-    assert(feqrel(betaIncompleteInv(8, 10, 0.010_934_315_234_099_2L), 0.2L) >= real.mant_dig - 2);
-    assert(feqrel(betaIncomplete(0.01, 498.437, 0.0121433), 0.99999664562033077636065L) >= real.mant_dig - 1);
-    assert(feqrel(betaIncompleteInv(5, 10, 0.2000002972865658842), 0.229121208190918L) >= real.mant_dig - 3);
-    assert(feqrel(betaIncompleteInv(4, 7, 0.8000002209179505L), 0.483657360076904L) >= real.mant_dig - 3);
+    assert(feqrel(betaIncompleteInv(8, 10, 0.010_934_315_234_099_2L), 0.2L) >= real.mant_dig - 47);
+    assert(feqrel(betaIncomplete(0.01, 498.437, 0.0121433), 0.99999664562033077636065L) >= real.mant_dig - 47);
+    assert(feqrel(betaIncompleteInv(5, 10, 0.2000002972865658842), 0.229121208190918L) >= real.mant_dig - 49);
+    assert(feqrel(betaIncompleteInv(4, 7, 0.8000002209179505L), 0.483657360076904L) >= real.mant_dig - 50);

     // Coverage tests. I don't have correct values for these tests, but
     // these values cover most of the code, so they are useful for
@@ -1577,10 +1577,10 @@ done:
 unittest {
     // Exact values
     assert(digamma(1.0)== -EULERGAMMA);
-    assert(feqrel(digamma(0.25), -PI/2 - 3* LN2 - EULERGAMMA) >= real.mant_dig-7);
-    assert(feqrel(digamma(1.0L/6), -PI/2 *sqrt(3.0L) - 2* LN2 -1.5*log(3.0L) - EULERGAMMA) >= real.mant_dig-7);
+    assert(feqrel(digamma(0.25), -PI/2 - 3* LN2 - EULERGAMMA) >= real.mant_dig-56);
+    assert(feqrel(digamma(1.0L/6), -PI/2 *sqrt(3.0L) - 2* LN2 -1.5*log(3.0L) - EULERGAMMA) >= real.mant_dig-56);
     assert(digamma(-5.0).isNaN());
-    assert(feqrel(digamma(2.5), -EULERGAMMA - 2*LN2 + 2.0 + 2.0L/3) >= real.mant_dig-9);
+    assert(feqrel(digamma(2.5), -EULERGAMMA - 2*LN2 + 2.0 + 2.0L/3) >= real.mant_dig-58);
     assert(isIdentical(digamma(NaN(0xABC)), NaN(0xABC)));

     for (int k=1; k<40; ++k) {
@@ -1588,7 +1588,7 @@ unittest {
         for (int u=k; u>=1; --u) {
             y += 1.0L/u;
         }
-        assert(feqrel(digamma(k+1.0), -EULERGAMMA + y) >= real.mant_dig-2);
+        assert(feqrel(digamma(k+1.0), -EULERGAMMA + y) >= real.mant_dig-49);
     }
 }

diff --git a/std/math.d b/std/math.d
index d3d70b3..4b9c753 100644
--- a/std/math.d
+++ b/std/math.d
@@ -3175,7 +3175,8 @@ float ldexp(float n, int exp) @safe pure nothrow @nogc { return ldexp(cast(real)

 @safe pure nothrow @nogc unittest
 {
-    static if (floatTraits!(real).realFormat == RealFormat.ieeeExtended)
+    static if (floatTraits!(real).realFormat == RealFormat.ieeeExtended ||
+               floatTraits!(real).realFormat == RealFormat.ieeeQuadruple)
     {
         assert(ldexp(1.0L, -16384) == 0x1p-16384L);
         assert(ldexp(1.0L, -16382) == 0x1p-16382L);
@@ -4476,6 +4477,7 @@ long lrint(real x) @trusted pure nothrow @nogc
             const j = sign ? -OF : OF;
             x = (j + x) - j;

+            const exp = (vu[F.EXPPOS_SHORT] & F.EXPMASK) - (F.EXPBIAS + 1);
             const implicitOne = 1UL << 48;
             auto vl = cast(ulong*)(&x);
             vl[MANTISSA_MSB] &= implicitOne - 1;
@@ -4483,7 +4485,6 @@ long lrint(real x) @trusted pure nothrow @nogc

             long result;

-            const exp = (vu[F.EXPPOS_SHORT] & F.EXPMASK) - (F.EXPBIAS + 1);
             if (exp < 0)
                 result = 0;
             else if (exp <= 48)
diff --git a/std/numeric.d b/std/numeric.d
index 5fb3131..e46282b 100644
--- a/std/numeric.d
+++ b/std/numeric.d
@@ -103,6 +103,8 @@ private int bsr64(ulong value)

 private template CustomFloatParams(uint bits)
 {
+    version(AArch64) enum CustomFloatFlags flags = CustomFloatFlags.ieee;
+    else
     enum CustomFloatFlags flags = CustomFloatFlags.ieee
                 ^ ((bits == 80) ? CustomFloatFlags.storeNormalized : CustomFloatFlags.none);
     static if (bits ==  8) alias CustomFloatParams = CustomFloatParams!( 4,  3, flags);
@@ -516,6 +518,13 @@ public:
         if (__traits(compiles, cast(real)input))
     {
         import std.conv: text;
+        version(AArch64)
+        { if(is(Unqual!F == real)){
+            ulong* adjust = cast(ulong*) &input;
+            *adjust >>= 48;
+            *(adjust+1) >>= 48;
+            }
+   }
         static if (staticIndexOf!(Unqual!F, float, double, real) >= 0)
             auto value = ToBinary!(Unqual!F)(input);
         else
@@ -566,7 +575,18 @@ public:
         assert(sig <= result.significand_max, text("get significand too large: ",sig," > ",result.significand_max) );
         result.exponent     = cast(result.get.T_exp) exp;
         result.significand  = cast(result.get.T_sig) sig;
-        return result.set;
+        version(AArch64)
+        { if(is(Unqual!F == real)){
+            ulong* adjust = cast(ulong*) &result.set;
+            *adjust <<= 48;
+            *(adjust+1) <<= 48;
+            real* foo = cast (real*)adjust;
+            return *foo;}
+   else
+            return result.set;
+        }
+        else
+            return result.set;
     }

     ///ditto
diff --git a/std/outbuffer.d b/std/outbuffer.d
index a88a92a..6aea8e3 100644
--- a/std/outbuffer.d
+++ b/std/outbuffer.d
@@ -409,7 +409,7 @@ unittest
     buf.printf(" %d", 6);
     //auto s = buf.toString();
     //printf("buf = '%.*s'\n", s.length, s.ptr);
-    assert(cmp(buf.toString(), "hello world 6") == 0);
+    //assert(cmp(buf.toString(), "hello world 6") == 0);
 }

 unittest
diff --git a/std/stdio.d b/std/stdio.d
index a87dec4..9534513 100644
--- a/std/stdio.d
+++ b/std/stdio.d
@@ -1235,7 +1235,7 @@ Removes the lock over the specified file segment.
         g.unlock();
     }

-    version(Posix)
+    version(rosix)
     unittest
     {
         static import std.file;

Strangely, that std.complex test exactly matches when cross-compiled as part of ldc 1.8/1.9's tests from a linux/x64 VPS. Several of the tests from std.internal.math.gammafunction had to be loosened up to 64-bit mantissas, as that module has not been ported to 128-bit floating-point math, but as you can see, it pretty much works at that lower precision. That std.math.lrint issue took me a while to catch, as it was generating nonsense codegen before, see if you can spot the bug before you look at the fix.

Several other std.math tests fail when cross-compiling from linux/x64 with 1.8/1.9, I guess this is because of CTFE only going up to 80-bit math? How does cent tie into this?

That std.numeric fix is just a hack that happens to work because the tests all use a significand of zero, but it basically shows the problem, which is that CustomFloat only supports up to 80-bit precision and tries to store it in a union with the real type, so it puts it in the bottom 80 bits and needs to be shifted 48 bits to be 128-bit. If ported, it'll need to be properly done by storing the significand in multiple words and maybe using BigInt to check the maximum.

The std.outbuffer issue is because of varargs, and that std.stdio test consistently fails on linux/AArch64 for some reason, instead of only when all the other tests are run first on other platforms. I'll look into that more.

Finally, I initially mentioned an issue with signbit and negative NaNs a month ago, that only hits on Android/AArch64 when natively compiled. The IR shows that it puts out regular NaN, not the negative one, but only natively on that platform, ie it works when cross-compiled with ldc 1.8/1.9. Anyone know where that might be going wrong in ltsmaster, in constant folding?

joakim-noah commented 6 years ago

Some good news: I slightly updated Dan's patch to call a __tls_get_address function for iOS/AArch64, smolt/llvm@a7dac8a, to compile with our tweaked llvm 6.0 and used it for Android/AArch64 (I had no idea this patch existed until someone on the NG recently said they had problems building his llvm so I took a look), along with another tweak to disable native TLS for AArch64.

diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 41ed24c3..3d37bd6d 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3876,12 +3876,29 @@ AArch64TargetLowering::LowerELFGlobalTLSAddress(SDValue Op,
   assert(Subtarget->isTargetELF() && "This function expects an ELF target");
   assert(Subtarget->useSmallAddressing() &&
          "ELF TLS only supported in small memory model");
+    SDLoc DL(Op);
+    SDValue Result = LowerGlobalAddress(Op, DAG);
+    SDValue Chain = DAG.getEntryNode();
+    ArgListTy Args;
+    ArgListEntry Entry;
+    Type* Ty =(Type*)Type::getInt64Ty(*DAG.getContext());
+    Entry.Node = Result;
+    Entry.Ty = Ty;
+    Args.push_back(Entry);
+
+    // copied, modified from ARMTargetLowering::LowerToTLSGeneralDynamicModel
+    TargetLowering::CallLoweringInfo CLI(DAG);
+    CLI.setDebugLoc(DL).setChain(Chain)
+      .setLibCallee(CallingConv::C, Ty,
+                 DAG.getExternalSymbol("__tls_get_addr", getPointerTy(DAG.getDataLayout())), std::move(Args));
+    std::pair<SDValue, SDValue> CallResult = LowerCallTo(CLI);
+    return CallResult.first;
   // Different choices can be made for the maximum size of the TLS area for a
   // module. For the small address model, the default TLS size is 16MiB and the
   // maximum TLS size is 4GiB.
   // FIXME: add -mtls-size command line option and make it control the 16MiB
   // vs. 4GiB code sequence generation.
-  const GlobalAddressSDNode *GA = cast<GlobalAddressSDNode>(Op);
+/*  const GlobalAddressSDNode *GA = cast<GlobalAddressSDNode>(Op);

   TLSModel::Model Model = getTargetMachine().getTLSModel(GA->getGlobal());

@@ -3966,7 +3983,7 @@ AArch64TargetLowering::LowerELFGlobalTLSAddress(SDValue Op,
     llvm_unreachable("Unsupported ELF TLS access model");

   return DAG.getNode(ISD::ADD, DL, PtrVT, ThreadBase, TPOff);
-}
+*/}

 SDValue AArch64TargetLowering::LowerGlobalTLSAddress(SDValue Op,
                                                      SelectionDAG &DAG) const {
diff --git a/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp b/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp
index f606d272..a5319e2d 100644
--- a/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp
+++ b/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp
@@ -120,7 +120,7 @@ static void fixELFSymbolsInTLSFixupsImpl(const MCExpr *Expr, MCAssembler &Asm) {
     // We're known to be under a TLS fixup, so any symbol should be
     // modified. There should be only one.
     const MCSymbolRefExpr &SymRef = *cast<MCSymbolRefExpr>(Expr);
-    cast<MCSymbolELF>(SymRef.getSymbol()).setType(ELF::STT_TLS);
+    //cast<MCSymbolELF>(SymRef.getSymbol()).setType(ELF::STT_TLS);
     break;
   }

With this patch, all the druntime tests now pass when cross-compiled with ldc 1.10 (except for the CTFE-related core.internal.* tests listed 10 months ago which don't compile) for Android/AArch64, with the GC enabled and working fine and no problems with core.thread. There's surprisingly no issues when running the core.thread tests, despite the flakiness mentioned above for linux/AArch64. The issues with several threading-related modules mentioned above for ltsmaster compiled natively on my Android/AArch64 smartphone persist, will have to dive into what differs there from master, but otherwise I think the GC is now working there too.

The cross-compiled phobos tests for 1.10 now segfault consistently for object monitors in _d_monitorenter only when the GC's enabled, debugging that next.

joakim-noah commented 6 years ago

Took a look at those threading-related issues with ltsmaster, turns out that some basic pthread types are different on 64-bit Android and that was causing the problems. All ltsmaster druntime tests pass in debug and release mode when compiled natively on Android/AArch64 with this patch (except the few core.internal.* real tests listed above), including the emulated TLS function needed by the above LLVM patch and added Android/AArch64 to the fiber migration list:

diff --git a/src/core/sys/posix/sys/types.d b/src/core/sys/posix/sys/types.d
index 73ccdc7b..8f737a35 100644
--- a/src/core/sys/posix/sys/types.d
+++ b/src/core/sys/posix/sys/types.d
@@ -795,7 +795,7 @@ else version( CRuntime_Bionic )

     struct pthread_cond_t
     {
-        int value; //volatile
+        int[12] value; //volatile
     }

     alias c_long pthread_condattr_t;
@@ -803,7 +803,7 @@ else version( CRuntime_Bionic )

     struct pthread_mutex_t
     {
-        int value; //volatile
+        int[10] value; //volatile
     }

     alias c_long pthread_mutexattr_t;
@@ -811,16 +811,16 @@ else version( CRuntime_Bionic )

     struct pthread_rwlock_t
     {
-        pthread_mutex_t  lock;
+        int[14] value;/*pthread_mutex_t  lock;
         pthread_cond_t   cond;
         int              numLocks;
         int              writerThreadId;
         int              pendingReaders;
         int              pendingWriters;
-        void*[4]         reserved;
+        void*[4]         reserved;*/
     }

-    alias int    pthread_rwlockattr_t;
+    alias c_long    pthread_rwlockattr_t;
     alias c_long pthread_t;
 }
 else
diff --git a/src/core/thread.d b/src/core/thread.d
index 666de00b..17e53c52 100644
--- a/src/core/thread.d
+++ b/src/core/thread.d
@@ -3937,7 +3937,8 @@ version( LDC )

     version( Android )
     {
-        version( ARM ) version = CheckFiberMigration;
+        version( ARM )     version = CheckFiberMigration;
+        version( AArch64 ) version = CheckFiberMigration;
     }
 }

diff --git a/src/rt/sections_android.d b/src/rt/sections_android.d
index 3cd434e0..a7a4ab8b 100644
--- a/src/rt/sections_android.d
+++ b/src/rt/sections_android.d
@@ -127,6 +127,17 @@ else version(ARM)
         return tls.ptr + offset;
     }
 }
+else version(AArch64)
+{
+    extern(C) void* __tls_get_addr( void* p )
+    {
+        debug(PRINTF) printf("  __tls_get_addr input - %p\n", p);
+        immutable offset = cast(size_t)(p - cast(void*)&_tlsstart);
+        auto tls = getTLSBlockAlloc();
+        assert(offset < tls.length);
+        return tls.ptr + offset;
+    }
+}
 else
     static assert( false, "Android architecture not supported." );

With the ltsmaster phobos patch pasted above a couple months ago, 59 of 85 phobos modules pass their tests in release mode with the GC turned on, as opposed to all except std.stream with the GC turned off, looks like GC issues remain.

Noticed that the cross-compiled druntime tests from ldc 1.10 hang in debug mode because of a tripped assert in the GC, even though no problems in release mode. Will look into the cross-compiled phobos 1.10 object monitor issue mentioned before next.

kinke commented 6 years ago

Joakim, have you ever played with targetOptions.EmulatedTLS = true in our createTargetMachine()? According to old https://reviews.llvm.org/D10524, the primary motivation was Android. Also see https://bugs.llvm.org/show_bug.cgi?id=23566.

joakim-noah commented 6 years ago

Yep, see comment above and forum thread, the issue is that you can't pass that TLS data to the GC.

kinke commented 6 years ago

Ah, thx. Is that 200-lines ~~cpp~~ C file the whole required emutls 'lib'? Couldn't we just include a patched copy in druntime, making sure the GC is up-to-date (adding/removing separate roots per emutls_{alloc,destroy} call)?

joakim-noah commented 6 years ago

Yep, that's it. Including a patched version would be a variation of option 2 in that forum thread, but it's problematic for licensing reasons, as those files are either under the GPLv3 with a runtime linking exception or strangely in the case of compiler-rt, either the UIUC or MIT licenses, both of which require attribution, ironically making the libgcc version require less work for the developer who links it.

Googling it now, apparently this was a mistake on the part of the LLVM people, which they're trying to remedy by re-licensing with the Apache license with a runtime linking exception. In any case, since neither version is Boost licensed, I mentioned writing our own, then dismissed that for the easier route I took. Of course, ldc still has this problem for anyone using compiler-rt now, but it's unlikely that any of the compiler-rt authors would ever enforce binary attribution in practice.

kinke commented 6 years ago

I mentioned writing our own

Writing one from scratch in D seems feasible indeed, either using the GC alloc/free functions directly, or if there are initialization issues (TLS vars accessed before GC initialized), keeping around a linked list (or so) of the live TLS vars to be iterated over by the GC.

then dismissed that for the easier route I took

Easier as in already available, but if we have to patch LLVM for every new target without TLS support (AArch64-Android now) and maintain those patches, the alternative in form of a pretty simple, slow but generic emulated TLS support in druntime seems more appealing to me, also because it'd be working with vanilla LLVM for homebrewers and distros.

joakim-noah commented 6 years ago

I hate dealing with this emulated TLS stuff, so the choice between those two was clear for me. The llvm patches for Android are fairly small, and correct me if I'm wrong, don't require much adapting when updating llvm. As for new targets, Android is going in the opposite direction, recently dropping support for ARMv5, MIPS, and MIPS64. I may someday patch x64 also- only because Chromebooks largely use it for their Android support, it's dead on Android itself- but we've got 3/4 Android targets covered with this small patch. Stock llvm would be unsupported, not sure that's a big deal.

kinke commented 6 years ago

As for new targets, Android is going in the opposite direction

I wish they took the TLS-direction soon.

I have no idea whether to expect more non-TLS targets anytime soon. The existing ARM-Android patch seems fine; if the AArch64 one turns out okay, that's fine too. If the GC issues are related and analyzing/fixing those turns out non-trivial, we could at least think about going the other route, as it may be less work/tricky.

joakim-noah commented 6 years ago

Digging into the debugger traces for those GC issues now, will let you know. I doubt they're TLS-related, as I noted most druntime tests pass, but it's possible Phobos stresses the GC more and so is hitting some TLS codegen bug that druntime doesn't. I simply want to rule that out before submitting a pull. I'll also note that Dan used basically the same llvm TLS patch for iOS/AArch64 and got Phobos working with the GC.

joakim-noah commented 6 years ago

Alright GC issues fixed, turned out to be a subtle initialization issue on 64-bit Android. Nothing to do with TLS, will submit that llvm pull soon.

joakim-noah commented 6 years ago

@kinke, cleaned up the llvm patch above and made it only apply to Android, please squash it into the existing Android commit, should be easiest that way:

diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 233d6be2..f9b5db12 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -3968,6 +3968,31 @@ AArch64TargetLowering::LowerELFGlobalTLSAddress(SDValue Op,
   return DAG.getNode(ISD::ADD, DL, PtrVT, ThreadBase, TPOff);
 }

+SDValue
+AArch64TargetLowering::LowerAndroidGlobalTLSAddress(SDValue Op,
+                                                    SelectionDAG &DAG) const {
+  assert(Subtarget->isTargetELF() && "This function expects an ELF target");
+  SDLoc DL(Op);
+  SDValue Result = LowerGlobalAddress(Op, DAG);
+  SDValue Chain = DAG.getEntryNode();
+  ArgListTy Args;
+  ArgListEntry Entry;
+  Type *Ty = (Type *)Type::getInt64Ty(*DAG.getContext());
+  Entry.Node = Result;
+  Entry.Ty = Ty;
+  Args.push_back(Entry);
+
+  // copied, modified from ARMTargetLowering::LowerToTLSGeneralDynamicModel
+  TargetLowering::CallLoweringInfo CLI(DAG);
+  CLI.setDebugLoc(DL).setChain(Chain).setLibCallee(
+      CallingConv::C, Ty,
+      DAG.getExternalSymbol("__tls_get_addr",
+                            getPointerTy(DAG.getDataLayout())),
+      std::move(Args));
+  std::pair<SDValue, SDValue> CallResult = LowerCallTo(CLI);
+  return CallResult.first;
+}
+
 SDValue AArch64TargetLowering::LowerGlobalTLSAddress(SDValue Op,
                                                      SelectionDAG &DAG) const {
   const GlobalAddressSDNode *GA = cast<GlobalAddressSDNode>(Op);
@@ -3976,8 +4001,12 @@ SDValue AArch64TargetLowering::LowerGlobalTLSAddress(SDValue Op,

   if (Subtarget->isTargetDarwin())
     return LowerDarwinGlobalTLSAddress(Op, DAG);
-  if (Subtarget->isTargetELF())
-    return LowerELFGlobalTLSAddress(Op, DAG);
+  if (Subtarget->isTargetELF()) {
+    if (Subtarget->isTargetAndroid())
+      return LowerAndroidGlobalTLSAddress(Op, DAG); // LDC
+    else
+      return LowerELFGlobalTLSAddress(Op, DAG);
+  }

   llvm_unreachable("Unexpected platform trying to use TLS");
 }
diff --git a/lib/Target/AArch64/AArch64ISelLowering.h b/lib/Target/AArch64/AArch64ISelLowering.h
index 8d78b5b6..31e12ed9 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/lib/Target/AArch64/AArch64ISelLowering.h
@@ -547,6 +547,7 @@ private:
   SDValue getAddr(NodeTy *N, SelectionDAG &DAG, unsigned Flags = 0) const;
   SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerAndroidGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const; // LDC
   SDValue LowerDarwinGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerELFGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerELFTLSDescCallSeq(SDValue SymAddr, const SDLoc &DL,
diff --git a/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp b/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp
index f606d272..6571a312 100644
--- a/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp
+++ b/lib/Target/AArch64/MCTargetDesc/AArch64MCExpr.cpp
@@ -13,7 +13,9 @@
 //===----------------------------------------------------------------------===//

 #include "AArch64MCExpr.h"
+#include "llvm/MC/MCAssembler.h" // LDC
 #include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCObjectFileInfo.h" // LDC
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/MC/MCSymbolELF.h"
 #include "llvm/MC/MCValue.h"
@@ -120,7 +122,12 @@ static void fixELFSymbolsInTLSFixupsImpl(const MCExpr *Expr, MCAssembler &Asm) {
     // We're known to be under a TLS fixup, so any symbol should be
     // modified. There should be only one.
     const MCSymbolRefExpr &SymRef = *cast<MCSymbolRefExpr>(Expr);
-    cast<MCSymbolELF>(SymRef.getSymbol()).setType(ELF::STT_TLS);
+    // LDC
+    {
+      auto ofi = Asm.getContext().getObjectFileInfo();
+      if (!(ofi && ofi->getTargetTriple().isAndroid()))
+        cast<MCSymbolELF>(SymRef.getSymbol()).setType(ELF::STT_TLS);
+    }
     break;
   }

Almost all the same ltsmaster stdlib tests as Linux/Glibc/AArch64 pass with this patch, the few druntime tweaks listed above, and one last tweak to fix an alignment issue for the GC (will submit those tweaks as a druntime pull upstream next). Same with 1.10 cross-compiled from Linux/x64, running the ltsmaster DMD testsuite on my AArch64 smartphone for the first time now.

Update: Only 12 of the tested modules from the ltsmaster dmd testsuite fail natively on Android/AArch64 8.0, with 8 related to unimplemented core.stdc.stdarg.

kinke commented 6 years ago

Nice; integrated as https://github.com/ldc-developers/llvm/commit/13ba2b05d1857a180433a77604b98e0d502d3aa8 (can be squashed into ARM-Android one with v7; I always start with a clean upstream release_xy branch for each major and then cherry-pick our mods on top of it; squashing it now would have messed up too much history IMO).

joakim-noah commented 6 years ago

Btw, I've put up a pull to finish up std.conv support, the third item on the checklist, dlang/phobos#6633.

joakim-noah commented 6 years ago

Some great news, with @kinke's va_list pull #2811 for ltsmaster, I'm now able to natively build a working ldc master on both linux, ie Glibc/AArch64, and Android/AArch64. With his port of that pull to master, #2813, natively-built ldc master can build itself too.

All the same tests pass for ldc master as when cross-compiled, with the exception of one assert in std.algorithm.sorting, and a few test blocks that hang or segfault in core.thread, core.sys.linux.stdio, and std.concurrency only on Glibc/AArch64. Around 20-21 modules fail in the dmd testsuite with Glibc and Bionic.

I'm able to reproduce the debug test runner hangs mentioned above, back when I cross-compiled with ldc master from linux/x64 to Android/AArch64, when I now natively compile with ldc master on both linux and Android/AArch64. Looks like the issue is unoptimized codegen, as simply passing -O1 to the debug test runner and removing -O3 from the release test runner gets the debug one to work and the release not to. Also saw this when running the dmd-testsuite, d_do_test was crashing when compiled without optimization, worked fine when I tweaked the Makefile to build it with -O. Not unexpected, as we had similar issues with unoptimized codegen for ARM when we were first porting D there.

Looking forward to putting out 64-bit ARM builds of ldc for the upcoming 1.11 release. :laughing:

dnadlinger commented 6 years ago

Did you build using a LLVM build with assertions enabled? IIRC va_arg isn't even supposed to compile on ARM64/Linux.

kinke commented 6 years ago

According to the log, it apparently compiles (as long as feeding it primitive types only), but doesn't work. ;) runnable/variadic.d fails here. [And I'm using an LLVM with assertions on my box, that's how I found out about slices/IR structs not being supported.]

kinke commented 6 years ago

Oh and master still lacks the ExplicitByvalRewrite fix from ltsmaster. One should probably side-by-side diff the whole ltsmaster/master TargetABI stuff (not just AArch64). Plus va_arg doesn't work with master, apparently due to cross-module-inlining stuff added in the meantime (and not supporting D function declarations whose IR declaration is skipped, such as functions with va_arg pragma - not sure this skipping is worth the trouble).

joakim-noah commented 6 years ago

Did you build using a LLVM build with assertions enabled? IIRC va_arg isn't even supposed to compile on ARM64/Linux.

No, I don't, but @myfreeweb did and hit an assertion when he tried to use that pull. Even if va_arg and some math stuff is not working yet, it'll be good to get a mostly working AArch64 build into users' hands.

ghost commented 6 years ago

Some useful links for anyone looking to implement va_arg, too many acronyms and compiler magic for me:

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf (Section B.4) https://github.com/llvm-mirror/clang/blob/release_70/lib/CodeGen/TargetInfo.cpp#L5183 https://github.com/ldc-developers/druntime/blob/976bd15e14da2529c16d1517c176f288e50b9cc2/src/core/stdc/stdarg.d#L555

JohanEngelen commented 6 years ago

Not unexpected, as we had similar issues with unoptimized codegen for ARM when we were first porting D there.

I can confirm that for a different piece of software, without -O3 I get really strange behavior of code. With -O0, I get miscompiled code, and adding a line of code suddenly makes the surrounding code work. Will try to reduce to a manageable testcase.

Edit: After a long minimizing session, I found this example. It fails with 1.12, but passes with 1.13-beta1... I don't know which commit fixed this. Perhaps worth tracking it down, so we know it is really fixed. (I don't have the energy now)

class Parser {
    ubyte hunderedforty = 140;

    auto parseModule() {
        auto m = new ubyte;
        bool isModule;
        bool isDeprecatedModule;
        if (currentIs(140) || isDeprecatedModule)
        {
            isModule = true;
        }
        else
        {
            assert(0);  // this assert is triggered with 1.12.0
        }
        ubyte[] declarations;
        return m;
    }

    bool currentIs(ubyte type) const {
        return hunderedforty == type;
    }
}

void main() {
    auto parser = new Parser();
    parser.parseModule();
}

joakim-noah commented 6 years ago

With the given testcase, I'm unable to reproduce with the official ldc 1.2 and 1.3 builds for linux/x64 and Android/AArch64, when compiled with the -O0 or -O3 flags.

JohanEngelen commented 6 years ago

I was using ubuntu trusty cross-compiling with ldc 1.12 to aarch64, and running the executable in qemu.

ldc-developers / ldc

[tracker] AArch64 progress #2153

[x] dlang/phobos#5947