quixdb / squash

Compression abstraction library and utilities
https://quixdb.github.io/squash/
MIT License
406 stars 53 forks source link

Add clang to AppVeyor #185

Open nemequ opened 8 years ago

nemequ commented 8 years ago

See https://groups.google.com/d/msg/squash-compression/qjrshDe-3WY/GmGEQncpAgAJ

jibsen commented 8 years ago

Just dumping some information while working on this.

There are two (official) installers for clang on Windows, one that integrates with Visual Studio, and one that works with mingw.

There is a bug in the 3.7.x release that makes the mingw version fail when linking C++ executables (missing __chkstk_ms). For some of the plugins, adding -lgcc to the linker flags may help.

There is another bug that prevents munit from compiling. It is due to clang trying to use stdatomic.h from GCC, but failing.

By disabling the following plugins: brotli, bsc, gipfeli, lzham, lzo, ms-compress, snappy, yalz77, zling, zpaq, and doing the workaround for munit, I got the rest to build with clang on mingw-w64, and the unit test appears to work.

jibsen commented 8 years ago

Installed the version that integrates with VS to see how that works. I did not look at the error messages, but simply disabled any plugin that did not build (brotli, gipfeli, libdeflate, lzham, lzma, ms-compress, snappy, yalz77, zlib-ng, zling, zpaq, zstd).

With those disabled, it built and the unit test runs.

jibsen commented 8 years ago

Okay, upon further investigation, most of the errors with clang/VS were due to yet another bug, where clang does not recognize what version of VS it runs on top of. Adding -fms-compatibility-version=19 to the compiler flags when using Visual C++ 2015 fixed that.

The remaining issues are libdeflate, lzham, lzma, and zstd using intrinsics without including intrin.h, and gipfeli and zlib-ng have other erros I have not looked into yet. Also it looks like lzham crashes the threads unit test.

nemequ commented 8 years ago

Yikes, what a mess. Great job investigating this, though.

I think it's pretty clear that the mingw version isn't ready yet; we can revisit that after 3.8 is released.

For the VS version, though, it's easy enough to add that flag in the AppVeyor configuration, and it would be good go get a jump on fixing those plugins (especially if it's just a matter of including intrin.h).

jibsen commented 8 years ago

@elliotwoods posted a link in #145 to a blog post about an experimental clang toolset called "Clang with Microsoft CodeGen", provided by MS in VS 2015 Update 1. While it seems a bit unstable, it could potentially be used to compile plugins that would be difficult to add MSVC support to.

Just wanted to note the changes I made under properties to get it to compile the DENSITY plugin:

elliotwoods commented 8 years ago

@jibsen - I followed your instructions and then it would in Debug. Thank you!

but to build with -O2 (or -Oanything), i needed to add to globals.h:

#define density_likely(x) x
//__builtin_expect(!!(x), 1)
#define density_unlikely(x) x
//__builtin_expect(!!(x), 0)

i.e. nullify the branch prediction which.....might...make...things...not so fast(?)

nemequ commented 8 years ago

MSVC doesn't support anything like builtin_expect, not much anyone can do about that. My understanding is that branch prediction in modern x86/x86_64 CPUs very good, and the cost of a mispredicted branch isn't actually that bad. `builtin_expect` is mostly useful for other architectures these days, which Windows doesn't support anyways.

That said, I know density isn't as fast on MSVC. I doubt branch prediction is a major factor, but if you're concerned about speed you really need to be doing your own benchmarking with a realistic workload for your application. It may very well be that another codec works much better for you; that's a big part of what makes Squash interesting… you can test a lot of codecs without changing your code.

Also, you should really define the macros as (x) not x. It's much safer.

elliotwoods commented 8 years ago

(note : this is using CodeGen, i.e. clang with Visual Studio, not MSVC compiler) Good to know that branch prediction is optimised well on modern architectures/compilers

yes! we'll definitely test with other compressors in real world scenario and yes! that's one of the main reasons for using squash here.

ok i'll do as you say with the macros

jibsen commented 8 years ago

Thanks for the heads up.

Like @nemequ says, MSVC does not have an equivalent intrinsic. I get an ICE for it:

2>    %85 = call i32 @llvm.expect.i32(i32 %84, i32 0), !dbg !558
2>d:\dump\squash\plugins\density\squash-density.c(223): fatal error C1001: An internal error has occurred in the compiler.
2>  (compiler file 'llvm-bridge.cpp', line 5667)

Certainly would have been preferable if they had simply ignored that intrinsic and emitted a warning.

elliotwoods commented 8 years ago

yes you're totally right. MSVC's port of the Clang compiler doesn't support it. my bad, sorry for misreading

elliotwoods commented 8 years ago

Here's the results with my build. image

Notes:

Here we can see snappy (selected) comes out around the same compression ratio as density, but density is considerably faster (grid lines in y are each 50MBps, so it's about 125MBps faster)

Perhaps this isn't the right thread for this (i'm just continuing from the above). so feel free to move this if it's useful elsewhere

jibsen commented 8 years ago

So, Clang 3.8.0 is out, and I have been trying to get the various combinations of Clang with mingw-w64 and MSVC headers to build squash. The closest to success was the MSYS2 version of Clang, which works except for two minor issues:

Working around these two issues, squash builds and the unit tests run.

The installer from llvm.org using the mingw-w64 (GCC 5.3.0) headers and --target=x86_64-w64-mingw32 further had an issue with lzo using __int64 instead of long long (probably not expecting Clang on Windows).

The installer from llvm.org using the MSVC (Visual Studio 2015) headers and -fms-compatibility-version=19 had problems with a number of plugins using MSVC intrinsics without including <intrin.h> (libdeflate, lzma, zlib-ng, zstd), and gipfeli using std::min/max without including <algorithm> (C++11).

nemequ commented 8 years ago

The µnit thing is definitely a problem… I'm willing to work around that in µnit, but I'm not sure why you would be hitting it. stdatomic.h should only be included in µnit if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) && !defined(__STDC_NO_ATOMICS__), but we compile in C99 mode…

For the other issues, it sounds like we can just disable the offending plugins until the issues are fixed.

jibsen commented 8 years ago

There does not appear to be any use of the -std= flag in the Clang build, so it is using C11 by default. I don't know if CMake implements the standard check as a "greater than or equal".

nemequ commented 8 years ago

Damnit, cmake.

Well, we could do something like (untested):

diff --git a/munit.c b/munit.c
index 0607763..d76d06f 100644
--- a/munit.c
+++ b/munit.c
@@ -263,13 +263,13 @@ munit_malloc_ex(const char* filename, int line, size_t size) {
 #  endif
 #endif

-#if defined(HAVE_STDATOMIC)
+#if defined(HAVE_CLANG_ATOMICS)
+#  define ATOMIC_UINT32_T _Atomic uint32_t
+#  define ATOMIC_UINT32_INIT(x) (x)
+#elif defined(HAVE_STDATOMIC)
 #  include <stdatomic.h>
 #  define ATOMIC_UINT32_T _Atomic uint32_t
 #  define ATOMIC_UINT32_INIT(x) ATOMIC_VAR_INIT(x)
-#elif defined(HAVE_CLANG_ATOMICS)
-#  define ATOMIC_UINT32_T _Atomic uint32_t
-#  define ATOMIC_UINT32_INIT(x) (x)
 #elif defined(_WIN32)
 #  define ATOMIC_UINT32_T volatile LONG
 #  define ATOMIC_UINT32_INIT(x) (x)
@@ -280,14 +280,14 @@ munit_malloc_ex(const char* filename, int line, size_t size) {

 static ATOMIC_UINT32_T munit_rand_state = ATOMIC_UINT32_INIT(42);

-#if defined(HAVE_STDATOMIC)
-#  define munit_atomic_store(dest, value)         atomic_store(dest, value)
-#  define munit_atomic_load(src)                  atomic_load(src)
-#  define munit_atomic_cas(dest, expected, value) atomic_compare_exchange_weak(dest, expected, value)
-#elif defined(HAVE_CLANG_ATOMICS)
+#if defined(HAVE_CLANG_ATOMICS)
 #  define munit_atomic_store(dest, value)         __c11_atomic_store(dest, value, __ATOMIC_SEQ_CST)
 #  define munit_atomic_load(src)                  __c11_atomic_load(src, __ATOMIC_SEQ_CST)
 #  define munit_atomic_cas(dest, expected, value) __c11_atomic_compare_exchange_weak(dest, expected, value, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)
+#elif defined(HAVE_STDATOMIC)
+#  define munit_atomic_store(dest, value)         atomic_store(dest, value)
+#  define munit_atomic_load(src)                  atomic_load(src)
+#  define munit_atomic_cas(dest, expected, value) atomic_compare_exchange_weak(dest, expected, value)
 #elif defined(__GNUC__) && (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
 #  define munit_atomic_store(dest, value)         __atomic_store_n(dest, value, __ATOMIC_SEQ_CST)
 #  define munit_atomic_load(src)                  __atomic_load_n(src, __ATOMIC_SEQ_CST)
jibsen commented 8 years ago

I tried the May 2016 update to Clang/C2, which includes Clang 3.8.0, along with CMake 3.6.0-rc1 which supports the toolset.

I still get a couple of ICE, and a few plugins do not work, but it definitely seems to be improving. And it is really nice to be able to build from CMake instead of having to modify the build settings in VS.

jibsen commented 8 years ago

I had a little time to look into this, and I am going to try to describe the current issues building squash with Clang/C2 (warning: wall of text incoming).

Starting out with a freshly cloned squash, we build with CMake:

mkdir build
cd build
cmake -G "Visual Studio 14 2015" -T v140_clang_3_7 ..
cmake --build . --config Debug

First error is:

    %11 = atomicrmw sub i32* %10, i32 1 seq_cst, !dbg !34
d:\dump\squash2\squash\squash\object.c(253): fatal error C1001: An internal error has occurred in the compiler. [D:\dump\squash2\squash\build\squash\squash0.8.vcxproj]
  (compiler file 'llvm-bridge.cpp', line 6772)

an ICE compiling squash/object.c(253), which calls the atomic builtin __sync_fetch_and_sub. We can work around this by using:

diff --git a/squash/object.c b/squash/object.c
index 14c6175..4007c2c 100644
--- a/squash/object.c
+++ b/squash/object.c
@@ -29,7 +29,7 @@
 #include <stdbool.h>
 #include <stddef.h>

-#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)
+#if defined(__GNUC__) || (defined(__clang__) && !defined(__c2__)) || defined(__INTEL_COMPILER)
 #  define squash_atomic_inc(var) __sync_fetch_and_add(var, 1)
 #  define squash_atomic_dec(var) __sync_fetch_and_sub(var, 1)
 #  define squash_atomic_cas(var, orig, val) __sync_val_compare_and_swap(var, orig, val)

Next error is in the gipfeli plugin, which uses std::min and std::max without including <algorithm> (PR: google/gipfeli/pull/10).

Then we have another ICE in the lzma plugin, I haven't investigated this further:

  '_mm_movemask_epi8': Intrinsic not yet implemented:
    %4 = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %3)
D:\dump\squash2\squash\build\plugins\lzma\lz_encoder_mf.c: fatal error C1001: An internal error has occurred in the compiler. [D:\dump\squash2\squash\build\plugins\lzma\squash0.8-plugin-lzma.vcxproj]
  (compiler file 'llvm-bridge.cpp', line 5686)

An error in the lzo plugin (Clang issue reported):

D:\dump\squash2\squash\plugins\lzo\lzo\src/lzo_func.h(212,1): error: expected ';' after struct [D:\dump\squash2\squash\build\plugins\lzo\squash0.8-plugin-lzo.vcxproj]
  __lzo_byte_struct(lzo_memops_TU8_struct,8)
  ^
  D:\dump\squash2\squash\plugins\lzo\lzo\include\lzo/lzodefs.h(1729,84) :
     note: expanded from macro '__lzo_byte_struct'
  #  define __lzo_byte_struct(s,n)        __lzo_struct_packed(s) unsigned char
  a[n]; __lzo_struct_packed_end()

        ^
  D:\dump\squash2\squash\plugins\lzo\lzo\include\lzo/lzodefs.h(1715,52) :
     note: expanded from macro '__lzo_struct_packed_end'
  #  define __lzo_struct_packed_end()     } __pragma(pack(pop));
                                                     ^

We can work around this with:

diff --git a/include/lzo/lzodefs.h b/include/lzo/lzodefs.h
index 1535c1e..ea60d1d 100644
--- a/include/lzo/lzodefs.h
+++ b/include/lzo/lzodefs.h
@@ -1712,7 +1712,7 @@ extern "C" {
 #  define __lzo_struct_packed_ma_end()  } __lzo_may_alias __attribute__((__packed__));
 #elif (LZO_CC_INTELC_MSC) || (LZO_CC_MSC && (_MSC_VER >= 1300))
 #  define __lzo_struct_packed(s)        __pragma(pack(push,1)) struct s {
-#  define __lzo_struct_packed_end()     } __pragma(pack(pop));
+#  define __lzo_struct_packed_end()     }; __pragma(pack(pop))
 #elif (LZO_CC_WATCOMC && (__WATCOMC__ >= 900))
 #  define __lzo_struct_packed(s)        _Packed struct s {
 #  define __lzo_struct_packed_end()     };

An error in the zlib-ng plugin, which defines its own version of __builtin_ctzl (PR: Dead2/zlib-ng/pull/70).

And finally an ICE in munit.c(462) calling C11 atomic_compare_exchange_weak:

  'munit_rand_double': Unexpected atomic instruction -- use Windows interlock intrinsics
    %25 = cmpxchg weak i32* @munit_rand_state, i32 %23, i32 %24 seq_cst seq_cst, !dbg !86
d:\dump\squash2\squash\tests\munit\munit.c(462): fatal error C1001: An internal error has occurred in the compiler. [D:\dump\squash2\squash\build\tests\test-squash.vcxproj]
  (compiler file 'llvm-bridge.cpp', line 6772)

There are preprocessor checks starting at munit.c(272) that check for the presence of C11 and Clang atomics. Clang/C2 reports both available, but produces the ICE above when trying to use them. Until this is fixed we can work around it with:

diff --git a/munit.c b/munit.c
index c41cfd3..713f5b8 100644
--- a/munit.c
+++ b/munit.c
@@ -277,6 +277,11 @@ munit_malloc_ex(const char* filename, int line, size_t size) {
 #  endif
 #endif

+#if defined(__clang__) && defined(__c2__)
+#  undef HAVE_STDATOMIC
+#  undef HAVE_CLANG_ATOMICS
+#endif
+
 #if defined(HAVE_STDATOMIC)
 #  include <stdatomic.h>
 #  define ATOMIC_UINT32_T _Atomic uint32_t

With the mentioned fixes/workarounds, and disabling the lzma plugin for now, squash builds with Clang/C2.

Running the unit tests with ctest -V -C Debug results in the gipfeli plugin failing 12 tests (crashing in two). This may be related to intrinsics, since the tests pass when rebuilding the plugin with HAVE_BUILTIN_CTZ disabled. However, the plugin works with Clang targeting mingw-w64, so the issue appears to be with Clang/C2.

jibsen commented 7 years ago

It looks like Clang/C2 most likely will not get further updates (see this tweet and comments at this blog post).

jibsen commented 5 years ago

I've updated the AppVeyor configuration, and Clang (7.0.0) targeting both MSVC (clang-cl and NMake) and mingw-w64 GCC is building and the unit tests run (with the plugins disabled which have issues with Clang on Windows or Windows in general).