quixdb / squash

Compression abstraction library and utilities
https://quixdb.github.io/squash/
MIT License

Allow plugins to request a compiler #91

Open g1mv opened 9 years ago

g1mv commented 9 years ago

I think it would be nice to let plugin developers use whatever compiler/compiler flags they wish, as set in makefiles.

Why? Because it can dramatically alter performance. A simple example: on my side, density was developed and optimized against clang, so compiling it with gcc incurs, most of the time, a loss in performance. I've actually had reports of a 40% difference in decompression speed and a 20% difference in compression speed on some algorithms when using clang vs gcc with exactly the same settings, and that's... enormous :astonished:. I think everyone would be happy and benefit from using their own settings, instead of generic ones which tend to lower the overall performance of plugins and favor (especially in the benchmark case) algorithms compiled and optimized against gcc.

How? That's another story, but I suppose you could do it quite easily by first compiling the plugins to static libraries using all the makefile options and switches, and then compiling squash against the static libraries created. That would also more closely resemble a "real-use" case, as 99% of the time a software projects links against custom-compiled-by-developer libraries and does not include their source code in its own tree. Also, most of the time makefiles will test whether the preferred compiler is present (on Linux, clang/llvm will obviously be on any developer's platform, but it could be anything, like ICC on Windows), and fall back to gcc, which is kind of a universal compilation platform nowadays.

So again on your side it would not change much, because as long as you have gcc you're covered, but on the library developer's side this kind of flexibility would be a huge benefit to showcase their lib's complete potential.

nemequ commented 9 years ago

The idea to let plugins choose compiler flags has been in the back of my mind for a while now, and I think I'll pretty much allow codec developers to request whatever flags they want, as long as they don't break anything. The current defaults are -O3 -flto.

Letting plugins choose a compiler is more difficult… if you can figure out how to make this work with CMake I'm open to it, as long as the compiler isn't required (i.e., it can request clang but if that's not installed it needs to fall back on whatever compiler is installed). This is made much more complicated by the fact that the entire environment will basically have to be re-verified (all the flags and compiler features have to be re-checked against the alternate compiler). I think the only feasible way to do this would be to use a subproject for each plugin, but even then I'm unsure about how you could make an attempt with one compiler and fall back on another if it fails.

Distributing static libraries with Squash isn't really feasible—there are just too many problems. Keeping everything up to date, especially on every platform Squash aims to support, would be a nightmare. The only reasonable solution I can think of is for libraries to install a system-wide library (and preferably get packages into Fedora and Debian/Ubuntu, and I guess NuGet if we ever support Windows), and have Squash use that when available. That said, I recently started intentionally not using system libraries for the benchmark so that there wouldn't be any variance in versions across platforms.

Also, your experience about how "99% of the time a software projects links against custom-compiled-by-developer libraries and does not include their source code in its own tree" is very different from mine. In my experience, people either link to system version of the library (especially on Linux) or include a copy of the source code in their tree so they can optimize for their use case/platform. The only time they do use static libraries is when the source code isn't available.

nemequ commented 9 years ago

Another difficulty with setting the compiler is that you probably don't want to override a user's choice. If they ask you to compile with clang and the plugin uses gcc anyways it could easily cause problems, especially for cross-compilation…

g1mv commented 9 years ago

Hey Evan, thanks for your time.

> Letting plugins choose a compiler is more difficult… if you can figure out how to make this work with CMake I'm open to it, as long as the compiler isn't required (i.e., it can request clang but if that's not installed it needs to fall back on whatever compiler is installed). This is made much more complicated by the fact that the entire environment will basically have to be re-verified (all the flags and compiler features have to be re-checked against the alternate compiler). I think the only feasible way to do this would be to use a subproject for each plugin, but even then I'm unsure about how you could make an attempt with one compiler and fall back on another if it fails.

What you could do is make GCC compatibility mandatory: if any error is caught while trying to compile with the compiler of choice, you clean everything and restart with GCC as a failover. Of course, if GCC is the main choice, then all is sorted. This is going to be a huge benefit for people developing with ICC or Clang, and we're not talking PGO here; the point is just to be able to choose a compiler you've worked with, so you know the assembly code generated will be on par with what you expect. Imagine you've developed something using GCC, and your software turns out to be slower when compiled with other compilers (the assembly code doesn't perform as well). What will you do? In real life you will of course always distribute your library compiled with GCC!
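The failover described above can be sketched in a few lines of shell. This is only an illustration, not anything that exists in Squash: `build_plugin` is a hypothetical helper, and it assumes the plugin's build command honours the conventional `CC` environment variable.

```shell
#!/bin/sh
# Hypothetical helper (not part of Squash):
#   build_plugin <preferred-cc> <fallback-cc> <build command...>
# Runs the build command with CC set to the preferred compiler. If that
# compiler is not installed, or the build fails, it reruns the same
# command with the fallback compiler.
build_plugin() {
    preferred=$1
    fallback=$2
    shift 2
    if command -v "$preferred" >/dev/null 2>&1 && env CC="$preferred" "$@"; then
        echo "built with $preferred"
        return 0
    fi
    # A real build would clean the half-built tree here before retrying.
    env CC="$fallback" "$@" && echo "built with $fallback"
}

# Usage sketch (the plugin path is an assumption):
#   build_plugin clang gcc make -C plugins/density
```

Retrying the whole build keeps the logic simple, at the cost of compiling twice whenever the preferred compiler is installed but fails.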

> Also, your experience about how "99% of the time a software projects links against custom-compiled-by-developer libraries and does not include their source code in its own tree" is very different from mine. In my experience, people either link to system version of the library (especially on Linux)

That's exactly what I'm saying: the system version has been compiled and optimized for the system, by the best compiler available. The "custom-compiled-by-developer" was badly worded, I just meant that libraries are compiled and optimized for their target platform.

> Distributing static libraries with Squash isn't really feasible—there are just too many problems.

That's not what I'm saying, what I mean is that you keep the source tree in squash as you currently do, but you compile each plugin individually as a library, and finally you link squash against all the generated libraries. That's how libraries are used most of the time, as you stated previously - and I completely agree. When I use SSL I just link to the libssl library on the system, I don't include libssl source in my tree to recompile, first because code produced might be slower than the .so and second because it's not handy.

> If we trust codec developers with this, what happens when they create an "x86" release with SSE4 (or AVX, or AES-NI, etc.)?

Nobody creates anything, the compilation takes place on the platforms you propose with the standard flags set by the developer, so no such thing will happen.

> If you're tuning for a specific platform you're probably going to want to set -march=…, using static libraries makes that impossible.

Again, the libraries will be compiled on each platform with their standard flags, so no tuning. We're not talking about compiler flags here, but about the choice of compiler and the assembly code it produces.

> It encourages people to cheat on the benchmark by using PGO. It prevents other people from using PGO to optimize for their use case.

No, I don't see why. There's no optimization going on: you just select a compiler and a set of flags, and they are used everywhere. If, let's say, you don't have Clang on a platform because it doesn't exist for that particular system, and the library requests it, you just revert to GCC, because GCC compatibility is a prerequisite you have set. Simple!

> The size of the repository would explode.

It would be exactly the same size, the compilation process would be different but that's it.

nemequ commented 9 years ago

> That's exactly what I'm saying: the system version has been compiled and optimized for the system, by the best compiler available. The "custom-compiled-by-developer" was badly worded, I just meant that libraries are compiled and optimized for their target platform.

I think in reality it's more the compiler that tends to be most widely supported. For example, most (all?) Linux distros use GCC by default. They also tend to not be particularly well-optimized; AFAIK most packages just leave the defaults (usually -O2 -g, then strip the debugging symbols into a separate package).

> That's not what I'm saying, what I mean is that you keep the source tree in squash as you currently do, but you compile each plugin individually as a library, and finally you link squash against all the generated libraries. That's how libraries are used most of the time, as you stated previously - and I completely agree. When I use SSL I just link to the libssl library on the system, I don't include libssl source in my tree to recompile, first because code produced might be slower than the .so and second because it's not handy.

Ah, okay, that's much more feasible. In that case, really only the first two paragraphs of my response apply—I thought you were suggesting we distribute pre-compiled static libraries in the repo. This is basically what we already do when we aren't using system libraries. The plugins are all shared libraries—there aren't any static libraries, but that's really just an implementation detail… actually, it's probably a bit better to compile the whole plugin at once than use static libraries since the compiler should have an easier time with LTO. The main point is that they are basically independent from libsquash (in fact, it is meant to be possible to have plugins live outside of the Squash repository altogether).

I'm willing to allow plugins to request different optimization flags. In the past I was against this, but after looking into distribution policies a bit it seems they are more lenient than I thought, so I don't think there will be a problem there.

So, AFAICT the main issue here is the ability to request a default compiler. I'm okay with this in principle, but I'm not sure how feasible it is from CMake's perspective. I know you can do something like set (CMAKE_C_COMPILER clang), but I don't know how to make it fall back on the system default compiler. We would also have to be careful with the implementation details to make sure we don't break cross-compilation (if the user wants to use mingw64-gcc and we switch to clang the results will not be good). And we would probably have to use subprojects to make sure feature checks (like testing what flags the compiler supports) don't leak from the core to the plugins.

g1mv commented 9 years ago

This seems interesting: http://stackoverflow.com/questions/7031126/switching-between-gcc-and-clang-llvm-using-cmake

Apparently, CC and CXX are recognised; maybe they can be changed on the fly? That could enable the use of any alternate compiler.

Or apparently, as one answer suggests, you could create different build trees with different compilers, each containing the libraries requesting that compiler, and then link all the created libraries against squash.

nemequ commented 9 years ago

Just had a conversation in #cmake on freenode about this. The closest we could come seems to be using an ExternalProject for each plugin and passing the C/C++ compiler to the configure command. The problem is that there is no way to tell how cmake chose the compiler, so you might end up switching from a cross-compiler to a native compiler. I think the best way around that would be to just hide the functionality behind a variable and have it off by default… to enable it, you would just pass -DUSE_OPTIMAL_COMPILERS=yes or something similar. It's a bit ugly, but doable.
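As a rough sketch of the opt-in behaviour, the decision "pass a compiler to the plugin's configure command only when asked" could look like the shell below. `plugin_cmake_args` is a hypothetical helper, and reading `USE_OPTIMAL_COMPILERS` from the environment is an assumption made for illustration; `CMAKE_C_COMPILER` is the standard CMake variable mentioned earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Squash):
#   plugin_cmake_args <requested-compiler>
# Prints the extra configure argument for a plugin, but only when the
# user opted in with USE_OPTIMAL_COMPILERS=yes and the requested
# compiler is actually installed. Otherwise it prints nothing, so CMake
# keeps whatever compiler the user or toolchain already selected
# (which is what protects cross-compilation by default).
plugin_cmake_args() {
    requested=$1
    if [ "${USE_OPTIMAL_COMPILERS:-no}" = "yes" ] &&
        command -v "$requested" >/dev/null 2>&1; then
        printf '%s\n' "-DCMAKE_C_COMPILER=$requested"
    fi
}

# Usage sketch (the plugin path is an assumption):
#   cmake $(plugin_cmake_args clang) plugins/density
```

Keeping the switch off by default means a plain build never changes compilers behind the user's back.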

I don't think I'm going to work on this any time soon, but I'll leave the bug open in case anyone else wants to give it a try.

g1mv commented 9 years ago

Well, cmake is not that flexible apparently. Here is what would be doable: if a dev wants specific compile options, they would need to send you a makefile that you would add in the library's base directory (where CMakeLists.txt is), named Makefile.custom for example. What I could do to help is create a central Makefile (let's call it Makefile.customs) sitting in the plugins/ directory, which would have to be launched first. It would search all directories for a Makefile.custom and, if one is found, launch it to create a linkable library. Then you'd run cmake as usual, but if the make process finds the linkable library already present in the plugin directory it won't overwrite it; if not, it would create it (this assumes you use cmake to create linkable libraries as well). Once this process is done (make -f Makefile.customs && cmake && make), you'd have all the libraries sitting in their respective plugins' directories. The last step would be done as now, by linking them to the squash benchmark objects.
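The discovery step of that proposal might look like the following sketch. It is written as shell rather than as the central Makefile itself, and `list_custom_plugins` is a hypothetical name; the only things taken from the comment are the plugins/ directory and the Makefile.custom file name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Squash):
#   list_custom_plugins <plugins-dir>
# Prints every plugin directory directly under <plugins-dir> that ships
# its own Makefile.custom, one per line.
list_custom_plugins() {
    plugins_dir=$1
    for dir in "$plugins_dir"/*/; do
        # Skip plugins without a custom makefile (and an unmatched glob).
        [ -f "${dir}Makefile.custom" ] || continue
        printf '%s\n' "${dir%/}"
    done
}

# Usage sketch: run each custom makefile before the regular CMake build.
#   list_custom_plugins plugins | while IFS= read -r plugin; do
#       make -C "$plugin" -f Makefile.custom
#   done
```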

nemequ commented 9 years ago

That is more complicated than the ExternalProject idea, and it's not as portable (which is the reason I switched from autotools to cmake), and would effectively create a parallel build system for each plugin. Why not just do what I suggested and turn each plugin (or maybe just those which opt-in) into an external project?

g1mv commented 9 years ago

Sure, why not, but I'm not a cmake specialist and won't really be of much help there ...

g1mv commented 9 years ago

Actually things might change, I've recently taken a very close look at cmake and I'm currently testing it in order to use it as density's build system. It seems to simplify things a bit regarding portability and it's apparently becoming a de facto standard nowadays. So I might be able to provide some help on this later on.

nemequ commented 9 years ago

> Actually things might change, I've recently taken a very close look at cmake and I'm currently testing it in order to use it as density's build system. It seems to simplify things a bit regarding portability and it's apparently becoming a de facto standard nowadays.

It's really not becoming a de facto standard (autotools is still vastly more common on Linux, at least). IMHO the big reason to use it is that it supposedly has good Windows support.

I tend to avoid too many checks in the build system and instead put them in the source code where possible—the main reason for this is that it makes code easier to move around (between projects, build systems, etc.) and easier to embed in other projects. The Pre-defined Compiler Macros Wiki is a great resource for that.

Putting lots of checks inside the build system couples the source code tightly with it, and some of those checks don't work correctly for cross-compilation, so you have to be careful.

> So I might be able to provide some help on this later on.

Density (and sharc) are pretty simple. They would be good introductions to CMake, but this bug is a bit more complicated. I'm sure you could figure it out, just wanted to warn you that converting density or sharc probably will not be sufficient preparation :(

I'll try to take care of this soon if you don't. I've already thought through the code, I don't think the implementation will be too bad.