Adding flag to disable/enable binaries in CMakeLists.txt

anp commented 8 years ago

Good morning!

I'm having issues with the cmake build on CPUs which are missing certain features. For example, even though my university's cluster has CPUs with SSE 4.1, the cmake build skips that feature. While odd, it's fine since the AVX2 support compiles just fine, but it means that I'm getting linker errors when the cmake build automatically tries to link everything (including SSE 4.1 functions which haven't compiled) to the aligner executable. I don't have the same issues with the autotools build, but that's much harder for me to bundle with my wrapper code.

In the version of parasail I'm bundling with my library, I'm just removing the arguments to build the aligner, but I don't want to have a fork be the long-term solution here, in fact I'd like to find a way to not have to bundle code at all eventually. Would you be interested in adding a flag like "cmake -DBUILD_EXECUTABLES=FALSE" to CMakeLists.txt? Or "-DSKIP_EXECUTABLES=TRUE"? This would make it easy for me to programmatically sidestep the issue of the aligner trying to link with features that haven't compiled.

jeffdaily commented 8 years ago

Though I've used CMake in the past, I'm not a pro. I put the CMake build together extremely quickly and it isn't well tested. I am open to better ideas like the one you suggested.

My intent was that a sufficiently advanced compiler would always be available for binary releases, package managers, etc. That way, for example, the most advanced instruction set would be at least compiled and available for an Ubuntu package maintainer but it wouldn't require an Ubuntu user to be forced to by a Haswell processor...

I'm considering adding stubs for the platform-specific code. For example, if AVX2 isn't supported by the compiler, the stubs for the AVX2 routines would be compiled and linked, instead. This would solve any missing symbols at link time. Any attempt to use the missing functions would cause a non-zero exit condition and helpful stderr message indicating the problem. Do you think this is an appropriate solution?

anp commented 8 years ago

I totally understand about cmake -- I was glad you threw one together at all! I can try out a couple of options for parameterizing the cmake build options and submit a PR when I have a minute, if you're interested.

I can see how the stubs would work -- how would that affect the dynamically dispatching functions? For my use case at least, that's what I intend to use for the foreseeable future since Rust does not have very mature SIMD feature detection yet.

jeffdaily commented 8 years ago

Since my idea for fixing this is to use stubbed code, I am assigning this a milestone of v1.1 instead of making this a hotfix for v1.0.1. Adding new functions, even if stubbed, feels like a backwards-compatible-though-I'm-adding-code kind of change. Wouldn't break ABI I don't think, but I'm new to proper versioning rules.

anp commented 8 years ago

Are you intending to use semantic versioning? Then I think that 1.1 is the appropriate version. Will the stubbed functions delegate to the dynamically dispatching ones?

jeffdaily commented 8 years ago

Are you intending to use semantic versioning?

Yes, I intend to use semantic versioning. I just hope to get it right. I don't routinely develop released software. Usually just one-off codes for other researchers. Should be fun.

Will the stubbed functions delegate to the dynamically dispatching ones?

Excellent question. Before I opened it up for discussion, my first thought was no. If a stubbed function is called, then then user should be notified that they didn't get what they expected. I had assumed that only power users would call routines specific to a certain instruction set such as AVX2. Delegating such a call to the dynamic dispatcher version feels dishonest.

Now, this is probably a dumb example, but let's say someone is doing a performance benchmark comparing my code to another. If they thought they were using AVX2 but were instead using the dispatcher function, then it wouldn't be a fair comparison. (Though I would hope anyone attempting a performance benchmark like this would know that they were doing and would actually verify they were running the code they thought they were.)

For the Python bindings that I bundled with parasail, I didn't wrap the CPU-specific code. I only wrapped the dispatchers. Primarily, it was to avoid the missing symbols problem because I hadn't decided to stub out the missing ones at that time. Secondly, I wondered if the CPU-specific functions would actually get called. Though I make them available to C/C++ users, I wonder if they are too specific, too confusing to be useful. I personally find them useful. I mean, if I'm going to write those functions, I might as well give power users the same power.

Do you think it's appropriate to call the dispatcher for the CPU-specific functions that don't get compiled? Is it better for it to just "work" instead of failing with some error indication? What would a good error indication be? Return NULL and set errno to EPERM or EFAULT or something else appropriate? Or is it correct to leave things the way they are, don't stub out symbols, and let the linker tell users they have an incomplete/incorrect build of parasail? (Assuming I fix the linker current linker errors.)

anp commented 8 years ago

I think either a minor version number bump (0.x.0) or a patch version number bump (0.0.x) would be appropriate in semver, and adding function stubs seems to be in a bit of a gray area to me. By the way, I think it's absolutely fantastic that you released this, it's saving me months of work for which I am very grateful.

Re: delegating to dynamic dispatchers, I think you're right. I was previously thinking "well, if you call an alignment function you should get an alignment," but in the context of the intended users, I think the approach you describe definitely makes more sense. If you're ambitious enough to handle your own feature detection and your own dispatch, you should probably be handling errors instead of having the library do magic behind your back. FWIW, I do think that documenting the new behavior of the compiled stubs would be a good call.

By the way, have you benchmarked the overhead of the dynamic functions vs. the hardcoded functions?

jeffdaily commented 8 years ago

@dikaiosune I need your help testing the fixes for the CMake build. I've got a new feature branch for this work (now that I'm trying to do more of an unofficial git-flow branching model). The latest commit was 8bac8bc on the features/isa-stubs branch.

I reread your first message here and I missed the part where sse4.1 wasn't properly detected. That shouldn't happen, obviously. It gets detected okay for me. Any way you could share the relevant portion of your CMakeFiles/CMakeError.log with me?

In other news, I somehow got carried away with CMake and have yet another feature branch where I'm trying to get the build working natively on Windows (feature/msvc-build). For some reason I thought that helped you, because I wrongly assumed that the only reason anyone uses CMake is for Windows. I'll blame it on my stuffy medicine head today. But it too has some problems detecting SSE and AVX support, with or without compiler switches. But it has to do with the test code that gets compiled because some of the intrinsics aren't as portable as I thought.

Please test the feature/isa-stubs branch and if it works then I'll start working towards a 1.0.1 release to fix that. But don't forget I'd really like to get the SSE4.1 detection correct, too. Your help is greatly appreciated!

jeffdaily commented 8 years ago

Fascinating. I just tried the cmake build on one of our clusters and it failed with missing symbols from the sse41 code. Not all of them. Just some. It looks like the missing symbols are all "stats" versions of the functions. Something fishy going on. cmake build works on my macbook and using visual studio (in that other feature/msvc-build branch). The cmake test for SSE4.1 is passing. I looked in the CMakeFiles and see that there are indeed object files for the missing symbols, and the object files do contain the missing symbols. I'm really not sure what's going on.

jeffdaily commented 8 years ago

Update. It appears to have something to do with the version of CMake. I have successfully compiled using the configure script with a large number of compilers (gnu, intel, pgi) and various versions. When I switch to the cmake build, many seem to fail -- I did not put in the time to try all compiler and version combinations but the first few I tried failed in the same way. The linker step is failing to link all of the symbols even though the object files do indeed exist and are getting included in the linker invocation.

We had two cmake versions installed on my cluster.

cmake/2.8.12(default)
cmake/3.3.0

Builds perfectly with cmake/3.3.0 but fails with cmake/2.8.12. I may need to bump up the minimum cmake version! I just don't know which minimum version to enforce until I narrow down the minimum version that actually works.

anp commented 8 years ago

As an aside, the main reason I used cmake was for the parallel builds -- it's a couple minute build for parasail on my machine with cmake vs. a 10-15 minute build with autotools.

So yesterday, I actually discovered the same thing you did with the cmake versions available on our cluster. I didn't see your comments here until after I had finished modifying my wrapper library to use the autotools build, although I'm definitely still happy to help troubleshoot the cmake issues. I just won't be able to use cmake for my use case until they get cmake updated on the cluster, or we figure out a way to do the build on RHEL/CentOS's bundled version of cmake.

I also suspect that the SSE 4.1 issue is due to the cmake version from my tests so far, as the autotools build works perfectly fine on the cluster.

jeffdaily commented 8 years ago

Alright, I just downloaded a bunch of cmake binaries from https://cmake.org/files/ and systematically went through 2.8.12.x through 3.1.0. It finally worked with 3.1.0. I think I'll need to make that the minimum version.

There's a chance that I did something wrong in the CMakeLists.txt. I'm utilizing the CMake object library capability that was introduced in 2.8.8 so that I can do per-library preprocessor and compiler flags and isolate the ISA specific code. I don't fully understand how static linkers work, but I assumed that if all objects were included in the same target library that the linking order was irrelevant because all symbols would be resolved internally. I tried playing with the order that I specified the object libraries in the final libparasail.a target, but nothing ever changed.

Changing to CMAKE_MINIMUM_REQUIRED( VERSION 3.1 ) feels like a work-around, but I don't know that I have much of a choice.

As an aside, the main reason I used cmake was for the parallel builds -- it's a couple minute build for parasail on my machine with cmake vs. a 10-15 minute build with autotools.

How are the cmake parallel builds different than the regular autotools parallel builds? I use "make -j 16" for the cmake build and "make -j 16" for the autotools build -- exact same. CMake 3.1.0 and below seems to interleave the output in a jumbled mess, but the autotools build doesn't. Both builds go extremely fast, like you said.

anp commented 8 years ago

Well, I definitely didn't forget to specify the jobs flag...oops. Although since I have to specify everything inside a buildscript, I would prefer to let cmake detect the number of CPUs rather than do it myself :).

I've never seen a static link that broke with different archive declaration order, but I've not seen many complex static links (> 3 or 4 archives). I think updating the minimum cmake version required is sensible, but I'm far from a cmake expert, mostly just a clumsy user.

anp commented 8 years ago

Scratch that, the Rust build system provides a very clean way for me to have the number of CPUs either detected or specified for me at build time. Which honestly eliminates the only use case I had for cmake, since I don't expect anyone to run my wrapper on Windows.

jeffdaily commented 8 years ago

Fixed in release v1.0.1.

jeffdaily / parasail

Adding flag to disable/enable binaries in CMakeLists.txt #9