xflouris / libpll

Phylogenetic Likelihood Library
GNU Affero General Public License v3.0

AVX runtime detection? #136

Closed tillea closed 7 years ago

tillea commented 7 years ago

Hi, I have packaged libpll for Debian. As far as I understand, libpll does runtime detection of processor features. Unfortunately there seems to be an issue, since the package received a bug report requesting that AVX be disabled at build time via --disable-avx. I wonder what you think about this and how it could be sensibly fixed. Kind regards, Andreas.

xflouris commented 7 years ago

Hello @tillea,

thank you very much. libpll does not detect the processor features at run time yet, but the autoconf installer finds the available CPU features and compiles the corresponding source files.

I think this is already solved by using the preprocessor macros that automake/autoconf define. The correct and up to date sources of libpll (that use these macros) are in the 'dev' branch. Could it be that you have packaged the master branch instead (which is a bit behind)? I'll synchronize the master/dev branch to eliminate this confusion.

Best regards, Tomas

tillea commented 7 years ago

Hi Tomas,

thanks for the quick response.

On Fri, May 05, 2017 at 07:39:36AM -0700, Tomas Flouri wrote:

libpll does not detect the processor features at run time yet,

Please note that this would be the optimal solution for a distribution, where the machine the code was built on can have different features from the machine where it is executed.

but the autoconf installer finds available cpu features and compiles the corresponding source files.

I think this is already solved by using the preprocessor macros that automake/autoconf define. The correct and up to date sources of libpll (that use these macros) are in the 'dev' branch. Could it be that you have packaged the master branch instead (which is a bit behind)?

I can confirm that I'm using the master branch. IMHO master is what users should be able to assume is reasonably stable.

I'll synchronize the master/dev branch to eliminate this confusion.

Thanks. It would be even better if you would add release tags (see issue #132).

Kind regards, Andreas.

tillea commented 7 years ago

Hi @xflouris, I checked several times, but the sync between dev and master has not happened yet, nor have you considered a release tag. I'd like to stress that this would be very helpful. The current libpll package in Debian is shipping master, which has a serious bug as I mentioned above, and I would really like to fix this. Kind regards, Andreas.

xflouris commented 7 years ago

Hello @tillea,

thank you for all the information, and I'm sorry for the delay. I'm working to resolve this issue now, and will update the master + release tag tonight.

I have some questions regarding the run-time hardware detection, and I'd like your opinion about it.

In the libpll version you packaged, detection of hardware features (SSE, AVX, AVX2) is done at compile time by autotools, which arranges for the corresponding files to be compiled if such a feature is available. Once the library is built, the application programmer can use the API and call the optimized functions (e.g. AVX, SSE) that were available at compile time, but can also call the equivalent non-optimized functions, which are always available.
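For illustration, here is a minimal sketch of what compile-time gating of this kind can look like. The macro names `HAVE_SSE` and `HAVE_AVX`, the kernel names, and the `compiled_kernels` helper are assumptions made up for this example, not libpll's actual identifiers. The point is only that a kernel which was not enabled by configure does not exist in the built library at all.

```c
#include <stddef.h>

/* Hypothetical feature macros: a configure script would pass -DHAVE_SSE
   and/or -DHAVE_AVX when it detects the feature on the build machine. */
enum { KERNEL_GENERIC = 1, KERNEL_SSE = 2, KERNEL_AVX = 4 };

/* Baseline kernel: always compiled, runs on any CPU. */
double sum_generic(const double *x, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; ++i)
    s += x[i];
  return s;
}

#ifdef HAVE_AVX
/* Declared only in AVX builds; the definition would live in a separate
   source file that the build system compiles with -mavx. */
double sum_avx(const double *x, size_t n);
#endif

/* Bit mask of the kernels that were compiled into this build. */
unsigned int compiled_kernels(void)
{
  unsigned int mask = KERNEL_GENERIC;
#ifdef HAVE_SSE
  mask |= KERNEL_SSE;
#endif
#ifdef HAVE_AVX
  mask |= KERNEL_AVX;
#endif
  return mask;
}
```

The mask reflects the build machine only. Nothing in this scheme looks at the CPU the library eventually runs on, which is the gap discussed in the rest of this thread.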

Now, assume we compile libpll on machine A (has features F1 but not F2), and want to distribute it to machines B (has neither F1 nor F2), and machine C (has both F1 and F2). Machine B users will be able to execute the F1 optimized functions, but execution will fail because F1 is not present on B. Machine C users will be able to execute F1 functions without a problem, but will not be able to execute F2 functions even though their hardware supports it, because libpll was not built with F2 enabled.

If I understand your suggestion above correctly, should I disable the autotools detection of hardware features and allow compilation with all hardware features enabled? That would require at least GCC 4.6.0 (March 25, 2011) since this is the first version with AVX compatibility. I would then add run-time checks to each optimized function to see whether the required hardware feature is available, and if it is not, fall back to the non-optimized functions. Would something like this work?
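A rough sketch of that idea, assuming a reasonably recent GCC or Clang on x86: the `target` function attribute lets a single function use AVX while the rest of the file is built for the baseline instruction set, and `__builtin_cpu_supports` checks the executing CPU. The `dot_*` functions are invented for this example and are not part of libpll; on other compilers or architectures the sketch simply uses the plain C kernel.

```c
#include <stddef.h>

#if (defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define X86_DISPATCH 1
#endif

/* Plain C kernel: runs everywhere. */
static double dot_generic(const double *a, const double *b, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; ++i)
    s += a[i] * b[i];
  return s;
}

#ifdef X86_DISPATCH
/* Built with AVX enabled for this function only, whatever the build
   machine supports. It must only be called on CPUs that have AVX. */
__attribute__((target("avx")))
static double dot_avx(const double *a, const double *b, size_t n)
{
  __m256d acc = _mm256_setzero_pd();
  double lanes[4];
  double s;
  size_t i = 0;

  for (; i + 4 <= n; i += 4)
    acc = _mm256_add_pd(acc, _mm256_mul_pd(_mm256_loadu_pd(a + i),
                                           _mm256_loadu_pd(b + i)));
  _mm256_storeu_pd(lanes, acc);
  s = lanes[0] + lanes[1] + lanes[2] + lanes[3];

  for (; i < n; ++i)            /* leftover elements */
    s += a[i] * b[i];
  return s;
}
#endif

/* Public entry point: checks the executing CPU, then picks a kernel. */
double dot(const double *a, const double *b, size_t n)
{
#ifdef X86_DISPATCH
  if (__builtin_cpu_supports("avx"))
    return dot_avx(a, b, n);
#endif
  return dot_generic(a, b, n);
}
```

With this layout, a binary built on machine A still runs on machine B (the check fails and the generic kernel is used) and still uses AVX on machine C.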

Best regards, Tomas

xflouris commented 7 years ago

Regarding the bug report, I agree: for i386, AVX optimizations should be disabled at compile time (I will update configure.ac).

EDIT: I realize that most i386 architectures can have SSE.

tillea commented 7 years ago

Hi Tomas,

On Sun, May 14, 2017 at 04:58:18AM -0700, Tomas Flouri wrote:

thank you for all the information, and I'm sorry for the delay. I'm working to resolve this issue now, and will update the master + release tag tonight.

Thanks. :-)

I have some questions regarding the run-time hardware detection, and I'd like your opinion about it.

In the libpll version you packaged, detection of hardware features (SSE, AVX, AVX2) is done at compile time by autotools, which arranges for the corresponding files to be compiled if such a feature is available. Once the library is built, the application programmer can use the API and call the optimized functions (e.g. AVX, SSE) that were available at compile time, but can also call the equivalent non-optimized functions, which are always available.

Now, assume we compile libpll on machine A (has features F1 but not F2), and want to distribute it to machines B (has neither F1 nor F2), and machine C (has both F1 and F2). Machine B users will be able to execute the F1 optimized functions, but execution will fail because F1 is not present on B. Machine C users will be able to execute F1 functions without a problem, but will not be able to execute F2 functions even though their hardware supports it, because libpll was not built with F2 enabled.

Yes. That's the problem. As far as I know (and I'm not a hardware expert at all), the technical solution is function multiversioning. Please reread the discussion on the Debian Med mailing list, in which your colleagues were also involved, at https://lists.alioth.debian.org/pipermail/debian-med-packaging/2017-March/051199.html
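For readers unfamiliar with the term, here is a minimal sketch of function multiversioning using GCC's `target_clones` attribute, which for C code needs GCC 6 or later and a platform with ifunc support such as glibc on Linux. The `scale` function and the `MULTIVERSION` macro are invented for this example; elsewhere the macro expands to nothing and a single plain version is built.

```c
#include <stddef.h>
#include <stdlib.h>   /* any libc header; makes __GLIBC__ visible on glibc */

/* Enable cloning only where this sketch assumes it is supported:
   GCC >= 6 (not Clang), x86-64, glibc. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6 && \
    defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("avx2", "avx", "default")))
#else
#define MULTIVERSION
#endif

/* The compiler emits one copy of this function per listed target plus a
   resolver; the dynamic loader picks the best copy for the executing CPU
   the first time the symbol is resolved. */
MULTIVERSION
void scale(double *x, size_t n, double factor)
{
  for (size_t i = 0; i < n; ++i)
    x[i] *= factor;
}
```

Callers simply call `scale()`. There is one source body and no hand-written dispatch, which is why this approach fits code where every variant shares the same prototype.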

If I understand your suggestion above correctly, should I disable the autotools detection of hardware features and allow compilation with all hardware features enabled? That would require at least GCC 4.6.0 (March 25, 2011) since this is the first version with AVX compatibility. I would then add run-time checks to each optimized function to see whether the required hardware feature is available, and if it is not, fall back to the non-optimized functions. Would something like this work?

I think we can safely assume that users have GCC versions newer than 4.6.0. I would not even mind requiring GCC 6.x if the subset available in 4.8 (see the thread I linked above) is not sufficient. If you have more detailed questions about function multiversioning, please join the discussion in that thread, since the people on that list are far more experienced than I am.

Kind regards, Andreas.

xflouris commented 7 years ago

Hi Andreas, I believe that multi-versioning would be suitable for the old (and basically abandoned) version of libpll, but not for the new version you have recently packaged. That thread considers the old version, which used to compile multiple objects, each having the same function prototype/headers, but each object would use different hardware optimizations. In the new libpll version, the situation is different. We have functions like this:

foo_sse(parameters)
{
  /* do operation X using SSE */
}

foo_avx(parameters)
{
  /* do operation X using AVX */
}

foo(parameters, optimization)
{
  if (optimization == SSE)
    foo_sse(parameters);
  else if (optimization == AVX)
    foo_avx(parameters);
  else
    /* do operation X without hardware optimizations */
}

foo_sse/foo_avx are built based on decisions made at compile time (at least for now). The application programmer is advised to use foo() and not to call foo_sse/foo_avx directly; in the long term I will make those hidden symbols. For now my plan, as I described in my previous post, would be to change foo into the following:

foo(parameters, optimization)
{
  if (optimization == SSE && sse_present)
    foo_sse(parameters);
  else if (optimization == AVX && avx_present)
    foo_avx(parameters);
  else
    /* do operation X without any hardware optimizations */
}

where sse_present and avx_present would be set at run time, when libpll is loaded, by executing the cpuid instruction. That way I could disable the compile-time detection and basically compile libpll with all hardware features. Run-time checks would prevent hardware optimizations from running on unsupported hardware.
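The plan above could be sketched roughly as follows, assuming GCC or Clang (for the `constructor` attribute, `<cpuid.h>`, and inline assembly). Only the names `sse_present` and `avx_present` come from the post; the `OPT_*` constants and the `foo_*` stand-ins are hypothetical. The sketch also checks that the operating system saves the AVX register state (OSXSAVE and XCR0), which is generally required in addition to the CPUID AVX bit.

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define X86_CPUID 1
#endif

/* Hypothetical values the application passes to request a kernel. */
enum { OPT_NONE = 0, OPT_SSE = 1, OPT_AVX = 2 };

static int sse_present = 0;
static int avx_present = 0;

/* Runs once when the library is loaded, before any API call. */
__attribute__((constructor))
static void detect_cpu_features(void)
{
#ifdef X86_CPUID
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return;                                  /* leaf 1 unavailable */

  sse_present = (edx & (1u << 25)) != 0;     /* CPUID.1:EDX bit 25 = SSE */

  /* ECX bit 28 = AVX, bit 27 = OSXSAVE (xgetbv is usable). */
  if ((ecx & (1u << 28)) && (ecx & (1u << 27)))
  {
    unsigned int lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    avx_present = (lo & 0x6u) == 0x6u;       /* OS saves XMM and YMM state */
  }
#endif
}

/* Kernel that will actually run for a requested optimization level. */
int selected_kernel(int optimization)
{
  if (optimization == OPT_SSE && sse_present)
    return OPT_SSE;
  if (optimization == OPT_AVX && avx_present)
    return OPT_AVX;
  return OPT_NONE;
}

/* Stand-ins: real kernels would differ in implementation, not result. */
static int foo_generic(int x) { return x + 1; }
static int foo_sse(int x)     { return x + 1; }
static int foo_avx(int x)     { return x + 1; }

int foo(int x, int optimization)
{
  switch (selected_kernel(optimization))
  {
    case OPT_SSE: return foo_sse(x);
    case OPT_AVX: return foo_avx(x);
    default:      return foo_generic(x);
  }
}
```

Requesting `OPT_AVX` on a CPU without AVX quietly runs `foo_generic`, so the caller gets the same result either way and never executes an unsupported instruction.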

I can forward this discussion on that thread, if you feel it won't be off-topic.

Best, Tomas

tillea commented 7 years ago

Hi Tomas,

On Sun, May 14, 2017 at 07:03:03AM -0700, Tomas Flouri wrote:

I believe that multi-versioning would be suitable for the old (and basically abandoned) version of libpll, but not for the new version you have recently packaged. That thread considers the old version, which used to compile multiple objects, each having the same function prototype/headers, but each object would use different hardware optimizations. In the new libpll version, the situation is different. We have functions like this:

I simply trust your insight, which is far deeper than mine. We just need to make sure that the optimisation is selected at run time rather than at build time.

where sse_present and avx_present would be set at run time, when libpll is loaded, by executing the cpuid instruction. That way I could disable the compile-time detection and basically compile libpll with all hardware features. Run-time checks would prevent hardware optimizations from running on unsupported hardware.

I can forward this discussion on that thread, if you feel it won't be off-topic.

It's definitely on-topic and, as I tried to express, I'm not competent in this field.

Thanks a lot for supporting the packaging attempt, Andreas.

xflouris commented 7 years ago

Hi Andreas, I've now added run-time detection of hardware features, with a fallback to the non-optimized functions whenever an optimized function is called whose instruction set is not available on the processor.

Best regards, Tomas

xflouris commented 7 years ago

finished