msummersgill / RfCWT

R package porting the fast Continuous Wavelet Transform (fCWT)

Intel MKL Compatibility #1

Open msummersgill opened 1 year ago

msummersgill commented 1 year ago

Building the package with R using Intel MKL libraries for BLAS/LAPACK appears to cause issues with reading and writing FFTW plans, though it's not clear to me why.

The observed output is below. All of the nxxxx_tx.wis files existed but were conspicuously empty.

RfCWT::fCWT(Length_1e4, f0 = 1, f1 = 101, nthreads = 1L, fn = 300, fs = 100, optimize = opt)
Optimization schemes T:1 for N: 2048 have been calculated. Next time you use fCWT it will automatically choose the right optimization scheme based on number of threads and signal length.
Optimization schemes T:1 for N: 4096 have been calculated. Next time you use fCWT it will automatically choose the right optimization scheme based on number of threads and signal length.
Optimization schemes T:1 for N: 8192 have been calculated. Next time you use fCWT it will automatically choose the right optimization scheme based on number of threads and signal length.
Optimization schemes T:1 for N: 16384 have been calculated. Next time you use fCWT it will automatically choose the right optimization scheme based on number of threads and signal length.
WARNING: Optimization scheme 'n16384_t1.wis' was not found, fallback to calculation without optimization.
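
A quick way to confirm the "existent, but empty" symptom is to list zero-byte wisdom files. This is only a sketch: the `n*_t*.wis` pattern is inferred from the `n16384_t1.wis` name in the warning, and both the helper name `list_empty_wisdom` and the search directory are assumptions, not part of RfCWT.

```shell
#!/bin/sh
# Hypothetical helper: print wisdom files that exist but contain zero bytes.
# The directory argument is an assumption; it defaults to the current directory.
list_empty_wisdom() {
    dir="${1:-.}"
    # -size 0 matches only files that occupy zero blocks, i.e. empty files.
    find "$dir" -maxdepth 1 -type f -name 'n*_t*.wis' -size 0
}
```

Calling `list_empty_wisdom /path/to/wisdom/dir` after an `optimize` run should print nothing when the plans were actually written.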

Using update-alternatives, I switched back and forth between Intel libmkl_rt and GNU libblas/liblapack multiple times to confirm that this was the issue. A breadcrumb on Stack Overflow indicated that I may not be the first to encounter this behavior.

sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu

There are 5 choices for the alternative libblas.so.3-x86_64-linux-gnu (providing /usr/lib/x86_64-linux-gnu/libblas.so.3).

  Selection    Path                                                     Priority   Status
------------------------------------------------------------
  0            /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3   100       auto mode
  1            /opt/intel/mkl/lib/intel64/libmkl_rt.so                   50        manual mode
  2            /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3              35        manual mode
* 3            /usr/lib/x86_64-linux-gnu/blas/libblas.so.3               10        manual mode
  4            /usr/lib/x86_64-linux-gnu/libmkl_rt.so                    1         manual mode
  5            /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3   100       manual mode

sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu

There are 5 choices for the alternative liblapack.so.3-x86_64-linux-gnu (providing /usr/lib/x86_64-linux-gnu/liblapack.so.3).

  Selection    Path                                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3   100       auto mode
  1            /opt/intel/mkl/lib/intel64/libmkl_rt.so                     50        manual mode
  2            /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3              35        manual mode
* 3            /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3             10        manual mode
  4            /usr/lib/x86_64-linux-gnu/libmkl_rt.so                      1         manual mode
  5            /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3   100       manual mode
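
When switching back and forth, it can help to check which implementation the `libblas.so.3` link currently resolves to before rebuilding. The sketch below classifies a path by substring only; the function names `blas_flavor` and `current_blas` are made up for illustration, and the default path is the one shown in the `update-alternatives` output above.

```shell
#!/bin/sh
# Hypothetical helper: classify a BLAS/LAPACK library path by its name.
blas_flavor() {
    case "$1" in
        *libmkl_rt*) echo mkl ;;
        *openblas*)  echo openblas ;;
        *atlas*)     echo atlas ;;
        *)           echo reference ;;
    esac
}

# Resolve the alternatives symlink chain and classify the final target.
current_blas() {
    blas_flavor "$(readlink -f "${1:-/usr/lib/x86_64-linux-gnu/libblas.so.3}")"
}
```

Running `current_blas` with no argument reports the flavor for the default link; pass the `liblapack.so.3` path to check LAPACK the same way.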

Despite finding a solution, my excitement was quickly dampened when I found that the speed-up associated with using previously generated plans was negated. There are some great examples out there demonstrating the speed-up from these optimized math kernel libraries, and they also give good guidance for installation. Comparing standard BLAS/LAPACK with the Intel MKL and ATLAS alternatives, MKL without planning is roughly 2-3x faster in every single-thread case and about 1.3x faster in the 100k cases at 8 threads. It is not faster in the 10k cases at 8 threads, and is clearly slower for 10k-3000.

Unit: seconds
                expr   MKL (No Plan)      Atlas   BLAS/LAPACK
   10k-300 1 Threads            0.09       0.26          0.26
  100k-300 1 Threads            0.91       2.87          3.13
  10k-3000 1 Threads            1.46       3.02          3.07
 100k-3000 1 Threads            9.06      29.14         31.60
   10k-300 8 Threads            0.77       0.72          0.71
  100k-300 8 Threads            0.75       0.99          1.01
  10k-3000 8 Threads            1.77       1.03          1.05
 100k-3000 8 Threads            7.32       9.68          9.81

According to the Intel MKL FFTW3 documentation, the FFTW wisdom functions are implemented as empty wrappers. For example:

// /opt/intel/mkl/interfaces/fftw3xf/wrappers/fftw_import_wisdom_from_filename.c
/*******************************************************************************
* Copyright 2015-2019 Intel Corporation.
*
* This software and the related documents are Intel copyrighted  materials,  and
* your use of  them is  governed by the  express license  under which  they were
* provided to you (License).  Unless the License provides otherwise, you may not
* use, modify, copy, publish, distribute,  disclose or transmit this software or
* the related documents without Intel's prior written permission.
*
* This software and the related documents  are provided as  is,  with no express
* or implied  warranties,  other  than those  that are  expressly stated  in the
* License.
*******************************************************************************/

/*
 *
 * fftw_import_wisdom_from_filename - FFTW3 wrapper to Intel(R) MKL.
 *
 ******************************************************************************
 */

#include "fftw3_mkl.h"

int
fftw_import_wisdom_from_filename(const char *filename)
{
    UNUSED(filename);
    return 0;
}

This affects the behavior of FCWT::load_FFT_optimization_plan() and FCWT::create_FFT_optimization_plan(). The current workaround is to append the compiler flag -DMKL to PKG_CXXFLAGS in source/MakeVars.

CXX_STD = CXX17
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) -fPIC -mavx -O3 -DMKL
PKG_LIBS = $(SHLIB_OPENMP_CFLAGS) -lfftw3f -lfftw3f_omp

It should be possible to automate this as part of the package compilation process, so that users do not have to hand-edit the makefile.
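
One possible shape for that automation is a `configure`-style shell step that fills in the flag from a template. This is a sketch under stated assumptions, not something the package currently ships: the `@MKL_DEFINE@` placeholder, the template file, and the function names `mkl_define` and `render_makevars` are all hypothetical. It also only detects MKL selected through the `update-alternatives` symlink, so it reflects the library in place at build time, not one swapped in later.

```shell
#!/bin/sh
# Hypothetical configure-time helper: emit -DMKL when the given BLAS link
# resolves to Intel MKL's libmkl_rt, otherwise emit nothing.
mkl_define() {
    case "$(readlink -f "$1" 2>/dev/null)" in
        *libmkl_rt*) echo "-DMKL" ;;
        *)           echo "" ;;
    esac
}

# Replace the assumed @MKL_DEFINE@ placeholder in a template with the flag.
# Arguments: template path, output path, BLAS library path to inspect.
render_makevars() {
    template="$1"; output="$2"; blas="$3"
    sed "s|@MKL_DEFINE@|$(mkl_define "$blas")|" "$template" > "$output"
}
```

With a template line such as `PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) -fPIC -mavx -O3 @MKL_DEFINE@`, the rendered file would carry `-DMKL` only on systems where the BLAS link points at MKL.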
