icl-utk-edu / lapackpp

LAPACK++ is a C++ wrapper around CPU and GPU LAPACK and LAPACK-like linear algebra libraries, developed as part of the SLATE project.
https://icl.utk.edu/slate/
BSD 3-Clause "New" or "Revised" License
51 stars 14 forks source link

Support for new Accelerate interface on macOS 13.3 #43

Open rileyjmurray opened 1 year ago

rileyjmurray commented 1 year ago

The version of Accelerate that ships with macOS 13.3 has the option of providing ILP_64 and LAPACK 3.9.1:

https://developer.apple.com/documentation/macos-release-notes/macos-13_3-release-notes#New-Features

This new version of Accelerate is a major upgrade. Older versions only supported LAPACK 3.2. @TeachRaccooon (Max) and I need the more up-to-date LAPACK 3.9.1 interface to get randomized column-pivoted QR working on Apple Silicon.

@mgates3 having this fix in LAPACK++ would actually make a big difference for us. Let me know if there's anything I can do to help.

mgates3 commented 1 year ago

Good to hear that they updated. Hopefully that means they will keep up-to-date with LAPACK versions now.

Confusing, because in the LAPACK configure step (make config), querying Accelerate (ilaver) still lists the LAPACK version as 3.2.1 instead of 3.9.

LAPACK version                                                           yes (3.02.01)

Will need some more investigation to understand what's going on.

rileyjmurray commented 1 year ago

I think the issue might be in missing the #define ACCELERATE_NEW_LAPACK declaration at the correct place

mgates3 commented 1 year ago

I changed the config to define it on the command line. That shouldn't really matter — it calls ilaver just fine, but ilaver returns 3.2.1. From config/log.txt:

LAPACK version
>>> g++ -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DFORTRAN_ADD_ -DACCELERATE_NEW_LAPACK -fopenmp -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -c -o config/lapack_version.o config/lapack_version.cc
exit status = 0
>>> g++ config/lapack_version.o -o config/lapack_version -fopenmp -framework Accelerate
exit status = 0
>>> ./config/lapack_version
LAPACK_VERSION=3.02.01
exit status = 0
yes (3.02.01)

I'm on macOS Ventura 13.4.1.

Edit: Fixed line that was truncated. Old line:

>>> g++ -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DACCELERATE_NEW_LAPACK -fopenmp -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/M>
rileyjmurray commented 1 year ago

Could the issue stem from different ways of including Accelerate? I ran a search for Accelerate.h in this repo and only found two hits: https://github.com/search?q=repo%3Aicl-utk-edu%2Flapackpp%20Accelerate.h&type=code. Both of the hits include comments about "official" vs practical ways of including BLAS/LAPACK on Accelerate platforms.

rileyjmurray commented 1 year ago

@mgates3 I think there's something to my hypothesis. I tracked down Accelerate.h and all it did was define __ACCELERATE__ for the preprocessor, include vecLib.h, and include a header for an image processing library. Here's where I found the vecLib folder on my machine

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/vecLib.framework

Here are the contents of that folder, just to help you get oriented.

├── Headers -> Versions/Current/Headers
├── Versions
│   ├── A
│   │   ├── Headers
│   │   │   ├── ... four folders we don't care about ... (BNNS, LinearAlgebra, Quadrature, Sparse)
│   │   │   ├── blas_new.h
│   │   │   ├── cblas.h
│   │   │   ├── cblas_new.h
│   │   │   ├── clapack.h
│   │   │   ├── fortran_blas.h
│   │   │   ├── lapack.h
│   │   │   ├── lapack_types.h
│   │   │   ├── lapack_version.h
│   │   │   ├── vBasicOps.h
│   │   │   ├── vBigNum.h
│   │   │   ├── vDSP.h
│   │   │   ├── vDSP_translate.h
│   │   │   ├── vForce.h
│   │   │   ├── vecLib.h
│   │   │   ├── vecLibTypes.h
│   │   │   ├── vectorOps.h
│   │   │   └── vfp.h
│   │   ├── libBLAS.tbd
│   │   ├── libBNNS.tbd
│   │   ├── libLAPACK.tbd
│   │   ├── libLinearAlgebra.tbd
│   │   ├── libQuadrature.tbd
│   │   ├── libSparse.tbd
│   │   ├── libSparseBLAS.tbd
│   │   ├── libvDSP.tbd
│   │   ├── libvMisc.tbd
│   │   └── vecLib.tbd
│   └── Current -> A
├── libvDSP.tbd -> Versions/Current/libvDSP.tbd
├── libvMisc.tbd -> Versions/Current/libvMisc.tbd
└── vecLib.tbd -> Versions/Current/vecLib.tbd

The relevant lines in vecLib.h are as follows

#if defined(ACCELERATE_NEW_LAPACK)

#include <vecLib/blas_new.h>
#include <vecLib/cblas_new.h>
#include <vecLib/lapack.h>

// Prevent nested inclusion of old headers if we are using the new ones
#define CBLAS_H
#define __CLAPACK_H

#else

#include <vecLib/fortran_blas.h>

#ifndef CBLAS_H
#include <vecLib/cblas.h>
#endif

#ifndef __CLAPACK_H
#include <vecLib/clapack.h>
#endif

#endif // #if defined(ACCELERATE_NEW_LAPACK)

I looked at some of blaspp/lapackpp's cmake files and their logic for macOS systems. They try to find cblas.h rather than vecLib.h. While that's reasonable, the investigation above shows that including cblas.h gets the intended behavior for the old LAPACK interface. If we look at blas_new.h and cblas_new.h we'll see lines like

#include "lapack_types.h"
#include "lapack_version.h"

I tried looking at the latter file, but I don't really understand what it's doing. It defines a preprocessor macro called __LAPACK_ALIAS(arg) that somehow influences function declarations in lapack.h. For example, here's what happens in lapack.h for GEMQRT

void
dgemqrt_(
  const char * _Nonnull side,
  const char * _Nonnull trans,
  const __LAPACK_int * _Nonnull m,
  const __LAPACK_int * _Nonnull n,
  const __LAPACK_int * _Nonnull k,
  const __LAPACK_int * _Nonnull nb,
  const double * _Nullable v,
  const __LAPACK_int * _Nonnull ldv,
  const double * _Nullable t,
  const __LAPACK_int * _Nonnull ldt,
  double * _Nullable c,
  const __LAPACK_int * _Nonnull ldc,
  double * _Nonnull work,
  __LAPACK_int * _Nonnull info)
__LAPACK_ALIAS(dgemqrt)
API_AVAILABLE(macos(13.3), ios(16.4), watchos(9.4), tvos(16.4));
mgates3 commented 1 year ago

Thanks for the investigation.

BLAS++ and LAPACK++ largely ignore the system headers. They define their own Fortran prototypes, since most BLAS / LAPACK libraries don't even provide Fortran prototypes, and might get them wrong with respect to const if they do. Of course the exception is when we call cblas or lapacke, which is only in the testers.

Mostly we probe the library to figure out behavior, rather than relying on Fortran prototypes that might not even exist. So if the BLAS / LAPACK library changes its runtime behavior, we should detect that. Yes, I'm very much wondering if they failed to update ilaver, which makes things really complicated because then we would have to determine the actual LAPACK version based on what functions exist -- something I wouldn't look forward to maintaining -- rather than a simple query.

Apple Accelerate always gives lots of headaches. One is including the header file. Per the comments in blaspp/config/cblas.cc, if configure can find cblas.h, it will use that. This is the officially sanctioned method of using CBLAS (Appendix B.2.8), so if Apple doesn't support it, that's their bad. Nonetheless, we also check for Accelerate.h, if using cblas.h didn't work, but I previously had issues with Accelerate.h.

(Intel MKL and IBM ESSL also fail to have the cblas.h header, also requiring complicated #if logic.)

If Apple's updates require going through cblas / lapacke rather than the Fortran interface, that would be quite unfortunate and really complicate BLAS++ / LAPACK++.

rileyjmurray commented 1 year ago

It sounds like it's worth getting some Accelerate developers involved in this discussion. I have the contact info for the guy who oversees Accelerate's development. Should I reach out to him?

rileyjmurray commented 12 months ago

I changed the config to define it on the command line. That shouldn't really matter — it calls ilaver just fine, but ilaver returns 3.2.1. From config/log.txt:

LAPACK version
>>> g++ -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DACCELERATE_NEW_LAPACK -fopenmp -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/M>
exit status = 0
>>> g++ config/lapack_version.o -o config/lapack_version -fopenmp -framework Accelerate
exit status = 0
>>> ./config/lapack_version
LAPACK_VERSION=3.02.01
exit status = 0
yes (3.02.01)

I'm on macOS Ventura 13.4.1.

@mgates3, can you push the code you changed to some branch, so that I can replicate what you did?

EDIT: I might have misunderstood the nature of your change. I tried compiling lapack_version.cc with the following line

clang++ lapack_version.cc -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DACCELERATE_NEW_LAPACK -DBLAS_FORTRAN_ADD_ -fopenmp -framework Accelerate

Then running ./a.out produces the output you report. However, if I modify lapack_version.cc to include Accelerate/Accelerate.h, then it prints out version 3.9.01.

mgates3 commented 12 months ago

@rileyjmurray I fixed my comment from Aug 31. One of the lines was truncated. You can just copy those two g++ lines to see the result. It prints 3.02.01.

I tried including Accelerate/Accelerate.h as you suggest above, but still get 3.02.01. See below. (Also, the Accelerate.h header has errors with g++, but works with clang++.)

I talked with Heiko briefly at the SparseBLAS workshop and need to follow up with him about what they did to get the new Accelerate working.

lapack_version.cc

#include <stdio.h>

#include "config.h"

#define LAPACK_ilaver FORTRAN_NAME( ilaver, ILAVER )

#if 1
    #pragma message "Accelerate.h"
    #include <Accelerate/Accelerate.h>
#else
    #pragma message "self-defined"
    #ifdef __cplusplus
    extern "C"
    #endif
    void LAPACK_ilaver( lapack_int* major, lapack_int* minor, lapack_int* patch );
#endif

int main( int argc, char** argv )
{
    using llong = long long;
    lapack_int major, minor, patch;
    LAPACK_ilaver( &major, &minor, &patch );
    printf( "LAPACK_VERSION=%lld.%02lld.%02lld\n",
            llong( major ), llong( minor ), llong( patch ) );
    return 0;
}

Compiled as:

pangolin lapackpp/config> clang++ -o lapack_version lapack_version.cc -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DACCELERATE_NEW_LAPACK -DBLAS_FORTRAN_ADD_ -framework Accelerate
lapack_version.cc:13:13: warning: Accelerate.h [-W#pragma-messages]
    #pragma message "Accelerate.h"
            ^
1 warning generated.
pangolin lapackpp/config> ./lapack_version 
LAPACK_VERSION=3.02.01
rileyjmurray commented 12 months ago

Thanks for the update, @mgates3. I get different results from what you observe.

Here's what I see with clang++ and the ACCELERATE_NEW_LAPACK definition.

clang++ -o lapack_version lapack_version.cc -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DACCELERATE_NEW_LAPACK -DBLAS_FORTRAN_ADD_ -framework Accelerate

lapack_version.cc:13:13: warning: Accelerate.h [-W#pragma-messages]
   13 |     #pragma message "Accelerate.h"
      |             ^
lapack_version.cc:27:5: warning: 'ilaver_' is only available on macOS 13.3 or newer [-Wunguarded-availability-new]
   27 |     LAPACK_ilaver( &major, &minor, &patch );
      |     ^~~~~~~~~~~~~
lapack_version.cc:10:23: note: expanded from macro 'LAPACK_ilaver'
   10 | #define LAPACK_ilaver FORTRAN_NAME( ilaver, ILAVER )
      |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./config.h:17:42: note: expanded from macro 'FORTRAN_NAME'
   17 |     #define FORTRAN_NAME( lower, UPPER ) lower ## _
      |                                          ^~~~~~~~~~
<scratch space>:13:1: note: expanded from here
   13 | ilaver_
      | ^~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/lapack.h:285:1: note: 'ilaver_' has been marked as being introduced in macOS 13.3 here, but the deployment target is macOS 13.0.0
  285 | ilaver_(
      | ^
lapack_version.cc:27:5: note: enclose 'ilaver_' in a __builtin_available check to silence this warning
   27 |     LAPACK_ilaver( &major, &minor, &patch );
      |     ^~~~~~~~~~~~~                           
lapack_version.cc:10:23: note: expanded from macro 'LAPACK_ilaver'
   10 | #define LAPACK_ilaver FORTRAN_NAME( ilaver, ILAVER )
      |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./config.h:17:42: note: expanded from macro 'FORTRAN_NAME'
   17 |     #define FORTRAN_NAME( lower, UPPER ) lower ## _
      |                                          ^~~~~~~~~~
<scratch space>:13:1: note: expanded from here
   13 | ilaver_
      | ^~~~~~~
2 warnings generated.

Upon running lapack_version, I see output LAPACK_VERSION=3.09.01.

If I use clang++ with no special defines, then I see the following.

lapack_version.cc:13:13: warning: Accelerate.h [-W#pragma-messages]
   13 |     #pragma message "Accelerate.h"
      |             ^
1 warning generated.

Running lapack_version prints LAPACK_VERSION=3.02.01 in this case.

System info

macOS 13.6.2 (22G320)

Compiler

clang++ -v
Homebrew clang version 17.0.5
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin

Developer toolkit

xcode-select -v
xcode-select version 2397.

Note: my version of the macOS developer toolkit actually overrides g++ with the Apple version of clang. I verified that this version of clang produces the same results as I reported above (based on Homebrew clang++).

g++ -v
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
mgates3 commented 12 months ago

Ah! Success! I noticed clang++ was including /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Accelerate.framework/Versions/A/Headers/Accelerate.h whereas g++ was inluding /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/Accelerate.framework/Headers/Accelerate.h which checks ACCELERATE_NEW_LAPACK. Check the path, then set it:

pangolin lapackpp/config> xcode-select -p
/Applications/Xcode.app/Contents/Developer

pangolin lapackpp/config> sudo xcode-select -s /Library/Developer/CommandLineTools

That fixes the problem. Setting -mmacosx-version-min=13.3 fixes some errors that you saw.

pangolin lapackpp/config> clang++ -o lapack_version lapack_version.cc -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DACCELERATE_NEW_LAPACK -DBLAS_FORTRAN_ADD_ -framework Accelerate -mmacosx-version-min=13.3
lapack_version.cc:11:13: warning: include Accelerate.h [-W#pragma-messages]
    #pragma message "include Accelerate.h"
            ^
1 warning generated.
pangolin lapackpp/config> ./lapack_version 
LAPACK_VERSION=3.09.01

Now we would have to include Apple's Accelerate.h header instead of our own fortran.h header. Hopefully that's all there is to it.

Note my g++ is installed via HomeBrew, not Apple's wrapper around clang. g++ experiences some errors:

pangolin lapackpp/config> g++ -o lapack_version lapack_version.cc -std=c++17 -O2 -MMD -Wall -Wno-unused-local-typedefs -Wno-unused-function -DACCELERATE_NEW_LAPACK -DBLAS_FORTRAN_ADD_ -framework Accelerate -mmacosx-version-min=13.3
lapack_version.cc:11:21: note: '#pragma message: include Accelerate.h'
   11 |     #pragma message "include Accelerate.h"
      |                     ^~~~~~~~~~~~~~~~~~~~~~
In file included from lapack_version.cc:12:
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/Accelerate.framework/Headers/Accelerate.h:19:17: note: '#pragma message: Accelerate.h'
   19 | #pragma message "Accelerate.h"
      |                 ^~~~~~~~~~~~~~
In file included from /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/Accelerate.framework/Headers/../Frameworks/vecLib.framework/Headers/vecLib.h:25,
                 from /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/Accelerate.framework/Headers/Accelerate.h:22:
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h: In function 'vUInt32 vU64Sub(vUInt32, vUInt32)':
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h:658:103: note: use '-flax-vector-conversions' to permit conversions between vectors with differing element types or numbers of subparts
  658 |   vUInt32   __vbasicops_vB) { return vsubq_u64( (uint64x2_t)__vbasicops_vA, (uint64x2_t)__vbasicops_vB); }
      |                                                                                                       ^
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h:658:47: error: cannot convert 'uint64x2_t' to 'vUInt32' in return
  658 |   vUInt32   __vbasicops_vB) { return vsubq_u64( (uint64x2_t)__vbasicops_vA, (uint64x2_t)__vbasicops_vB); }
      |                                      ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                               |
      |                                               uint64x2_t
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h: In function 'vUInt32 vS64Sub(vUInt32, vUInt32)':
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h:733:47: error: cannot convert 'int64x2_t' to 'vUInt32' in return
  733 |   vUInt32   __vbasicops_vB) { return vsubq_s64( (int64x2_t)__vbasicops_vA, (int64x2_t)__vbasicops_vB); }
      |                                      ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                               |
      |                                               int64x2_t
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h: In function 'vUInt32 vU64Add(vUInt32, vUInt32)':
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h:805:47: error: cannot convert 'uint64x2_t' to 'vUInt32' in return
  805 |   vUInt32   __vbasicops_vB) { return vaddq_u64( (uint64x2_t)__vbasicops_vA, (uint64x2_t)__vbasicops_vB); }
      |                                      ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                               |
      |                                               uint64x2_t
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h: In function 'vUInt32 vS64Add(vSInt32, vSInt32)':
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h:875:47: error: cannot convert 'int64x2_t' to 'vUInt32' in return
  875 |   vSInt32   __vbasicops_vB) { return vaddq_s64( (int64x2_t)__vbasicops_vA, (int64x2_t)__vbasicops_vB); }
      |                                      ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                               |
      |                                               int64x2_t
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/Accelerate.framework/Headers/../Frameworks/vecLib.framework/Headers/vecLib.h: At global scope:
/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/System/Library/Frameworks/Accelerate.framework/Headers/../Frameworks/vecLib.framework/Headers/vecLib.h:52:17: note: '#pragma message: vecLib.h new'
   52 | #pragma message "vecLib.h new"
      |                 ^~~~~~~~~~~~~~
rileyjmurray commented 12 months ago

Hooray, progress!

Do you think you'll be able to make the necessary changes to BLAS++ and LAPACK++ in the near future?

mgates3 commented 12 months ago

I'm looking into it. Unfortunately it requires a bit more changes to the fortran.h header than I hoped, since we have to use Apple's decorated prototypes instead of our own, but need our own macros:

#define BLAS_ztrsm BLAS_FORTRAN_NAME( ztrsm, ZTRSM )

and so on. But now it seems just a matter of getting the mechanics right.

rileyjmurray commented 12 months ago

But now it seems just a matter of getting the mechanics right.

Happy to hear that. No worries if it takes a few weeks for you to get to this. @TeachRaccooon and I can put this feature to work as soon as it's ready, but we can keep ourselves busy until then.

Please let us know if there's anything we might be able to do to help.

rileyjmurray commented 11 months ago

Hi @mgates3, just checking in on this. Any chance you can get to it this week?

mgates3 commented 11 months ago

You can see my current work in BLAS++ PR https://github.com/icl-utk-edu/blaspp/pull/74 Still a ways to go to work, and then do similar work on LAPACK++.

rileyjmurray commented 11 months ago

Progress of any kind is awesome. Thanks so much! I’ll keep an eye on that PR, and look out for a similar PR on this repo.

Riley

On Wed, Dec 20, 2023 at 4:23 PM Mark Gates @.***> wrote:

You can see my current work in BLAS++ PR icl-utk-edu/blaspp#74 https://github.com/icl-utk-edu/blaspp/pull/74 Still a ways to go to work, and then do similar work on LAPACK++.

— Reply to this email directly, view it on GitHub https://github.com/icl-utk-edu/lapackpp/issues/43#issuecomment-1865156325, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACRLIFEJAW4JOOMTPTRQG7DYKNJLJAVCNFSM6AAAAAA4GGXLNSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRVGE2TMMZSGU . You are receiving this because you were mentioned.Message ID: @.***>