wellposed / hblas

haskell bindings for blas and lapack
www.wellposed.com
BSD 3-Clause "New" or "Revised" License
49 stars 19 forks source link

gemv result #39

Closed tmcdonell closed 6 years ago

tmcdonell commented 7 years ago

I'm not getting the result I expect for [d,s]gemv. I only just started using the library though so I could be using it incorrectly?

#include <mkl.h>
#include <stdio.h>

int main()
{
  double A[] = { 1, 2, 3
               , 4, 5, 6 };
  double x[] = { 10, 20, 30 };
  double y[2];

  cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 3, 1.0, A, 3, x, 1, 0, y, 1);

  printf("y = [%lf, %lf]\n", y[0], y[1]);

  return 0;
}
> icc -I$MKLROOT/include $MKLROOT/lib/libmkl_intel_lp64.a $MKLROOT/lib/libmkl_sequential.a $MKLROOT/lib/libmkl_core.a -lpthread -lm -ldl test.c
> ./a.out
y = [140.000000, 320.000000]
import Numerical.HBLAS.BLAS.Level2
import Numerical.HBLAS.MatrixTypes

import qualified Data.Vector.Storable         as S
import qualified Data.Vector.Storable.Mutable as SM

main :: IO ()
main = do
  let m = S.fromList [1,2,3, 4,5,6]
      x = S.fromList [10,20,30]

  m' <- S.unsafeThaw m
  x' <- S.unsafeThaw x
  y' <- SM.new 2

  let mat   = MutableDenseMatrix SRow 3 2 3 m'
      xvec  = MutableDenseVector SDirect 3 1 x'
      yvec  = MutableDenseVector SDirect 2 1 y'

  dgemv NoTranspose 1 0 mat xvec yvec

  y  <- S.unsafeFreeze y'
  print y
ghci> :main
[50.0,140.0]
cartazio commented 7 years ago

this is a def pants on fire bug and i can reproduce it on my mac in ghci

import Numerical.HBLAS.BLAS.Level2
import Numerical.HBLAS.MatrixTypes
mat  <- generateMutableDenseMatrix (SRow)  (3,2)  (\(colx,rowy) -> ((fromIntegral rowy * 3) + fromIntegral colx ) :: Double  )
right <- generateMutableDenseVector 3 (\i  ->  fromIntegral i * 10 + 10 )
res  <- generateMutableDenseVector 2  (\_ -> (0.0 ))
dgemv NoTranspose 1.0 0 mat right res
resList <-mutableVectorToList $ _bufferMutDenseVector res
print resList
cartazio commented 7 years ago

the bug must lie in https://github.com/wellposed/hblas/blob/master/src/Numerical/HBLAS/BLAS/Internal/Level2.hs#L150-L179

which currently is


{-# NOINLINE gemvAbstraction #-}
gemvAbstraction :: (SM.Storable el, PrimMonad m)
                => String
                -> GemvFunFFI scale el
                -> GemvFunFFI scale el
                -> (el -> (scale -> m ())-> m ())
                -> forall orient . GemvFun el orient (PrimState m) m
gemvAbstraction gemvName gemvSafeFFI gemvUnsafeFFI constHandler = gemv
  where
    shouldCallFast :: Int -> Int  -> Bool
    shouldCallFast a b = flopsThreshold >= gemvComplexity a b
    gemv tr alpha beta
      (MutableDenseMatrix ornta ax ay astride abuff)
      (MutableDenseVector _ bdim bstride bbuff)
      (MutableDenseVector _ cdim cstride cbuff)
        | isBadGemv tr ax ay bdim cdim =  error $! "Bad dimension args to GEMV: ax ay xdim ydim: " ++ show [ax, ay, bdim, cdim]
        | SM.overlaps abuff cbuff || SM.overlaps bbuff cbuff =
            error $! "The read and write inputs for: " ++ gemvName ++ " overlap. This is a programmer error. Please fix."
        | otherwise = call
            where
              (newx,newy) = coordSwapper tr (ax,ay)
              call = unsafeWithPrim abuff $ \ap ->
                     unsafeWithPrim bbuff $ \bp ->
                     unsafeWithPrim cbuff $ \cp ->
                     constHandler alpha $ \alphaPtr ->
                     constHandler beta  $ \betaPtr  ->
                       unsafePrimToPrim $! (if shouldCallFast newx newy  then gemvUnsafeFFI else gemvSafeFFI)
                         (encodeNiceOrder ornta) (encodeFFITranspose tr)
                         (fromIntegral newx) (fromIntegral newy) alphaPtr ap (fromIntegral astride) bp
                         (fromIntegral bstride) betaPtr cp (fromIntegral cstride)
cartazio commented 7 years ago

first of all it looks like the coord swap introduced by newx and newy isn't needed / likely a bug, because cblasFoo will handle the implicit transpose i believe? (for fortran blas rather than `cblas` wrappers this would be correct... ish )

likewise, it looks like (fromIntegral newx) (fromIntegral newy) are in the wrong order, since ay / newy == 1 less than the number of rows, and ax / newx == 1 less than the number of columns

it looks like the work behind this wrapper (and possibly some others in level2) didn't closely follow the meanings in the cblas docs

void cblas_dgemv (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const double alpha, const double *a, const MKL_INT lda, const double *x, const MKL_INT incx, const double beta, double *y, const MKL_INT incy);

Description
The ?gemv routines perform a matrix-vector operation defined as

y := alpha*A*x + beta*y,

or

y := alpha*A'*x + beta*y,

or

y := alpha*conjg(A')*x + beta*y,

where:

alpha and beta are scalars,

x and y are vectors,

A is an m-by-n matrix.

Input Parameters
Layout
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

trans
Specifies the operation:

if trans=CblasNoTrans, then y := alpha*A*x + beta*y;

if trans=CblasTrans, then y := alpha*A'*x + beta*y;

if trans=CblasConjTrans, then y := alpha *conjg(A')*x + beta*y.

m
Specifies the number of rows of the matrix A. The value of m must be at least zero.

n
Specifies the number of columns of the matrix A. The value of n must be at least zero.

alpha
Specifies the scalar alpha.

a
Array, size lda*k.

For Layout = CblasColMajor, k is n. Before entry, the leading m-by-n part of the array a must contain the matrix A.

For Layout = CblasRowMajor, k is m. Before entry, the leading n-by-m part of the array a must contain the matrix A.

lda
Specifies the leading dimension of a as declared in the calling (sub)program.

For Layout = CblasColMajor, the value of lda must be at least max(1, m).

For Layout = CblasRowMajor, the value of lda must be at least max(1, n).

x
Array, size at least (1+(n-1)*abs(incx)) when trans=CblasNoTrans and at least (1+(m - 1)*abs(incx)) otherwise. Before entry, the incremented array x must contain the vector x.

incx
Specifies the increment for the elements of x.

The value of incx must not be zero.

beta
Specifies the scalar beta. When beta is set to zero, then y need not be set on input.

y
Array, size at least (1 +(m - 1)*abs(incy)) when trans=CblasNoTrans and at least (1 +(n - 1)*abs(incy)) otherwise. Before entry with non-zero beta, the incremented array y must contain the vector y.

incy
Specifies the increment for the elements of y.

The value of incy must not be zero.

Output Parameters
y
Updated vector y.
cartazio commented 7 years ago

@tmcdonell hrmmm, i know where the bug must live, but i'm not seeing it at the moment, one thing that might help me realize what i'm doing wrong would be if you could take a moment and help me write out the direct invocation that would use the ffi interface version?

specifically cblas_sgemv_unsafe or *safe?

i suspect that some of the other level2 routines may have the same bug .. :(

cartazio commented 7 years ago

I've one or two quick ideas I'll see about trying this evening, else I'll get through it later this week. (Have some start of week deadlines to juggle )

On Sun, Jun 4, 2017 at 12:41 AM Trevor L. McDonell notifications@github.com wrote:

I'm not getting the result I expect for [d,s]gemv. I only just started using the library though so I could be using it incorrectly?

include

include

int main() { double A[] = { 1, 2, 3 , 4, 5, 6 }; double x[] = { 10, 20, 30 }; double y[2];

cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 3, 1.0, A, 3, x, 1, 0, y, 1);

printf("y = [%lf, %lf]\n", y[0], y[1]);

return 0; }

icc -I$MKLROOT/include $MKLROOT/lib/libmkl_intel_lp64.a $MKLROOT/lib/libmkl_sequential.a $MKLROOT/lib/libmkl_core.a -lpthread -lm -ldl test.c ./a.out y = [140.000000, 320.000000]

import Numerical.HBLAS.BLAS.Level2import Numerical.HBLAS.MatrixTypes import qualified Data.Vector.Storable as Simport qualified Data.Vector.Storable.Mutable as SM

main :: IO () main = do let m = S.fromList [1,2,3, 4,5,6] x = S.fromList [10,20,30]

m' <- S.unsafeThaw m x' <- S.unsafeThaw x y' <- SM.new 2

let mat = MutableDenseMatrix SRow 3 2 3 m' xvec = MutableDenseVector SDirect 3 1 x' yvec = MutableDenseVector SDirect 2 1 y'

dgemv NoTranspose 1 0 mat xvec yvec

y <- S.unsafeFreeze y' print y

ghci> :main [50.0,140.0]

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/wellposed/hblas/issues/39, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAQwg5INiH_6AwBKO0a3S5Aox50rrRFks5sAjWLgaJpZM4NvRgE .

cartazio commented 7 years ago

ok, the handling of what coordinates to swap when doing transpose vs notranspose were wrong, but i'm not sure if that was related to your bug in this case :(

cartazio commented 7 years ago

@tmcdonell as a near term workaround *gemm routines should be correct, and i think most blas implementations generally provide optimizations to support gemv style operations via gemm :)

tmcdonell commented 7 years ago

By the way, no rush on my end; was playing around with this as a bit of a side project, but now it is Monday morning and I am back in the office ):

cartazio commented 7 years ago

ok, phewf :)

if you find the time or inclination to play with that gemvAbstraction code and or have ideas on the bug please share :)

cartazio commented 7 years ago

@tmcdonell for now i've killed the 0.4 release on hackage, cause the release definitely has some bugs. :(

axman6 commented 6 years ago

So it looks like the fix for this is simple, the coordinates for ax and ay need to be swapped at https://github.com/wellposed/hblas/blob/master/src/Numerical/HBLAS/BLAS/Internal/Level2.hs#L170

              (newx,newy) = coordSwapper tr (ax,ay)
=> 
              (newx,newy) = coordSwapper tr (ay,ax)

I would say that x and y aren't great names to use here, since they don't reflect the language used in the CBLAS documentation (for example, Intel's docs: https://software.intel.com/en-us/mkl-developer-reference-fortran-gemv talk about m and n representing rows and columns).

Running @tmcdonell's program gives:

*Main> :main
[140.0,320.0]
cartazio commented 6 years ago

i'm gonna have a go at changing my test case example, i guess that part is wrong :)

cartazio commented 6 years ago

this is fixed in master