haskell-numerics / hmatrix

Linear algebra and numerical computation

DGEMM/DSYEV failures with openblas #211

Open nh2 opened 7 years ago

nh2 commented 7 years ago

I'm trying to use hmatrix with -fopenblas but get the following errors:

** On entry to DGEMM  parameter number 13 had an illegal value
** On entry to DSYEV Safe minimumPrecisionM parameter number  3 had an illegal value
** On entry to DSYEV Safe minimumPrecisionM parameter number  8 had an illegal value
myprogram: eigS': code -8
CallStack (from HasCallStack):
 error, called at src/Internal/Devel.hs:51:21 in hmatrix-0.18.0.0-75rsOPeDsPlHSVa1abBn2n:Internal.Devel

Note that the failing call seems to be in eigS'.

Is this a known issue, or can you reproduce it with some of your uses of hmatrix? Or should I dig deeper?

albertoruiz commented 7 years ago

Do you have this error without -fopenblas, e.g. using liblapack-base-dev?

nh2 commented 7 years ago

No, this error only happens with openblas enabled.

albertoruiz commented 7 years ago

In that case, it seems that openblas is not a truly compatible replacement for standard blas. Exactly the same compiled code should be able to link against blas or openblas without any changes to the source code.

nh2 commented 7 years ago

That's possible. My first suspicion is 32 vs 64 bit, but I haven't looked in detail yet.

nh2 commented 7 years ago

I've looked a bit into this, and indeed it seems to be 32 vs 64 bit.

If in Nix I use (openblas.override { blas64 = false; }) instead of openblas (which translates to compiling openblas with INTERFACE64=0), then hmatrix seems to work fine.

The comment about this in the openblas Nix expression is:

# Most packages depending on openblas expect integer width to match pointer width,
# but some expect to use 32-bit integers always (for compatibility with reference BLAS).
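
A minimal sketch of applying this override to a whole package set. It assumes a nixpkgs revision whose openblas expression accepts the blas64 argument, as the one quoted in this thread does. The overlay form, the file name, and the names self and super are conventional choices rather than anything prescribed here:

```nix
# overlay.nix (hypothetical file name)
# Rebuild OpenBLAS with INTERFACE64=0 so that its integer arguments are
# 32 bits wide, matching what reference BLAS/LAPACK callers pass.
self: super: {
  openblas = super.openblas.override { blas64 = false; };
}
```

It could then be loaded with something like `import <nixpkgs> { overlays = [ (import ./overlay.nix) ]; }`. Every package that depends on openblas is rebuilt against the overridden library, which may or may not be what you want.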

albertoruiz commented 7 years ago

Thanks! We should include this info somewhere in the documentation.

jamesthompson commented 6 years ago

@nh2 Do you have a working Nix derivation for hmatrix? At runtime on OSX I keep getting these same errors with the latest hmatrix.

nh2 commented 6 years ago

@jamesthompson I use this in my packageOverrides:

        openblas = (pkgs.callPackage ./openblas.nix {
          inherit cpuMarch;
          # See <https://github.com/albertoruiz/hmatrix/issues/211>
          blas64 = false;
        });

and then hmatrix picks up this openblas, which is built with the 32-bit integer interface.

In case it is relevant for you, my openblas.nix is:

{ stdenv, fetchurl, gfortran, perl, which, config, coreutils
# Most packages depending on openblas expect integer width to match
# pointer width, but some expect to use 32-bit integers always
# (for compatibility with reference BLAS).
, blas64 ? null
, cpuMarch
}:

with stdenv.lib;

let blas64_ = blas64; in

let
  # See https://github.com/xianyi/OpenBLAS/blob/develop/TargetList.txt
  # and https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/x86-Options.html#x86-Options
  cpuMarchTargets = {
    nehalem = "NEHALEM";
    westmere = "NEHALEM";
    sandybridge = "SANDYBRIDGE";
    ivybridge = "SANDYBRIDGE";
    haswell = "HASWELL";
    broadwell = "HASWELL";
    skylake = "HASWELL";
    knl = "HASWELL";
    skylake-avx512 = "HASWELL";
  };

  config = if cpuMarch == "x86-64"
    then {
      BINARY = "64";
      TARGET = "ATHLON";
      DYNAMIC_ARCH = "1";
      CC = "gcc";
      USE_OPENMP = "1";
    }
    else {
      BINARY = "64";
      TARGET = cpuMarchTargets.${cpuMarch} or (throw "unsupported march: ${cpuMarch}");
      DYNAMIC_ARCH = "0";
      CC = "gcc";
      USE_OPENMP = "1";
      # Setting HOSTCC and CROSS enables cross compilation, which prevents
      # running tests that we might not be able to execute due to missing
      # CPU instructions.
      HOSTCC = "gcc";
      CROSS = "1";
    };
in

let
  blas64 =
    if blas64_ != null
      then blas64_
      else hasPrefix "x86_64" stdenv.system;

  version = "0.2.19";
in
stdenv.mkDerivation {
  name = "openblas-${version}";
  src = fetchurl {
    url = "https://github.com/xianyi/OpenBLAS/archive/v${version}.tar.gz";
    sha256 = "0mw5ra1vjsqiba79zdhqfkqq6v3bla5a5c0wj7vca9qgjzjbah4w";
    name = "openblas-${version}.tar.gz";
  };

  inherit blas64;

  # Some hardening features are disabled due to sporadic failures in
  # OpenBLAS-based programs. The problem may not be with OpenBLAS itself, but
  # with how these flags interact with hardening measures used downstream.
  # In either case, OpenBLAS must only be used by trusted code--it is
  # inherently unsuitable for security-conscious applications--so there should
  # be no objection to disabling these hardening measures.
  hardeningDisable = [
    # don't modify or move the stack
    "stackprotector" "pic"
    # don't alter index arithmetic
    "strictoverflow"
    # don't interfere with dynamic target detection
    "relro" "bindnow"
  ];

  nativeBuildInputs =
    [gfortran perl which]
    ++ optionals stdenv.isDarwin [coreutils];

  makeFlags =
    [
      "FC=gfortran"
      ''PREFIX="''$(out)"''
      "NUM_THREADS=64"
      "INTERFACE64=${if blas64 then "1" else "0"}"
    ]
    ++ mapAttrsToList (var: val: var + "=" + val) config;

  doCheck = true;
  checkTarget = "tests";

  meta = with stdenv.lib; {
    description = "Basic Linear Algebra Subprograms";
    license = licenses.bsd3;
    homepage = "https://github.com/xianyi/OpenBLAS";
    platforms = platforms.unix;
    maintainers = with maintainers; [ ttuegel ];
  };
}

jamesthompson commented 6 years ago

@nh2 Thanks a lot for this. Sadly, I'm still seeing the same issues even after building with this openblas derivation.

I wonder where do you get the cpuMarch value from?

jamesthompson commented 6 years ago

For reference, this is my hmatrix derivation:

{ mkDerivation, array, base, binary, blas, bytestring, deepseq
, fetchgit, liblapack, random, semigroups, split, stdenv
, storable-complex, vector, darwin, openblas
}:
mkDerivation {
  pname = "hmatrix";
  version = "0.18.2.0";
  src = fetchgit {
    url = "https://github.com/albertoruiz/hmatrix";
    sha256 = "11wr59wg21rky59j3kkd3ba6aqns9gkh0r1fnhwhn3fp7zfhanqn";
    rev = "d83b17190029c11e3ab8b504e5cdc917f5863120";
  };
  postUnpack = ''
    sourceRoot+=/packages/base;
    echo source root reset to $sourceRoot
  '';
  # optional (singular) yields [ Accelerate ] on Darwin and [ ] elsewhere
  buildDepends = stdenv.lib.optional stdenv.isDarwin darwin.apple_sdk.frameworks.Accelerate;
  configureFlags = [
    "-fdisable-default-paths"
    "-fopenblas"
  ];
  libraryHaskellDepends = [
    array base binary bytestring deepseq random semigroups split
    storable-complex vector
  ];
  librarySystemDepends = [ openblas liblapack ];
  homepage = "https://github.com/albertoruiz/hmatrix";
  description = "Numeric Linear Algebra";
  license = stdenv.lib.licenses.bsd3;
}

and I'm calling it in my release.nix haskell overrides thus:

hmatrix =
  pkgs.haskell.lib.enableCabalFlag
    (pkgs.haskell.lib.enableCabalFlag
      (haskellPackagesNew.callPackage ./nix/hmatrix.nix { })
      "openblas")
    "disable-default-paths";

idontgetoutmuch commented 6 years ago

Can you post an example bit of Haskell which gives the error? Can you also post which nixpkgs you are using? Maybe you have a nixpkgs.nix like this:

import (fetchTarball "https://github.com/nixos/nixpkgs/archive/0d7a0d7572d35526ddf34b6d011b7b88a8904b36.tar.gz")

With this I can, for example, create a shell.nix like this to ensure I am using the same package set as my co-workers:

{ nixpkgs ? import ./nixpkgs.nix {
    # overlays = [
    #   (self: super: {gsl = super.gsl.overrideAttrs (o: {CFLAGS = "-DDEBUG";});})
    # ];
  }
, ghc ? nixpkgs.haskell.compiler.ghc822 }:

nh2 commented 6 years ago

I wonder where do you get the cpuMarch value from?

@jamesthompson Ah, I just pass it in. I have it as an optimisation when I know my machine has a newer architecture, e.g. "haswell". For default Nix purposes, you can just assume or set it to be "x86-64".
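
As a sketch of what "just pass it in" could look like with the generic value: this assumes that ./openblas.nix is the expression quoted earlier in this thread and that the override lives in a packageOverrides function. Both the file layout and the surrounding attribute set are assumptions for illustration:

```nix
# config.nix (hypothetical location for packageOverrides)
{
  packageOverrides = pkgs: {
    openblas = pkgs.callPackage ./openblas.nix {
      # Generic target: selects the branch with TARGET = "ATHLON" and
      # DYNAMIC_ARCH = "1" in the quoted expression.
      cpuMarch = "x86-64";
      # See <https://github.com/albertoruiz/hmatrix/issues/211>
      blas64 = false;
    };
  };
}
```

With a known newer CPU, cpuMarch could instead be one of the keys of cpuMarchTargets in that expression, for example "haswell".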

nh2 commented 6 years ago

Can you post an example bit of Haskell which gives the error? Can you also post which nixpkgs you are using?

@jamesthompson Yes, I think the best way forward is if you could make a little standalone github repo that has a minimal example in it and shows the error, then I can patch it up like I have it and see if it fixes it.

jamesthompson commented 6 years ago

I hope to get round to it this weekend. I’m quite busy but I will do my best. Thanks a lot for your help!

rikvdkleij commented 5 years ago

I am experiencing the same issue when I run a program that depends on hmatrix/openblas in a Docker container. It works fine when I run the program on OSX (built with Nix via the Stack integration). But when I build the program in Docker (in the same way, via Nix/Stack) and run it, I get the same error message.

Any ideas?

idontgetoutmuch commented 5 years ago

@rikvdkleij I can try and reproduce if you like but you will have to give me a test program and tell me how to build it in Docker.

rikvdkleij commented 5 years ago

Sorry, it has been solved in the meantime by using (openblas.override { blas64 = false; }) for the build in the Docker container.
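
For a setup like the hmatrix.nix derivation posted earlier in this thread, one way to apply the same fix only to hmatrix is to hand the overridden library to callPackage. This is a sketch under assumptions: the file name and the way pkgs is threaded in are hypothetical, while haskellPackagesNew and ./nix/hmatrix.nix are the names used above:

```nix
# haskell-overrides.nix (hypothetical file name)
# Intended for: pkgs.haskellPackages.override { overrides = import ./haskell-overrides.nix pkgs; }
pkgs: haskellPackagesNew: haskellPackagesOld: {
  hmatrix = haskellPackagesNew.callPackage ./nix/hmatrix.nix {
    # Fills the openblas argument of hmatrix.nix with a build that uses
    # INTERFACE64=0 instead of the default openblas.
    openblas = pkgs.openblas.override { blas64 = false; };
  };
}
```

Compared with overriding openblas globally, this leaves other packages on the default openblas, at the cost of having two openblas builds in the store.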