Archlinux specific - float64 matmul tests failing

mratsim commented 5 years ago

Edited (title changed from OpenMP to Archlinux): I never run the full test suite on Arch because "nimble test -d:blas=cblas" is not possible there. Unlike on Debian/Ubuntu and Travis, the BLAS symbols on Arch are in libcblas.so and not libblas.so.

A couple of tests are failing:

[Suite] BLAS (Basic Linear Algebra Subprograms)
    /home/beta/Programming/Nim/Arraymancer/tests/tensor/test_operators_blas.nim(107, 47): Check failed: n1.astype(float) * n2.astype(float) == n1n2.astype(float)
    n1.astype(float) * n2.astype(float) was Tensor[system.float64] of shape [8, 8] of type "float64" on backend "Cpu"
|17.0   15.0    35.0    24.0    50.0    39.0    43.0    33.0|
|17.0   13.0    31.0    13.0    36.0    33.0    33.0    22.0|
|19.0   17.0    29.0    24.0    26.0    28.0    26.0    32.0|
|12.0   10.0    16.0    13.0    36.0    28.0    30.0    21.0|
|29.0   27.0    52.0    32.0    52.0    50.0    51.0    42.0|
|28.0   26.0    56.0    31.0    50.0    50.0    49.0    42.0|
|21.0   21.0    30.0    28.0    27.0    30.0    25.0    36.0|
|16.0   14.0    26.0    28.0    39.0    31.0    24.0    34.0|

    n1n2.astype(float) was Tensor[system.float64] of shape [8, 8] of type "float64" on backend "Cpu"
|27.0   23.0    16.0    29.0    35.0    32.0    58.0    37.0|
|24.0   19.0    11.0    23.0    26.0    30.0    49.0    27.0|
|34.0   29.0    21.0    21.0    34.0    34.0    36.0    32.0|
|17.0   22.0    15.0    21.0    28.0    25.0    40.0    33.0|
|39.0   27.0    23.0    40.0    45.0    46.0    72.0    41.0|
|41.0   26.0    25.0    34.0    47.0    48.0    65.0    38.0|
|33.0   28.0    22.0    26.0    37.0    34.0    41.0    33.0|
|14.0   12.0    9.0     22.0    27.0    17.0    51.0    23.0|

  [FAILED] GEMM - General Matrix to Matrix Multiplication

[Suite] Convolution 2D
  [OK] Simple Conv2D [Im2ColGEMM]
  [OK] Strided Conv2D [Im2ColGEMM]
    /home/beta/Programming/Nim/Arraymancer/tests/nn_primitives/test_nnp_convolution.nim(167, 80): Check failed: mean_relative_error(target_grad_weight, grad_weight.astype(float)) < 1e-06
    mean_relative_error(target_grad_weight, grad_weight.astype(float)) was 0.171687640072218
    /home/beta/Programming/Nim/Arraymancer/tests/nn_primitives/test_nnp_convolution.nim(168, 78): Check failed: mean_relative_error(target_grad_input, grad_input.astype(float)) < 1e-06
    mean_relative_error(target_grad_input, grad_input.astype(float)) was 0.09811630960365464
  [FAILED] Conv2D Forward + Backward [Im2ColGEMM]

[Suite] Autograd of shapeshifting operations
  [OK] Gradient of stack operation
  [OK] Gradient of chunk operation
  [OK] Gradient of uneven chunks + slicing operations
    /home/beta/Programming/Nim/Arraymancer/tests/autograd/test_gate_shapeshifting.nim(185, 47): Check failed: mean_relative_error(vx.grad, expected_x) < 1e-07
    mean_relative_error(vx.grad, expected_x) was 0.04603680328842539
    /home/beta/Programming/Nim/Arraymancer/tests/autograd/test_gate_shapeshifting.nim(186, 47): Check failed: mean_relative_error(vy.grad, expected_y) < 1e-07
    mean_relative_error(vy.grad, expected_y) was 0.04603680328842539
    /home/beta/Programming/Nim/Arraymancer/tests/autograd/test_gate_shapeshifting.nim(187, 47): Check failed: mean_relative_error(vz.grad, expected_z) < 1e-07
    mean_relative_error(vz.grad, expected_z) was 0.1068985684420457
  [FAILED] Gradient of squeeze operation (+ chunking)
  [OK] Gradient of unsqueeze operation

mratsim commented 5 years ago

Minimal example for reproduction

import ../src/arraymancer

let n1 = [[2, 4,  3,  1,  3,  1,  3,  1],
          [1, 2,  1,  1,  2,  0,  4,  3],
          [2, 0,  0,  3,  0,  4,  4,  1],
          [1, 1,  4,  0,  3,  1,  3,  0],
          [3, 4,  1,  1,  4,  2,  3,  4],
          [2, 4,  0,  2,  3,  3,  3,  4],
          [3, 0,  0,  3,  1,  4,  3,  1],
          [4, 3,  2,  4,  1,  0,  0,  0]].toTensor()

let n2 = [[2, 2,  0,  4,  0,  0,  4,  2],
          [2, 0,  0,  1,  1,  1,  3,  1],
          [0, 2,  2,  0,  2,  2,  3,  3],
          [0, 0,  1,  0,  4,  2,  4,  1],
          [0, 0,  1,  3,  4,  2,  4,  2],
          [4, 3,  4,  1,  4,  4,  0,  3],
          [3, 3,  0,  2,  1,  2,  3,  3],
          [2, 1,  2,  1,  2,  4,  4,  1]].toTensor()

let n1n2 = [[27,23,16,29,35,32,58,37],
            [24,19,11,23,26,30,49,27],
            [34,29,21,21,34,34,36,32],
            [17,22,15,21,28,25,40,33],
            [39,27,23,40,45,46,72,41],
            [41,26,25,34,47,48,65,38],
            [33,28,22,26,37,34,41,33],
            [14,12, 9,22,27,17,51,23]].toTensor()

let
  fn1 = n1.astype(float)
  fn2 = n2.astype(float)

echo fn1
echo fn2

doAssert fn1 * fn2 == n1n2.astype(float)

Compile with

nim c -r -d:blas=cblas build/f64_gemm.nim

Using MKL instead works and also resolves all the reported failures. The command line is a bit tricky:

nim c -r -d:blas=mkl_intel_lp64 --clibdir:"/opt/intel/mkl/lib/intel64" --dynlibOverride:"mkl_intel_lp64" --passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a -lmkl_core -lmkl_gnu_thread -lgomp"  build/f64_gemm.nim

And for the tests

nim c -r -d:blas=mkl_intel_lp64 --clibdir:"/opt/intel/mkl/lib/intel64" --dynlibOverride:"mkl_intel_lp64" --passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a -lmkl_core -lmkl_gnu_thread -lgomp"  tests/nn_primitives/test_nnp_convolution.nim
nim c -r -d:blas=mkl_intel_lp64 --clibdir:"/opt/intel/mkl/lib/intel64" --dynlibOverride:"mkl_intel_lp64" --passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a -lmkl_core -lmkl_gnu_thread -lgomp"  tests/autograd/test_gate_shapeshifting.nim

The shapeshifting test seems unrelated, but it actually requires a matrix multiplication for verification: https://github.com/mratsim/Arraymancer/blob/bde79d2f73b71ece719526a7b39f03bb100784b0/tests/autograd/test_gate_shapeshifting.nim#L145-L162

float32 works as well.

mratsim commented 5 years ago

Further investigation shows that:

I was using the openblas + cblas Arch packages.
Switching to blas + cblas solves the issue on Arch.

So the Arch packaging caused the issue: it separates blas and cblas, and pairs openblas with the netlib cblas header instead of the openblas cblas header.
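
To see how a given machine splits these libraries, a small probe along the following lines may help. It is only a sketch using Python's standard ctypes module: the function name probe_cblas and the list of library names are illustrative assumptions, not part of Arraymancer or of any Arch tooling. It reports which of blas, cblas and openblas can be resolved and whether each one exports the cblas_dgemm symbol. It does not detect the mismatch itself; it only shows which shared object the CBLAS entry point comes from.

```python
import ctypes
import ctypes.util

# Library short names to probe; they mirror the packages discussed above.
# This list is an assumption for illustration, adjust it to your system.
CANDIDATES = ("blas", "cblas", "openblas")


def probe_cblas(names=CANDIDATES, symbol="cblas_dgemm"):
    """Return {name: (resolved_name_or_None, exports_symbol_or_None)}.

    exports_symbol is None when the library is missing or fails to load.
    """
    report = {}
    for name in names:
        resolved = ctypes.util.find_library(name)
        if resolved is None:
            report[name] = (None, None)
            continue
        try:
            lib = ctypes.CDLL(resolved)
        except OSError:
            # Found by the linker cache but not loadable.
            report[name] = (resolved, None)
            continue
        # Attribute lookup on a CDLL raises AttributeError for a missing symbol,
        # so hasattr is enough to test for the export.
        report[name] = (resolved, hasattr(lib, symbol))
    return report


if __name__ == "__main__":
    for name, (resolved, has_symbol) in probe_cblas().items():
        print(f"{name:10s} resolved={resolved} cblas_dgemm={has_symbol}")
```

On a setup like the one described here, one would expect cblas_dgemm to be reported for the cblas entry rather than for blas, but the exact output depends on the installed packages.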

mratsim commented 5 years ago

NumPy is also impacted if built with OpenBLAS:

import numpy as np

n1 = np.array(
      [[2, 4,  3,  1,  3,  1,  3,  1],
       [1, 2,  1,  1,  2,  0,  4,  3],
       [2, 0,  0,  3,  0,  4,  4,  1],
       [1, 1,  4,  0,  3,  1,  3,  0],
       [3, 4,  1,  1,  4,  2,  3,  4],
       [2, 4,  0,  2,  3,  3,  3,  4],
       [3, 0,  0,  3,  1,  4,  3,  1],
       [4, 3,  2,  4,  1,  0,  0,  0]],
      dtype=np.float64)

n2 = np.array(
      [[2, 2,  0,  4,  0,  0,  4,  2],
       [2, 0,  0,  1,  1,  1,  3,  1],
       [0, 2,  2,  0,  2,  2,  3,  3],
       [0, 0,  1,  0,  4,  2,  4,  1],
       [0, 0,  1,  3,  4,  2,  4,  2],
       [4, 3,  4,  1,  4,  4,  0,  3],
       [3, 3,  0,  2,  1,  2,  3,  3],
       [2, 1,  2,  1,  2,  4,  4,  1]],
      dtype=np.float64)

n1n2 = np.array(
        [[27,23,16,29,35,32,58,37],
         [24,19,11,23,26,30,49,27],
         [34,29,21,21,34,34,36,32],
         [17,22,15,21,28,25,40,33],
         [39,27,23,40,45,46,72,41],
         [41,26,25,34,47,48,65,38],
         [33,28,22,26,37,34,41,33],
         [14,12, 9,22,27,17,51,23]],
      dtype=np.float64)

print(n1)
print(n2)

print(n1 @ n2)

np.testing.assert_array_equal(n1 @ n2, n1n2)
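
To rule out the expected matrix as the source of the mismatch, the product can be recomputed without any BLAS at all. The sketch below is a plain triple-loop multiplication in pure Python on the same integer inputs; the helper name naive_matmul is an assumption made for this example. Since the values are small integers, the comparison is exact and does not depend on float64 rounding.

```python
def naive_matmul(a, b):
    """Textbook product of two matrices given as row-major lists of lists."""
    inner = len(b)
    cols = len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


# Same inputs and expected product as in the reproduction scripts above.
N1 = [[2, 4, 3, 1, 3, 1, 3, 1],
      [1, 2, 1, 1, 2, 0, 4, 3],
      [2, 0, 0, 3, 0, 4, 4, 1],
      [1, 1, 4, 0, 3, 1, 3, 0],
      [3, 4, 1, 1, 4, 2, 3, 4],
      [2, 4, 0, 2, 3, 3, 3, 4],
      [3, 0, 0, 3, 1, 4, 3, 1],
      [4, 3, 2, 4, 1, 0, 0, 0]]

N2 = [[2, 2, 0, 4, 0, 0, 4, 2],
      [2, 0, 0, 1, 1, 1, 3, 1],
      [0, 2, 2, 0, 2, 2, 3, 3],
      [0, 0, 1, 0, 4, 2, 4, 1],
      [0, 0, 1, 3, 4, 2, 4, 2],
      [4, 3, 4, 1, 4, 4, 0, 3],
      [3, 3, 0, 2, 1, 2, 3, 3],
      [2, 1, 2, 1, 2, 4, 4, 1]]

N1N2 = [[27, 23, 16, 29, 35, 32, 58, 37],
        [24, 19, 11, 23, 26, 30, 49, 27],
        [34, 29, 21, 21, 34, 34, 36, 32],
        [17, 22, 15, 21, 28, 25, 40, 33],
        [39, 27, 23, 40, 45, 46, 72, 41],
        [41, 26, 25, 34, 47, 48, 65, 38],
        [33, 28, 22, 26, 37, 34, 41, 33],
        [14, 12, 9, 22, 27, 17, 51, 23]]

if __name__ == "__main__":
    # Compare the BLAS-free product against the expected matrix.
    print("reference product matches n1n2:", naive_matmul(N1, N2) == N1N2)
```

If this check passes while the NumPy assertion above fails, the fault lies in the linked BLAS rather than in the test data.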

mratsim commented 5 years ago

Upstreamed: https://bugs.archlinux.org/task/63054

PKGBUILD fix: https://github.com/mratsim/Arch-Data-Science/commit/737d5de7f43220d3ce381b8ef480ef9372cf90fe

mratsim / Arraymancer

Archlinux specific - float64 matmul tests failing #375