mratsim / Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
https://mratsim.github.io/Arraymancer/
Apache License 2.0

Archlinux specific - float64 matmul tests failing #375

Open mratsim opened 4 years ago

mratsim commented 4 years ago

Edited (title: OpenMP -> Archlinux): I never ran the full test suite on Arch because it is not possible to run "nimble test -d:blas=cblas" there. Unlike on Debian/Ubuntu and Travis, the BLAS symbols on Arch are in libcblas.so and not in libblas.so.

A couple of tests are failing:

[Suite] BLAS (Basic Linear Algebra Subprograms)
    /home/beta/Programming/Nim/Arraymancer/tests/tensor/test_operators_blas.nim(107, 47): Check failed: n1.astype(float) * n2.astype(float) == n1n2.astype(float)
    n1.astype(float) * n2.astype(float) was Tensor[system.float64] of shape [8, 8] of type "float64" on backend "Cpu"
|17.0   15.0    35.0    24.0    50.0    39.0    43.0    33.0|
|17.0   13.0    31.0    13.0    36.0    33.0    33.0    22.0|
|19.0   17.0    29.0    24.0    26.0    28.0    26.0    32.0|
|12.0   10.0    16.0    13.0    36.0    28.0    30.0    21.0|
|29.0   27.0    52.0    32.0    52.0    50.0    51.0    42.0|
|28.0   26.0    56.0    31.0    50.0    50.0    49.0    42.0|
|21.0   21.0    30.0    28.0    27.0    30.0    25.0    36.0|
|16.0   14.0    26.0    28.0    39.0    31.0    24.0    34.0|

    n1n2.astype(float) was Tensor[system.float64] of shape [8, 8] of type "float64" on backend "Cpu"
|27.0   23.0    16.0    29.0    35.0    32.0    58.0    37.0|
|24.0   19.0    11.0    23.0    26.0    30.0    49.0    27.0|
|34.0   29.0    21.0    21.0    34.0    34.0    36.0    32.0|
|17.0   22.0    15.0    21.0    28.0    25.0    40.0    33.0|
|39.0   27.0    23.0    40.0    45.0    46.0    72.0    41.0|
|41.0   26.0    25.0    34.0    47.0    48.0    65.0    38.0|
|33.0   28.0    22.0    26.0    37.0    34.0    41.0    33.0|
|14.0   12.0    9.0     22.0    27.0    17.0    51.0    23.0|

  [FAILED] GEMM - General Matrix to Matrix Multiplication
[Suite] Convolution 2D
  [OK] Simple Conv2D [Im2ColGEMM]
  [OK] Strided Conv2D [Im2ColGEMM]
    /home/beta/Programming/Nim/Arraymancer/tests/nn_primitives/test_nnp_convolution.nim(167, 80): Check failed: mean_relative_error(target_grad_weight, grad_weight.astype(float)) < 1e-06
    mean_relative_error(target_grad_weight, grad_weight.astype(float)) was 0.171687640072218
    /home/beta/Programming/Nim/Arraymancer/tests/nn_primitives/test_nnp_convolution.nim(168, 78): Check failed: mean_relative_error(target_grad_input, grad_input.astype(float)) < 1e-06
    mean_relative_error(target_grad_input, grad_input.astype(float)) was 0.09811630960365464
  [FAILED] Conv2D Forward + Backward [Im2ColGEMM]
[Suite] Autograd of shapeshifting operations
  [OK] Gradient of stack operation
  [OK] Gradient of chunk operation
  [OK] Gradient of uneven chunks + slicing operations
    /home/beta/Programming/Nim/Arraymancer/tests/autograd/test_gate_shapeshifting.nim(185, 47): Check failed: mean_relative_error(vx.grad, expected_x) < 1e-07
    mean_relative_error(vx.grad, expected_x) was 0.04603680328842539
    /home/beta/Programming/Nim/Arraymancer/tests/autograd/test_gate_shapeshifting.nim(186, 47): Check failed: mean_relative_error(vy.grad, expected_y) < 1e-07
    mean_relative_error(vy.grad, expected_y) was 0.04603680328842539
    /home/beta/Programming/Nim/Arraymancer/tests/autograd/test_gate_shapeshifting.nim(187, 47): Check failed: mean_relative_error(vz.grad, expected_z) < 1e-07
    mean_relative_error(vz.grad, expected_z) was 0.1068985684420457
  [FAILED] Gradient of squeeze operation (+ chunking)
  [OK] Gradient of unsqueeze operation
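The Conv2D and autograd checks above compare a `mean_relative_error` against a small tolerance (1e-06 or 1e-07). As a rough illustration of what such a metric measures, here is a minimal Python sketch. It assumes a symmetric definition, |y - y_true| / max(|y|, |y_true|), averaged over all elements; that definition is an assumption made for illustration and is not copied from Arraymancer. The two sample rows are the first row of each matrix printed in the GEMM failure above.

```python
def relative_error(y, y_true):
    """Symmetric relative error; defined as 0 when both values are 0 (assumed definition)."""
    denom = max(abs(y), abs(y_true))
    if denom == 0:
        return 0.0
    return abs(y_true - y) / denom


def mean_relative_error(ys, ys_true):
    """Average of the element-wise relative errors of two equal-length sequences."""
    if len(ys) != len(ys_true) or len(ys) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(relative_error(a, b) for a, b in zip(ys, ys_true)) / len(ys)


# First row of the GEMM output reported above, and the first row of the expected result.
observed = [17.0, 15.0, 35.0, 24.0, 50.0, 39.0, 43.0, 33.0]
expected = [27.0, 23.0, 16.0, 29.0, 35.0, 32.0, 58.0, 37.0]

if __name__ == "__main__":
    print(mean_relative_error(observed, expected))
```

Errors in the range of 0.05 to 0.17, as reported in the log, are many orders of magnitude above the tolerances, so these are wrong results rather than rounding noise.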

mratsim commented 4 years ago

Minimal example for reproduction

import ../src/arraymancer

let n1 = [[2, 4,  3,  1,  3,  1,  3,  1],
          [1, 2,  1,  1,  2,  0,  4,  3],
          [2, 0,  0,  3,  0,  4,  4,  1],
          [1, 1,  4,  0,  3,  1,  3,  0],
          [3, 4,  1,  1,  4,  2,  3,  4],
          [2, 4,  0,  2,  3,  3,  3,  4],
          [3, 0,  0,  3,  1,  4,  3,  1],
          [4, 3,  2,  4,  1,  0,  0,  0]].toTensor()

let n2 = [[2, 2,  0,  4,  0,  0,  4,  2],
          [2, 0,  0,  1,  1,  1,  3,  1],
          [0, 2,  2,  0,  2,  2,  3,  3],
          [0, 0,  1,  0,  4,  2,  4,  1],
          [0, 0,  1,  3,  4,  2,  4,  2],
          [4, 3,  4,  1,  4,  4,  0,  3],
          [3, 3,  0,  2,  1,  2,  3,  3],
          [2, 1,  2,  1,  2,  4,  4,  1]].toTensor()

let n1n2 = [[27,23,16,29,35,32,58,37],
            [24,19,11,23,26,30,49,27],
            [34,29,21,21,34,34,36,32],
            [17,22,15,21,28,25,40,33],
            [39,27,23,40,45,46,72,41],
            [41,26,25,34,47,48,65,38],
            [33,28,22,26,37,34,41,33],
            [14,12, 9,22,27,17,51,23]].toTensor()

let
  fn1 = n1.astype(float)
  fn2 = n2.astype(float)

echo fn1
echo fn2

doAssert fn1 * fn2 == n1n2.astype(float)

Compile with

nim c -r -d:blas=cblas build/f64_gemm.nim

Using MKL instead works and also resolves all the reported failures. The command line is a bit tricky:

nim c -r -d:blas=mkl_intel_lp64 --clibdir:"/opt/intel/mkl/lib/intel64" --dynlibOverride:"mkl_intel_lp64" --passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a -lmkl_core -lmkl_gnu_thread -lgomp"  build/f64_gemm.nim

And for the tests

nim c -r -d:blas=mkl_intel_lp64 --clibdir:"/opt/intel/mkl/lib/intel64" --dynlibOverride:"mkl_intel_lp64" --passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a -lmkl_core -lmkl_gnu_thread -lgomp"  tests/nn_primitives/test_nnp_convolution.nim
nim c -r -d:blas=mkl_intel_lp64 --clibdir:"/opt/intel/mkl/lib/intel64" --dynlibOverride:"mkl_intel_lp64" --passl:"/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.a -lmkl_core -lmkl_gnu_thread -lgomp"  tests/autograd/test_gate_shapeshifting.nim

The shapeshifting test seems to be unrelated but it actually requires a matrix multiplication for verification: https://github.com/mratsim/Arraymancer/blob/bde79d2f73b71ece719526a7b39f03bb100784b0/tests/autograd/test_gate_shapeshifting.nim#L145-L162


float32 works as well
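As a cross-check that does not go through any BLAS library, the expected product can be recomputed with a naive triple loop. The sketch below is plain Python using only built-in lists; the `matmul` helper is written for this illustration and is not part of Arraymancer or NumPy.

```python
def matmul(a, b):
    """Naive product of two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


n1 = [[2, 4, 3, 1, 3, 1, 3, 1],
      [1, 2, 1, 1, 2, 0, 4, 3],
      [2, 0, 0, 3, 0, 4, 4, 1],
      [1, 1, 4, 0, 3, 1, 3, 0],
      [3, 4, 1, 1, 4, 2, 3, 4],
      [2, 4, 0, 2, 3, 3, 3, 4],
      [3, 0, 0, 3, 1, 4, 3, 1],
      [4, 3, 2, 4, 1, 0, 0, 0]]

n2 = [[2, 2, 0, 4, 0, 0, 4, 2],
      [2, 0, 0, 1, 1, 1, 3, 1],
      [0, 2, 2, 0, 2, 2, 3, 3],
      [0, 0, 1, 0, 4, 2, 4, 1],
      [0, 0, 1, 3, 4, 2, 4, 2],
      [4, 3, 4, 1, 4, 4, 0, 3],
      [3, 3, 0, 2, 1, 2, 3, 3],
      [2, 1, 2, 1, 2, 4, 4, 1]]

n1n2 = [[27, 23, 16, 29, 35, 32, 58, 37],
        [24, 19, 11, 23, 26, 30, 49, 27],
        [34, 29, 21, 21, 34, 34, 36, 32],
        [17, 22, 15, 21, 28, 25, 40, 33],
        [39, 27, 23, 40, 45, 46, 72, 41],
        [41, 26, 25, 34, 47, 48, 65, 38],
        [33, 28, 22, 26, 37, 34, 41, 33],
        [14, 12, 9, 22, 27, 17, 51, 23]]

if __name__ == "__main__":
    # Compare the naive product against the expected matrix from the test.
    print(matmul(n1, n2) == n1n2)
```

The naive product agrees with `n1n2`, so the expected values in the test are right and the matrix returned through cblas is the wrong one.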

mratsim commented 4 years ago

Further investigation shows that:

So the Arch packaging caused the issue: it separates blas and cblas, and pairs OpenBLAS with the Netlib CBLAS header instead of OpenBLAS's own CBLAS header.
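To exercise the system's cblas library in isolation, without Arraymancer or NumPy in between, `cblas_dgemm` can be called directly through Python's `ctypes`. This is only a sketch under assumptions: it assumes the library is discoverable under the name `cblas` and uses the standard 32-bit integer (LP64) interface; `load_cblas` and `dgemm` are helper names invented here. Whether it reproduces the wrong result depends on how the distribution packages these libraries.

```python
import ctypes
import ctypes.util

CBLAS_ROW_MAJOR = 101  # CblasRowMajor in cblas.h
CBLAS_NO_TRANS = 111   # CblasNoTrans in cblas.h


def load_cblas(name="cblas"):
    """Return a ctypes handle that exposes cblas_dgemm, or None if unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        lib.cblas_dgemm  # raises AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return None
    return lib


def dgemm(lib, a, b, m, n, k):
    """Row-major product of a (m x k) and b (k x n), both given as flat lists."""
    dptr = ctypes.POINTER(ctypes.c_double)
    lib.cblas_dgemm.restype = None
    lib.cblas_dgemm.argtypes = [
        ctypes.c_int, ctypes.c_int, ctypes.c_int,  # layout, transA, transB
        ctypes.c_int, ctypes.c_int, ctypes.c_int,  # M, N, K
        ctypes.c_double, dptr, ctypes.c_int,       # alpha, A, lda
        dptr, ctypes.c_int,                        # B, ldb
        ctypes.c_double, dptr, ctypes.c_int,       # beta, C, ldc
    ]
    a_buf = (ctypes.c_double * (m * k))(*a)
    b_buf = (ctypes.c_double * (k * n))(*b)
    c_buf = (ctypes.c_double * (m * n))()
    # For row-major, non-transposed inputs the leading dimensions are the row lengths.
    lib.cblas_dgemm(CBLAS_ROW_MAJOR, CBLAS_NO_TRANS, CBLAS_NO_TRANS,
                    m, n, k, 1.0, a_buf, k, b_buf, n, 0.0, c_buf, n)
    return list(c_buf)


if __name__ == "__main__":
    lib = load_cblas()
    if lib is None:
        print("no usable cblas library found")
    else:
        # Mathematically, [[1, 2], [3, 4]] x [[5, 6], [7, 8]] = [[19, 22], [43, 50]].
        print(dgemm(lib, [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 2, 2, 2))
```

Flattening the 8x8 matrices from the report row by row and passing them with `m = n = k = 8` would test the same float64 GEMM path as the failing test.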

mratsim commented 4 years ago

NumPy is also impacted if built with OpenBLAS:

import numpy as np

n1 = np.array(
      [[2, 4,  3,  1,  3,  1,  3,  1],
       [1, 2,  1,  1,  2,  0,  4,  3],
       [2, 0,  0,  3,  0,  4,  4,  1],
       [1, 1,  4,  0,  3,  1,  3,  0],
       [3, 4,  1,  1,  4,  2,  3,  4],
       [2, 4,  0,  2,  3,  3,  3,  4],
       [3, 0,  0,  3,  1,  4,  3,  1],
       [4, 3,  2,  4,  1,  0,  0,  0]],
      dtype=np.float64)

n2 = np.array(
      [[2, 2,  0,  4,  0,  0,  4,  2],
       [2, 0,  0,  1,  1,  1,  3,  1],
       [0, 2,  2,  0,  2,  2,  3,  3],
       [0, 0,  1,  0,  4,  2,  4,  1],
       [0, 0,  1,  3,  4,  2,  4,  2],
       [4, 3,  4,  1,  4,  4,  0,  3],
       [3, 3,  0,  2,  1,  2,  3,  3],
       [2, 1,  2,  1,  2,  4,  4,  1]],
      dtype=np.float64)

n1n2 = np.array(
        [[27,23,16,29,35,32,58,37],
         [24,19,11,23,26,30,49,27],
         [34,29,21,21,34,34,36,32],
         [17,22,15,21,28,25,40,33],
         [39,27,23,40,45,46,72,41],
         [41,26,25,34,47,48,65,38],
         [33,28,22,26,37,34,41,33],
         [14,12, 9,22,27,17,51,23]],
      dtype=np.float64)

print(n1)
print(n2)

print(n1 @ n2)

np.testing.assert_array_equal(n1 @ n2, n1n2)

mratsim commented 4 years ago

Upstreamed: https://bugs.archlinux.org/task/63054

PKGBUILD fix: https://github.com/mratsim/Arch-Data-Science/commit/737d5de7f43220d3ce381b8ef480ef9372cf90fe