zeam-vm / pelemay

Pelemay is a native compiler for Elixir that generates SIMD instructions. GPU code generation is planned.
Apache License 2.0

Pelemay slows down in case of mix bench on pelemay_sample #134

Closed by zacky1972 4 years ago

zacky1972 commented 4 years ago

Describe the bug: Pelemay runs slower than expected when 'mix bench' is run on pelemay_sample.

To Reproduce: Steps to reproduce the behavior:

  1. Use Pelemay with the source code from pelemay_sample.
  2. Run the command 'mix bench'.
  3. The Pelemay case in LogisticMapBench runs slowly.

Expected behavior: Pelemay on LogisticMapBench should be faster than Enum and Flow.

Screenshots: None

Desktop (please complete the following information):

For Pelemay versions later than 0.0.7:

> CpuInfo.all_profile
%{
  compiler: %{
    apple_clang: [
      %{
        bin: "/usr/bin/clang",
        type: :apple_clang,
        version: "11.0.0",
        versions: "Apple clang version 11.0.0 (clang-1100.0.33.17)"
      }
    ],
    "apple_clang++": [
      %{
        bin: "/usr/bin/clang++",
        type: :"apple_clang++",
        version: "11.0.0",
        versions: "Apple clang version 11.0.0 (clang-1100.0.33.17)"
      }
    ],
    cc_env: [],
    cflags_env: "-I/usr/local/opt/llvm/include -I/usr/local/opt/sqlite/include -I/usr/local/opt/mysql@5.6/include -I/usr/local/opt/openblas/include -I/usr/local/opt/openssl@1.1/include -I/usr/local/opt/icu4c/include -I/usr/local/opt/lapack/include",
    clang: [
      %{
        bin: "/usr/local/opt/llvm/bin/clang",
        type: :clang,
        version: "10.0.0",
        versions: "clang version 10.0.0 "
      }
    ],
    "clang++": [
      %{
        bin: "/usr/local/opt/llvm/bin/clang++",
        type: :"clang++",
        version: "10.0.0",
        versions: "clang version 10.0.0 "
      }
    ],
    cxx_env: [],
    cxxflags_env: "",
    "g++": [
      %{
        bin: "/usr/bin/g++",
        type: :"apple_clang++",
        version: "11.0.0",
        versions: "Apple clang version 11.0.0 (clang-1100.0.33.17)"
      },
      %{ 
        bin: "/usr/local/bin/g++-7",
        type: :"g++",
        version: "7.5.0",
        versions: "g++-7 (Homebrew GCC 7.5.0_2) 7.5.0"
      },
      %{
        bin: "/usr/local/bin/g++-8",
        type: :"g++",
        version: "8.4.0",
        versions: "g++-8 (Homebrew GCC 8.4.0_1) 8.4.0"
      },
      %{
        bin: "/usr/local/bin/g++-9",
        type: :"g++",
        version: "9.3.0",
        versions: "g++-9 (Homebrew GCC 9.3.0_1) 9.3.0"
      }
    ],
    gcc: [
      %{
        bin: "/usr/bin/gcc",
        type: :apple_clang,
        version: "11.0.0",
        versions: "Apple clang version 11.0.0 (clang-1100.0.33.17)"
      },
      %{
        bin: "/usr/local/bin/gcc-7",
        type: :gcc, 
        version: "7.5.0",
        versions: "gcc-7 (Homebrew GCC 7.5.0_2) 7.5.0"
      },
      %{
        bin: "/usr/local/bin/gcc-8",
        type: :gcc,
        version: "8.4.0",
        versions: "gcc-8 (Homebrew GCC 8.4.0_1) 8.4.0"
      },
      %{
        bin: "/usr/local/bin/gcc-9",
        type: :gcc,
        version: "9.3.0",
        versions: "gcc-9 (Homebrew GCC 9.3.0_1) 9.3.0"
      }
    ],
    ldflags_env: "-L/usr/local/opt/llvm/lib -L/usr/local/opt/sqlite/lib -L/usr/local/opt/mysql@5.6/lib -L/usr/local/opt/openblas/lib -L/usr/local/opt/openssl@1.1/lib -L/usr/local/opt/icu4c/lib -L/usr/local/opt/lapack/lib"
  },
  cpu: %{
    cpu_model: "18-Core Intel Xeon W",
    cpu_models: ["18-Core Intel Xeon W"],
    cpu_type: "x86_64",
    hyper_threading: :enabled,
    num_of_cores_of_a_processor: 18,
    num_of_processors: 1,
    num_of_threads_of_a_processor: 36,
    os_type: :macos,
    total_num_of_cores: 18,
    total_num_of_threads: 36
  },
  cuda: %{cuda: false},
  elixir: %{version: "1.10.0"},
  erlang: %{otp_version: 22},
  kernel: %{
    kernel_release: "19.4.0",
    kernel_version: "Darwin 19.4.0",
    system_version: "macOS 10.15.4 (19E287)"
  },
  metal: %{metal: true}
}

Additional context: None.

zacky1972 commented 4 years ago

A Benchfella benchmark that uses Pelemay should call the Pelemay list operation once in setup_all. The first call initializes Pelemay, which takes a long time, so without that warm-up the initialization cost is counted in the measured time.
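
For illustration, here is a minimal sketch of how the warm-up could look in a Benchfella file. It is not the actual pelemay_sample code: the file path, the PelemayStandIn module, its calc/1 function, and the input size are all assumptions made so the sketch is complete. In the real project, the warm-up call would target whichever function Pelemay compiles.

```elixir
# bench/logistic_map_bench.exs (hypothetical path; run with 'mix bench')

# Stand-in for the Pelemay-compiled function. In pelemay_sample the real
# function would be defined with Pelemay; the name and body here are
# assumptions used only to keep the sketch complete.
defmodule PelemayStandIn do
  def calc(list), do: Enum.map(list, &(&1 * 2))
end

defmodule LogisticMapBench do
  use Benchfella

  # Hypothetical input; the real benchmark's data may differ.
  @list Enum.to_list(1..10_000)

  # Runs once before any bench in this module. The warm-up call pays the
  # one-time initialization cost here, outside the timed runs.
  setup_all do
    _ = PelemayStandIn.calc(@list)
    {:ok, nil}
  end

  bench "Pelemay (stand-in)" do
    PelemayStandIn.calc(@list)
  end

  bench "Enum" do
    Enum.map(@list, &(&1 * 2))
  end
end
```

With the warm-up in setup_all, the timed bench blocks should measure only steady-state calls, which is the comparison the issue is after.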