jupyter-xeus / xeus-cling

Jupyter kernel for the C++ programming language
BSD 3-Clause "New" or "Revised" License

Slow performance compared to a clang++/g++-compiled program #427

Status: Open. ajz34 opened this issue 2 years ago

ajz34 commented 2 years ago

Hi devs!

It's pretty awesome to have C++ code running in Jupyter 😄, and it can be integrated into blogs using Sphinx. But when I'm writing about program efficiency, the xeus-cling kernel appears to be very slow.

Take a 1024×1024 matrix multiplication as an example: compiled with clang++, the program runs in 170 ms, but under the xeus-cling Jupyter kernel it takes 3000 ms. It appears that the more deeply nested the loops, the worse the performance.

I wonder whether xeus-cling is simply not intended (or recommended) for demonstrating code efficiency, or whether I'm missing something 😿. Thanks in advance.


Compiled with clang++:

$ clang++ matmul.cpp -march=native -O3 -std=c++11

xeus-cling kernel (configuration file located at ~/.local/share/jupyter/kernels/xcpp14/kernel.json):

{
  "display_name": "C++14",
  "argv": [
      "<somepath>/miniconda3/bin/xcpp",
      "-f",
      "{connection_file}",
      "-O3",
      "-march=native",
      "-fopenmp",
      "-std=c++14"
  ],
  "language": "C++14"
}

Matrix multiplication program:

#include <chrono>
#include <cstdlib>  // aligned_alloc, free
#include <iostream>

using namespace std;

int main() {
    // Three 1024x1024 row-major matrices in 64-byte-aligned buffers.
    float * A, * B, * C;
    A = (float *) aligned_alloc(64, 1024*1024 * sizeof(float));
    B = (float *) aligned_alloc(64, 1024*1024 * sizeof(float));
    C = (float *) aligned_alloc(64, 1024*1024 * sizeof(float));
    for (int p = 0; p < 1024*1024; ++p) A[p] = p % 10;
    for (int p = 0; p < 1024*1024; ++p) B[p] = p % 21;
    for (int p = 0; p < 1024*1024; ++p) C[p] = 0;

    // Time only the multiplication C += A * B, in i-k-j loop order.
    auto start = chrono::high_resolution_clock::now(); // tic
    for (int i = 0; i < 1024; ++i)
    for (int k = 0; k < 1024; ++k)
    for (int j = 0; j < 1024; ++j)
        C[i*1024 + j] += A[i*1024 + k] * B[k*1024 + j];
    auto stop = chrono::high_resolution_clock::now(); // toc
    chrono::duration<double, std::milli> dur = stop - start;
    cout << dur.count() << " ms\n";

    // Release the buffers (matters when a notebook cell is re-run).
    free(A);
    free(B);
    free(C);
}