scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

LayoutBuilder in Numba is slower than ArrayBuilder in Numba #2599

Open ianna opened 1 year ago

ianna commented 1 year ago

Version of Awkward Array

2.3.1

Description and code to reproduce

@jpivarski - as discussed, I'm looking into this issue. There is indeed a nearly 7x difference in run time between an ArrayBuilder in Numba and a LayoutBuilder in Numba, with the LayoutBuilder being the slower of the two (each test is run twice to account for "warm-up"):

import awkward as ak
import numba
import numpy as np

import awkward._connect.numba.arrayview
import awkward.numba.layoutbuilder as lb

ak.numba.register_and_check()

import time

MULTIPLIER = int(10e6)
print("MULTIPLIER", MULTIPLIER)

def test_Numpy_LayoutBuilder():
    @numba.njit
    def f3(x):
        for i in range(MULTIPLIER):
            x.append(1.1)
            x.append(2.2)
            x.append(3.3)
            x.append(4.4)
            x.append(5.5)

        return x

    l = lb.Numpy(np.float64)
    b = f3(l)

def test_Numpy_ArrayBuilder():
    @numba.njit
    def f4(x):
        for i in range(MULTIPLIER):
            x.real(1.1)
            x.real(2.2)
            x.real(3.3)
            x.real(4.4)
            x.real(5.5)

        return x

    a = ak.highlevel.ArrayBuilder()
    b = f4(a)

for function in test_Numpy_LayoutBuilder, test_Numpy_ArrayBuilder:
    # Run each test twice to account for "warm-up" on the first call.
    for _ in range(2):
        t1 = time.perf_counter(), time.process_time()
        function()
        t2 = time.perf_counter(), time.process_time()
        print(f"{function.__name__}()")
        print(f" Real time: {t2[0] - t1[0]:.2f} seconds")
        print(f" CPU time: {t2[1] - t1[1]:.2f} seconds")
        print()

Timings to build an array of five elements (MULTIPLIER = 1):

test_Numpy_LayoutBuilder()
 Real time: 1.62 seconds
 CPU time: 1.49 seconds

test_Numpy_LayoutBuilder()
 Real time: 0.11 seconds
 CPU time: 0.11 seconds

test_Numpy_ArrayBuilder()
 Real time: 0.04 seconds
 CPU time: 0.04 seconds

test_Numpy_ArrayBuilder()
 Real time: 0.04 seconds
 CPU time: 0.04 seconds

Timings to build an array of 5 x 10e6 (50 million) elements (MULTIPLIER = int(10e6)):

test_Numpy_LayoutBuilder()
 Real time: 4.87 seconds
 CPU time: 4.85 seconds

test_Numpy_LayoutBuilder()
 Real time: 3.79 seconds
 CPU time: 3.77 seconds

test_Numpy_ArrayBuilder()
 Real time: 0.56 seconds
 CPU time: 0.56 seconds

test_Numpy_ArrayBuilder()
 Real time: 0.57 seconds
 CPU time: 0.57 seconds
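
The "nearly 7x" figure can be checked against the real times printed above for the 5 x 10e6 case. The snippet below is only a small arithmetic sketch: the numbers are copied from that output, and the variable names are chosen here for illustration.

```python
# Real times in seconds, copied from the 5 x 10e6 output above.
# Each tuple is (first run, second run).
layout_builder = (4.87, 3.79)
array_builder = (0.56, 0.57)

# Ratio of LayoutBuilder time to ArrayBuilder time for each run.
first_run_ratio = layout_builder[0] / array_builder[0]
second_run_ratio = layout_builder[1] / array_builder[1]

print(f"first run:  {first_run_ratio:.1f}x")   # first run:  8.7x
print(f"second run: {second_run_ratio:.1f}x")  # second run: 6.6x
```

The second run, which is the one taken after "warm-up", gives the roughly 7x difference quoted in the description.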

ianna commented 1 year ago

It looks like using a numba.typed.List could improve LayoutBuilder performance:

MULTIPLIER 10000000
test_Numpy_LayoutBuilder()
 Real time: 5.15 seconds
 CPU time: 5.14 seconds

test_Numpy_LayoutBuilder()
 Real time: 4.02 seconds
 CPU time: 4.01 seconds

test_Numpy_ArrayBuilder()
 Real time: 0.58 seconds
 CPU time: 0.58 seconds

test_Numpy_ArrayBuilder()
 Real time: 0.58 seconds
 CPU time: 0.58 seconds

test_Numpy_TypedList()
 Real time: 1.03 seconds
 CPU time: 1.03 seconds

test_Numpy_TypedList()
 Real time: 0.68 seconds
 CPU time: 0.67 seconds

ianna commented 1 year ago

A benchmark from HDembinski. See the notebook

In [1]: import numba as nb
   ...: import numpy as np
   ...: import awkward as ak
   ...: print(f"{nb.__version__=}")
   ...: print(f"{ak.__version__=}")
   ...: 
nb.__version__='0.58.0rc1'
ak.__version__='2.3.3'
[Image: Screenshot 2023-08-28 at 20 50 34]