rasbt / machine-learning-notes

Collection of useful machine learning codes and snippets (originally intended for my personal use)
BSD 3-Clause "New" or "Revised" License

Incorrect benchmark of numpy and arrow backends #37

Open tdpetrou opened 1 year ago

tdpetrou commented 1 year ago

There are a couple of issues in this notebook that you can fix to give a fairer comparison between numpy and arrow. Most importantly, arrow stores data column by column, so you need to make the numpy array a Fortran-ordered (column-major) array with:

np.asfortranarray(numbers)
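
To make the memory-layout point concrete, here is a minimal sketch of what the conversion changes. The small `numbers` array below is a hypothetical stand-in for the one in the notebook (its shape and dtype are assumptions); note that `np.asfortranarray` returns a new array rather than modifying its argument, so the result has to be assigned.

```python
import numpy as np

# Hypothetical stand-in for the notebook's `numbers` array (shape is an assumption).
numbers = np.arange(12, dtype=np.float64).reshape(3, 4)

# A freshly created numpy array is C-ordered (row-major) by default.
print(numbers.flags["C_CONTIGUOUS"], numbers.strides)

# asfortranarray returns a column-major copy; the values are unchanged,
# only the order in which they sit in memory differs.
numbers_f = np.asfortranarray(numbers)
print(numbers_f.flags["F_CONTIGUOUS"], numbers_f.strides)

# Same contents, different layout.
print(np.array_equal(numbers, numbers_f))
```

In the column-major copy each column is contiguous in memory, which is closer to how a columnar format lays out its data.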

Next, numbers.sum() with no axis argument sums over both axes: it adds up every value and produces a single scalar. You need to compare each axis separately, with numbers.sum(axis=0) and numbers.sum(axis=1). You will see that arrow is 1000x slower when summing across the horizontal axis (axis=1).
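
A rough sketch of the numpy side of the per-axis comparison, to show how the three calls differ. The array shape, the random data, and the repeat count are assumptions chosen for illustration, not the notebook's actual settings, and the arrow side is left out here.

```python
import numpy as np
from timeit import timeit

# Hypothetical data; the notebook's real shape and values may differ.
rng = np.random.default_rng(0)
numbers = np.asfortranarray(rng.random((1000, 50)))

total = numbers.sum()           # scalar: every value, both axes collapsed
col_sums = numbers.sum(axis=0)  # one sum per column, shape (50,)
row_sums = numbers.sum(axis=1)  # one sum per row, shape (1000,)
print(total, col_sums.shape, row_sums.shape)

# Time each axis separately so the two directions can be compared.
t_cols = timeit(lambda: numbers.sum(axis=0), number=100)
t_rows = timeit(lambda: numbers.sum(axis=1), number=100)
print(f"axis=0: {t_cols:.4f}s  axis=1: {t_rows:.4f}s")
```

Timing axis=0 and axis=1 separately for both backends is what exposes a layout-dependent gap that a single numbers.sum() call hides.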