xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing
BSD 3-Clause "New" or "Revised" License

transpose slower than numpy #2784

Open wlz987 opened 4 months ago

wlz987 commented 4 months ago

xt::xtensor<float, 3> img_array_trans = xt::cast<float>(xt::transpose(img_array, {2, 0, 1})) / 255.0f;

is slower than the NumPy equivalent:

n_arrary = np.transpose(n_arrary.astype(np.float32) / 255.0, (2, 0, 1))  # shallow copy
n_arrary = np.ascontiguousarray(n_arrary)  # deep copy

CC = g++
CFLAGS = -Wall -std=c++14 -O3 -march=native -mavx2 -ffast-math -funroll-loops -pthread -flto -malign-double

XTENSOR_FLAGS = -DXTENSOR_USE_XSIMD -DNDEBUG

XTL_PATH = /home/lhx/software/raw_code/xtl
XSIMD_PATH = /home/lhx/software/raw_code/xsimd
XTENSOR_PATH = /home/lhx/software/raw_code/xtensor
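
For reference, the operation that both snippets compute can be written as a plain loop. This is only a sketch under assumptions, not xtensor's API and not a claim about what is fastest: it assumes img_array holds an interleaved height x width x channels (HWC) image of 8-bit values, and hwc_to_chw_normalized is a hypothetical helper name. It might serve as a hand-written baseline to time against the xtensor expression and the NumPy version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of xtensor): converts an interleaved
// height x width x channels (HWC) 8-bit image into a contiguous
// channels x height x width (CHW) float buffer scaled to [0, 1].
// The 8-bit HWC input layout is an assumption about img_array.
std::vector<float> hwc_to_chw_normalized(const std::vector<std::uint8_t>& src,
                                         std::size_t height,
                                         std::size_t width,
                                         std::size_t channels)
{
    // The input must contain exactly one value per (row, column, channel).
    assert(src.size() == height * width * channels);

    std::vector<float> dst(src.size());
    const std::size_t plane = height * width;  // elements per output channel

    for (std::size_t c = 0; c < channels; ++c) {
        // Each output channel plane is written contiguously...
        float* out = dst.data() + c * plane;
        for (std::size_t i = 0; i < plane; ++i) {
            // ...while the input is read with a stride of `channels`.
            out[i] = static_cast<float>(src[i * channels + c]) / 255.0f;
        }
    }
    return dst;
}
```

The result corresponds to the transpose with axes (2, 0, 1) followed by a contiguous copy: element (c, y, x) of the output is element (y, x, c) of the input divided by 255.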