xtensor-stack / xtensor-python

Python bindings for xtensor
BSD 3-Clause "New" or "Revised" License
345 stars 58 forks source link

Performance of xt::nanmean #201

Open zhujun98 opened 4 years ago

zhujun98 commented 4 years ago

In Python I have a 3D numpy array which is a stack of image data and I would like to calculate nanmean. I tried two different ways using xtensor-python:

template<typename T>
inline xt::pytensor<T, 2> nanmeanImages(const xt::pytensor<T, 3>& arr) {

  auto shape = arr.shape();
  auto mean = xt::pytensor<T, 2>({shape[1], shape[2]});

  for (std::size_t j=0; j < shape[1]; ++j) {
    for (std::size_t k=0; k < shape[2]; ++k) {
      T count = 0;
      T sum = 0;
      for (std::size_t i=0; i < shape[0]; ++i) {
        auto v = arr(i, j, k);
        if (! std::isnan(v)) {
          count += T(1);
          sum += v;
        }
      }
      mean(j, k) = sum / count;
    }
  }

  return mean;
}

template<typename T>
inline xt::pytensor<T, 2> nanmeanImagesOld(const xt::pytensor<T, 3>& arr) {
  return xt::nanmean<T>(arr, {0}, xt::evaluation_strategy::immediate);
}

and benchmarked with the following Python code:

        data = np.ones((64, 1024, 1024), dtype=np.float32)
        data[::2, ::2, ::2] = np.nan

        t0 = time.perf_counter()
        ret_cpp = xt_nanmean_images(data)
        dt_cpp = time.perf_counter() - t0

        t0 = time.perf_counter()
        ret_cpp = xt_nanmean_images_old(data)
        dt_cpp_old = time.perf_counter() - t0

The result is

nanmean_images with <class 'numpy.float32'> - dt (cpp): 0.1258, dt (cpp) old: 0.1573

I guess the first one is faster because xt::nanmean uses xt::nansum, xt::count_nonnan which needs to loop over the big array twice. Also xt::count_nonnan is twice as expensive as xt::nansum for whatever reason. I compile xtensor with xsimd and do not see any improvement. But I am quite new to xsimd and not sure whether I did everything correctly.

I would like to further improve the performance by using tbb. I am not sure whether it is the best way to go and would like to ask your opinion. Thanks a lot!

wolfv commented 4 years ago

Hi @zhujun98! Thanks for the bug report.

Indeed there is a problem with the performance of mean.

We have a fix in this PR: https://github.com/QuantStack/xtensor/pull/1627

I think I will take out the fix for mean by the beginning of next week so that we have that ready to go since the PR is blocked on some TBB issues that were not that straightforward to fix unfortunately.

Cheers!