pocketpy / gsoc-2024-dev

GSoC'24 develop repository for pybind11 and numpy
https://pocketpy.dev/gsoc/ideas/

Giving up on `np.matmul` will save about 14 MB of disk space #46

Closed faze-geek closed 3 months ago

faze-geek commented 3 months ago

Since we ideally want the xtensor headers to be under 5 MB so that they can be integrated into pocketpy main, I measured the following:

04:24:51  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xtensor
1.9M    xtensor
04:25:13  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xtensor-blas/
132K    xtensor-blas/
04:25:23  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xflens
13M     xflens
04:25:32  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xtl
836K    xtl

Notice that xtensor itself is not that heavy on disk. In fact, xflens accounts for roughly 80% of the total (13M out of about 16M). We use xtensor-blas at only one point in our code, to implement the matmul function here: https://github.com/pocketpy/gsoc-2024-dev/blob/d0e6874475315f9a71d669678b4b9b71f94c1cb7/numpy/include/numpy.hpp#L236-L237
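As a rough check of the proportions, the `du -sh` figures can simply be summed. This is a minimal sketch, assuming du's K and M suffixes are binary multiples and taking the rounded values at face value; the names `SIZES`, `UNITS` and `to_bytes` are only illustrative.

```python
# Sizes copied from the `du -sh` output above (already rounded by du).
SIZES = {"xtensor": "1.9M", "xtensor-blas": "132K", "xflens": "13M", "xtl": "836K"}

# Assumption: du reports binary multiples (1K = 1024 bytes).
UNITS = {"K": 1024, "M": 1024 ** 2}


def to_bytes(text):
    """Convert a du-style size such as '1.9M' or '132K' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


total = sum(to_bytes(s) for s in SIZES.values())
shares = {name: to_bytes(s) / total for name, s in SIZES.items()}

# Print each directory's share of the combined size, largest first.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {share:6.1%}")
```

With these rounded inputs xflens comes out at a little over 80% of the combined size, and xtensor at roughly 12%.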

This is because xtensor does not support matmul directly. It supports dot in xt::linalg, which depends on xtensor-blas, which in turn depends on the xflens headers.

Our dot-based implementation is mostly correct, but it does not behave like numpy's matmul in the following cases:

  1. (a, b) x (b, c, d) -> (a, c, d)
  2. (a, b, c) x (a, c, d) -> (a, b, d)
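For reference, the second case is the batched behaviour NumPy documents for matmul: inputs with more than two dimensions are treated as stacks of matrices living in the last two axes. Below is a minimal pure-Python sketch of that rule for equal batch sizes, using nested lists. The helper names (`matmul_2d`, `matmul_batched`, `shape`) are hypothetical and not part of the repository, and batch broadcasting is deliberately left out.

```python
def matmul_2d(x, y):
    """Plain (b, c) x (c, d) -> (b, d) matrix product on nested lists."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "inner dimensions must match"
    cols = len(y[0])
    return [[sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
            for row in x]


def matmul_batched(xs, ys):
    """(a, b, c) x (a, c, d) -> (a, b, d): one 2-D product per batch entry."""
    assert len(xs) == len(ys), "batch sizes must match in this sketch"
    return [matmul_2d(x, y) for x, y in zip(xs, ys)]


def shape(t):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


xs = [[[1] * 4 for _ in range(3)] for _ in range(2)]  # shape (2, 3, 4)
ys = [[[1] * 5 for _ in range(4)] for _ in range(2)]  # shape (2, 4, 5)
out = matmul_batched(xs, ys)
print(shape(out))  # (2, 3, 5)
```

The point of the sketch is only that the leading axis is carried through as a batch index rather than being contracted.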

faze-geek commented 3 months ago

So this raises the question: should we drop np.matmul from our module, given that it only partially covers the use cases in the first place? Doing so would free up a lot of disk space.