Giving up on `np.matmul` will save ~14 MB of disk space

Since we ideally want the xtensor headers to be < 5 MB so that they can be integrated into pocketpy main, I measured the size of each header directory:

04:24:51  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xtensor
1.9M    xtensor
04:25:13  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xtensor-blas/
132K    xtensor-blas/
04:25:23  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xflens
13M     xflens
04:25:32  |base|anurag_b@nirpan.org@DQW-221-51 include ±|main ✗|→ du -sh xtl
836K    xtl

You will notice that xtensor itself is not that heavy in terms of disk size. In fact, xflens accounts for roughly 80% of the total (13M out of about 16M). We use xtensor-blas at only one point in our code, to implement the matmul function here: https://github.com/pocketpy/gsoc-2024-dev/blob/d0e6874475315f9a71d669678b4b9b71f94c1cb7/numpy/include/numpy.hpp#L236-L237

This is because xtensor does not support matmul itself. It supports dot in xt::linalg, which depends on xtensor-blas, which in turn depends on the xflens headers.
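For a sense of what a BLAS-free replacement could look like, here is a minimal sketch of a 2-D matrix product over flat row-major buffers. The name `naive_matmul_2d` and its signature are hypothetical. It is not part of xtensor, xtensor-blas, or the current pocketpy code, and it makes no attempt to match BLAS performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an xtensor or pocketpy API).
// Multiplies a row-major (m x k) matrix by a row-major (k x n) matrix
// and returns the row-major (m x n) result.
std::vector<double> naive_matmul_2d(const std::vector<double>& lhs,
                                    const std::vector<double>& rhs,
                                    std::size_t m, std::size_t k, std::size_t n) {
    // Buffer sizes must agree with the declared shapes.
    assert(lhs.size() == m * k && rhs.size() == k * n);
    std::vector<double> out(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const double a = lhs[i * k + p];
            // Inner loop walks both rhs and out contiguously.
            for (std::size_t j = 0; j < n; ++j) {
                out[i * n + j] += a * rhs[p * n + j];
            }
        }
    }
    return out;
}
```

Something along these lines would need only the standard library, so it would not pull in the xflens headers. Whether the speed trade-off is acceptable is a separate question.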

The dot-based implementation is mostly correct, but it does not behave like numpy's matmul in the following cases:

(a, b) x (b, c, d) -> (a, c, d)
(a, b, c) x (a, c, d) -> (a, b, d)
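To illustrate the second case, here is a rough sketch of the batched behaviour: the leading dimension `a` is treated as a batch index and an ordinary (b, c) x (c, d) product is computed for each batch entry. The function `batched_matmul` is a hypothetical name, it assumes flat row-major buffers, and it deliberately does not handle broadcasting or the first case listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an xtensor or pocketpy API).
// lhs has shape (a, b, c), rhs has shape (a, c, d), both row-major.
// Returns a row-major buffer of shape (a, b, d).
std::vector<double> batched_matmul(const std::vector<double>& lhs,
                                   const std::vector<double>& rhs,
                                   std::size_t a, std::size_t b,
                                   std::size_t c, std::size_t d) {
    assert(lhs.size() == a * b * c && rhs.size() == a * c * d);
    std::vector<double> out(a * b * d, 0.0);
    for (std::size_t n = 0; n < a; ++n) {
        // Pointers to the n-th matrix of each operand and of the result.
        const double* l = lhs.data() + n * b * c;
        const double* r = rhs.data() + n * c * d;
        double* o = out.data() + n * b * d;
        for (std::size_t i = 0; i < b; ++i) {
            for (std::size_t p = 0; p < c; ++p) {
                for (std::size_t j = 0; j < d; ++j) {
                    o[i * d + j] += l[i * c + p] * r[p * d + j];
                }
            }
        }
    }
    return out;
}
```

For example, with `a = 2, b = 1, c = 2, d = 1`, the inputs `{1, 2, 3, 4}` and `{5, 6, 7, 8}` give one dot product per batch entry, which is the per-batch result that matmul is expected to produce.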
