taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

Support efficient operations for large matrix #2696

Open erizmr opened 3 years ago

erizmr commented 3 years ago

Concisely describe the proposed feature Support efficient operations (e.g. matmul) for large matrices. The current ti.Matrix is designed for small matrices, so compilation is extremely slow for even a moderately large matrix.

P.S. The following error is also raised when materializing a large matrix:

RuntimeError: [struct_llvm.cpp:run@307] Assertion failure: (int)snodes.size() <= taichi_max_num_snodes

To reproduce:

import taichi as ti

ti.init()

a = ti.Matrix.field(36, 18, dtype=ti.f32, shape=())
b = ti.Matrix.field(18, 36, dtype=ti.f32, shape=())

@ti.kernel
def test():
    # print(b[None] @ a[None])
    print(b[None])

test()

Describe the solution you'd like (if any) Store the elements of a large matrix (n*m > 32) in a ti.field with shape=(n, m), rather than as individual matrix elements. Extend the current ti.Matrix functions to large matrices based on that field.

Additional comments Any discussion and suggestions are welcome! Personally, I think that although users who need a large matrix can store it in a field with a large shape directly, this requires them to implement basic matrix operations such as matmul by themselves. That is not a huge amount of work for professional users, but it may degrade new users' experience when they just want a quick off-the-shelf solution.
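As an illustration of what users would otherwise have to write by hand, here is a minimal sketch of a field-backed matmul. NumPy arrays stand in for ti.field storage so the snippet runs anywhere; the name field_matmul and all sizes are hypothetical, and in Taichi the loop nest would live in a @ti.kernel iterating over the result field.

```python
import numpy as np

def field_matmul(a, b):
    """Naive matmul over 2D field-like arrays. In Taichi this would
    be a @ti.kernel looping over the cells of the result field."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            s = 0.0
            for t in range(k):
                s += a[i, t] * b[t, j]
            c[i, j] = s
    return c

# Same shapes as in the reproducer above: (18, 36) @ (36, 18) -> (18, 18).
a = np.ones((18, 36), dtype=np.float32)
b = np.ones((36, 18), dtype=np.float32)
c = field_matmul(a, b)
```

Because the matrix lives in a field, the loops run at runtime instead of being unrolled at compile time, which is exactly what avoids the compilation blowup.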

Related issues and discussions #983 #833

archibate commented 2 years ago

a = ti.Matrix.field(36, 18, dtype=ti.f32, shape=())

I think we can actually mock like this:


a_core = ti.field(dtype=ti.f32, shape=(36, 18))

class LargeMatrixWrapper:
    def __getitem__(self, indices):  # will be called when performing `a[indices]`
        return ti.Matrix([[a_core[(i, j) + indices] for j in range(18)] for i in range(36)])

a = LargeMatrixWrapper()

And for AOS/SOA of matrices, we can also mock by using multilevel of ti.root.dense properly, wdyt?
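The wrapper idea above can be sketched in runnable form with a NumPy array standing in for the backing ti.field (all shapes here are illustrative only; in Taichi, __getitem__ would assemble a ti.Matrix of element loads rather than a NumPy array):

```python
import numpy as np

# Stand-in for `ti.field(dtype=ti.f32, shape=(4, 3, 8))`: the first two
# axes hold a 4x3 matrix, the last axis plays the role of the field's
# own shape (8 cells), i.e. an SOA-style layout.
a_core = np.zeros((4, 3, 8), dtype=np.float32)

class LargeMatrixWrapper:
    def __getitem__(self, index):
        # Gather the 4x3 matrix stored at field cell `index`.
        return np.array([[a_core[i, j, index] for j in range(3)]
                         for i in range(4)])

a = LargeMatrixWrapper()
a_core[2, 1, 5] = 7.0
m = a[5]  # the 4x3 matrix at cell 5
```

The same trick generalizes to AOS by moving the matrix axes after the cell axes, which mirrors the multilevel ti.root.dense layouts mentioned above.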

bobcao3 commented 2 years ago

This is a bit bumping into the type system refactoring that we can't avoid in the end. E.g. the fundamental question of what is a matrix and all the supported computations on a matrix. I think the main problem will be how to support efficient matrix operations / computations for these large matrices


archibate commented 2 years ago

This is a bit bumping into the type system refactoring that we can't avoid in the end. E.g. the fundamental question of what is a matrix and all the supported computations on a matrix. I think the main problem will be how to support efficient matrix operations / computations for these large matrices

You are right. The current ti.Vector is essentially std::tuple<int, int, int> rather than std::array<int, 3>, which is the source of the compilation inefficiency. Not sure if ti.Struct supports arrays now, e.g., as C structs do:

struct Bad {
  int x0, x1, x2, x3;  // can't be dynamically indexed; slow compilation
};
struct Good {
  int x[4];  // supports dynamic indexing; fast compilation
};

I've heard that @k-ye has implemented dynamic indexing for ti.Vector; how does it work? Does it mean 'array' is a fundamental type of Taichi now?
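A rough sketch, in plain Python, of why the tuple-of-scalars layout makes dynamic indexing costly (function names hypothetical): without a real array type, a runtime index over separate scalar members has to be lowered to a chain of compares and selects, whereas an array layout is a single indexed load.

```python
def get_tuple_style(x0, x1, x2, x3, i):
    # What a compiler must emit when elements are separate scalars
    # (the std::tuple situation): a branch/select chain over i.
    if i == 0:
        return x0
    if i == 1:
        return x1
    if i == 2:
        return x2
    return x3

def get_array_style(xs, i):
    # With a real array type, a runtime index is one indexed load.
    return xs[i]

# Both layouts agree on the result; they differ only in generated code.
assert get_tuple_style(10, 20, 30, 40, 2) == get_array_style([10, 20, 30, 40], 2)
```

The branch chain also grows linearly with the vector size, which is one more reason large matrices do not fit the tuple-of-scalars design.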

eatcosmos commented 2 years ago

I would like to help!

lucywsq commented 2 years ago

Hi, @eatcosmos if you’re still working on this issue, please let us know, thanks!

eatcosmos commented 2 years ago

Hi, @eatcosmos if you’re still working on this issue, please let us know, thanks!

Sorry for the long delay in replying. I have quit this; it probably needs more background knowledge than I have, so another contributor can pick it up.

hilbert-yaa commented 2 years ago

Hi, if @eatcosmos decides to quit, I am interested in taking this issue. Could you please assign it to me if that's OK? :) @lucywsq I need some direction. I read through the Python code for ti.Matrix in Taichi (./python/taichi/lang/Matrix.py), and as far as I can tell (please correct me if I got something wrong) it is a pure Python implementation: operations such as ti.Matrix.matmul are implemented as nested Python loops, which get unrolled, leading to slow compilation and execution. Do you think it's feasible to write an alternative version where ti.Matrix is just a Python wrapper over its C++ implementation? (I believe there is plentiful existing work, e.g. in linalg.h, and some interfaces are already done.)

archibate commented 2 years ago

an alternative version where ti.Matrix is just a Python wrapper over its C++ implementation

Exactly. I saw that ti.Struct has recently been added; why not add a ti.Array for dynamically-indexable arrays too? Their C++ equivalents are std::tuple and std::array: std::array allows dynamic indexing via a[i] and can be large, but all elements have to be the same type, while std::tuple only supports compile-time indexing like std::get<i>(a) but allows each element to be a different type.

hilbert-yaa commented 2 years ago

Thanks, I'll take your advice and raise any design questions or issues I meet here. I plan to start working on it this weekend. 😆

strongoier commented 2 years ago

Hi @Hilbert-Yaa. Welcome to the Taichi community! In fact, ti.Vector is on its way to become a dynamic-indexable array, and we already have TensorType in Taichi core now. Also, putting all Matrix-related implementations into Taichi core is on our current roadmap. We're glad that you would like to help. Stay in touch and we can discuss more implementation details when you're available.

hilbert-yaa commented 2 years ago

Hi @Hilbert-Yaa. Welcome to the Taichi community! In fact, ti.Vector is on its way to become a dynamic-indexable array, and we already have TensorType in Taichi core now. Also, putting all Matrix-related implementations into Taichi core is on our current roadmap. We're glad that you would like to help. Stay in touch and we can discuss more implementation details when you're available.

Thanks for your warm welcome @strongoier ! As planned I'll begin to figure things this weekend. Stay in touch 🌟.