jcmgray / cotengra

Hyper optimized contraction trees for large tensor networks and einsums
https://cotengra.readthedocs.io
Apache License 2.0

distributed support of tensor contraction #10

Closed Z-Y00 closed 3 years ago

Z-Y00 commented 3 years ago

Hi, does cotengra support tensor contraction across multiple nodes? If it does, could you add some documentation about how to run it on a cluster?

Z-Y00 commented 3 years ago

Oh, I see: cotengra actually uses quimb to do the contraction.

https://quimb.readthedocs.io/en/latest/_autosummary/quimb.tensor.tensor_core.html

jcmgray commented 3 years ago

Hey @Z-Y00, yes basically cotengra is just for finding the contraction paths - and also which indices to slice - rather than providing any particular backend for the contractions.

Slicing - https://github.com/jcmgray/cotengra#basic-slicing-knife - does offer an embarrassingly parallel strategy for distributing the work, something like:

# see readme for creating the sliced contractor `sc`
# `pool` is any executor with a futures interface, e.g. a
# concurrent.futures.ProcessPoolExecutor or a dask.distributed Client
futures = [
    pool.submit(sc.contract_slice, i)
    for i in range(sc.nslices)
]
results = [f.result() for f in futures]
out = sc.gather_slices(results)

I haven't had time to really test what performs best in this context, but both dask.distributed and ray are very easy to set up and try. quimb has a very rudimentary implementation of slicing that leverages cotengra, but again, I haven't had time to test and optimize it much!
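To make the pattern above concrete, here is a toy, self-contained sketch using only numpy and the standard library. Note that `ToySlicedContractor` is a hypothetical stand-in for the `sc` object from the readme, not cotengra's actual implementation; it just shows why summing independently contracted slices reproduces the full contraction:

```python
# Toy illustration of sliced contraction: each value of a sliced
# index is contracted independently (here, as an outer product),
# and summing the per-slice results recovers the full einsum.
from concurrent.futures import ThreadPoolExecutor

import numpy as np


class ToySlicedContractor:
    """Hypothetical stand-in for cotengra's sliced contractor."""

    def __init__(self, A, B):
        self.A, self.B = A, B
        # slice over the shared (contracted) index k
        self.nslices = A.shape[1]

    def contract_slice(self, i):
        # contract the network with the sliced index fixed to value i
        return np.outer(self.A[:, i], self.B[i, :])

    def gather_slices(self, slices):
        # slices are independent, so the reduction is a plain sum
        return sum(slices)


rng = np.random.default_rng(42)
A = rng.normal(size=(4, 8))  # indices: i, k
B = rng.normal(size=(8, 5))  # indices: k, j
sc = ToySlicedContractor(A, B)

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(sc.contract_slice, i) for i in range(sc.nslices)]
    results = [f.result() for f in futures]

out = sc.gather_slices(results)
assert np.allclose(out, A @ B)  # matches the unsliced contraction
```

Because each `contract_slice(i)` call is independent, the executor here could be swapped for a process pool or a distributed scheduler without changing the logic.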

Z-Y00 commented 3 years ago

Hi @jcmgray, thanks for your help. I'm trying to use it with dask as the backend, running on my laptop. But as you can see in this screenshot, the runtime with dask is suspiciously fast (0.3s in this case). It looks like dask failed to start but exited without any error. I'm wondering if I configured anything wrong. [screenshot]

jcmgray commented 3 years ago

Too slow or too fast? (0.36s vs 43s?)

What's happening here is just that dask produces a computational graph that you run later with x.compute(), i.e. it hasn't performed the contraction yet. I'd recommend reading the docs - https://docs.dask.org/en/latest/.
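A minimal sketch of this laziness, assuming dask is installed (any sizes/chunking would do, these are illustrative):

```python
import dask.array as da

# building the expression only records a task graph; nothing runs yet,
# which is why timing this part appears almost instantaneous
x = da.random.random((1000, 1000), chunks=(250, 250))
y = (x @ x.T).sum()

print(type(y))  # a lazy dask object, not a number

# the actual computation happens only here
result = y.compute()
print(float(result))
```

Timing only the graph-construction step would report a fraction of a second regardless of how expensive the eventual computation is.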

I'd also recommend against using actual dask arrays for the contractions; instead, use the dask.distributed scheduler just to farm out the slices, while keeping numpy as the backend for each slice's contraction.
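A sketch of that division of labor, assuming dask.distributed is installed. The `ToySlicedContractor` below is a hypothetical stand-in for the `sc` object from the readme (each slice is still a pure-numpy contraction); only the scheduling goes through dask:

```python
import numpy as np
from dask.distributed import Client


class ToySlicedContractor:
    """Hypothetical stand-in for cotengra's sliced contractor."""

    def __init__(self, A, B):
        self.A, self.B = A, B
        self.nslices = A.shape[1]

    def contract_slice(self, i):
        # pure-numpy work for one value of the sliced index
        return np.outer(self.A[:, i], self.B[i, :])

    def gather_slices(self, slices):
        return sum(slices)


A, B = np.ones((3, 6)), np.ones((6, 4))
sc = ToySlicedContractor(A, B)

# in-process scheduler for a quick local test; on a real cluster
# you'd connect with Client("scheduler-address:8786") instead
client = Client(processes=False)
futures = client.map(sc.contract_slice, list(range(sc.nslices)))
out = sc.gather_slices(client.gather(futures))
client.close()

assert np.allclose(out, A @ B)
```

The same code runs unchanged against a multi-node scheduler; only the `Client(...)` address changes.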

Z-Y00 commented 3 years ago

Thanks for this information! I was expecting dask, as the backend, to do everything for me. I'll try to write some test code with numpy as the backend and the dask.distributed scheduler for contracting slices. Maybe I'll create a Pull Request to add some documentation to your repository soon!