Hi @tmxklzp, which Taichi version are you using? The build time is shortened in this PR. Besides, the build time decreases significantly when you run the script a second time because of the offline cache.
@FantasyVR Yes, I solved this by building Taichi from source at exactly the commit of the PR you mentioned.
First, I found that the released 1.3.0 version on PyPI (commit tag: rc-v1.3.0, commit id: 0f25b95e) does not include the commit of the PR (commit id: 8413bc2):
$ git merge-base 8413bc2 --is-ancestor 0f25b95e && echo yes || echo no
no
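(Side note: the flag order shown above works because git accepts options after positional arguments; the documented form of the same ancestry check is equivalent:)

```console
$ git merge-base --is-ancestor 8413bc2 0f25b95e && echo yes || echo no
no
```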
So I tried to build from source on the master branch, which contains the PR. But the built version of Taichi could not run the script; the log shows:
[E 01/11/23 17:45:25.542 8382] [dynamic_loader.cpp:load_function@30] Cannot load function: /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so: undefined symbol: cusparseSsctr
I think there may be another bug, but I didn't dig into it...
Then I checked out exactly commit 8413bc2 and built from source again; this time it ran correctly:
[Taichi] version 1.3.0, llvm 15.0.4, commit 8413bc22, linux, python 3.10.8
[Taichi] Starting on arch=cuda
fill time 0.14100861549377441
build time 0.5260381698608398
sparse matrix multiply time 0.0055048465728759766
solver compute time 1.840867519378662
solve time 0.0001671314239501953
And I have another question: does the offline cache really work? I reran the program (several times) and it took:
[Taichi] version 1.3.0, llvm 15.0.4, commit 8413bc22, linux, python 3.10.8
[Taichi] Starting on arch=cuda
fill time 0.060143232345581055
build time 0.2373356819152832
sparse matrix multiply time 0.002096414566040039
solver compute time 0.5183939933776855
solve time 0.00011801719665527344
The comparison:

                      first time            rerun
build time            0.5260381698608398    0.2373356819152832
solver compute time   1.840867519378662     0.5183939933776855
offline_cache is set to True, and I noticed the description in the docs: "offline_cache: Enable/disable offline cache of the compiled kernels". But the values of the sparse matrix are set randomly on each run, so why do the cached compiled kernels reduce the build time and the solver compute time? Is it because the sparse matrix keeps the same shape and number of triplets?
Hi @tmxklzp, if the offline_cache is enabled, you don't need to recompile the Python script; the previously compiled results are loaded directly. The saved time is the compilation time. It actually should not have much influence on the build time and solve time.
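(For reference, a minimal sketch of how the cache is controlled at initialization; the cache directory shown is a hypothetical example, not a value from this thread:)

```python
import taichi as ti

# offline_cache stores compiled kernels on disk so later runs skip compilation.
ti.init(
    arch=ti.cuda,
    offline_cache=True,
    offline_cache_file_path="/tmp/ti_cache",  # hypothetical cache directory
)
```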
@FantasyVR Okay, I got it. Thank you for helping me!
Describe the bug
The build method of SparseMatrixBuilder takes a long time on the CUDA arch, but looks good on the CPU arch.
To Reproduce
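The original reproduction script is not preserved in this thread. Below is a minimal sketch of what it likely looked like, inferred from the timing labels in the logs above; the matrix size, triplet budget, fill pattern, and exact solver calls are assumptions for illustration:

```python
# Hypothetical reproduction sketch; problem size and fill pattern are assumed.
import time

import taichi as ti

ti.init(arch=ti.cuda, offline_cache=True)  # change to ti.cpu to compare backends

n = 10000  # assumed problem size

K = ti.linalg.SparseMatrixBuilder(n, n, max_num_triplets=3 * n, dtype=ti.f32)

@ti.kernel
def fill(A: ti.types.sparse_matrix_builder()):
    # Assemble a simple SPD tridiagonal matrix as a stand-in workload.
    for i in range(n):
        A[i, i] += 2.0
        if i > 0:
            A[i, i - 1] -= 1.0
        if i < n - 1:
            A[i, i + 1] -= 1.0

t = time.time()
fill(K)
print("fill time", time.time() - t)

t = time.time()
A = K.build()  # the step reported to be slow on CUDA
print("build time", time.time() - t)

t = time.time()
B = A @ A  # sparse-sparse multiply
print("sparse matrix multiply time", time.time() - t)

b = ti.ndarray(ti.f32, n)
b.fill(1.0)

t = time.time()
solver = ti.linalg.SparseSolver(solver_type="LLT")
solver.compute(A)  # analyze pattern + factorize
print("solver compute time", time.time() - t)

t = time.time()
x = solver.solve(b)
print("solve time", time.time() - t)
```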
Log/Screenshots
The full log of the program:
If I change arch = ti.cuda to arch = ti.cpu:
The build time on CUDA is significantly longer than on CPU, and the solver compute time is also longer: