rohany closed this issue 3 years ago.
Some errors I see look like this:
terminate called after throwing an instance of 'taco::TacoException'
what(): Error at /data/scratch/rohany/array-programming-benchmarks/taco/taco/src/codegen/module.cpp:147 in compile:
Compilation command failed:
cc -O3 -ffast-math -std=c99 -shared -fPIC /tmp/taco_tmp_JLmuiF/5nagsmwha9c9.c -o /tmp/taco_tmp_JLmuiF/5nagsmwha9c9.so -lm 2> /tmp/taco_tmp_JLmuiF/5nagsmwha9c9.outlog
returned -1
The compilation command fails with no output. I adjusted the command to try redirecting stderr to a file that I could read later, but that file is also empty.
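One thing worth noting: if module.cpp launches this command via system(), a return value of -1 usually means the shell process could not be created at all (for example, fork() failing under memory pressure) rather than cc rejecting the generated code, which would also explain why the .outlog file stays empty. A quick way to separate the two cases is to re-run the generated command by hand and look at the exit code and stderr directly; a rough sketch (paths copied from the log above, so they only work while that temp file still exists):

```python
# Sketch: re-run the generated compile command outside of TACO so the exit
# code and stderr can be inspected directly, instead of relying on the
# "2> ...outlog" redirection inside the spawned shell.
import subprocess

cmd = [
    "cc", "-O3", "-ffast-math", "-std=c99", "-shared", "-fPIC",
    "/tmp/taco_tmp_JLmuiF/5nagsmwha9c9.c",
    "-o", "/tmp/taco_tmp_JLmuiF/5nagsmwha9c9.so",
    "-lm",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print("exit code:", result.returncode)
print("stderr:", result.stderr or "<empty>")
```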
I'm not sure about the issue with the larger FROSTT tensors, but regarding the issue with SuiteSparse matrices, perhaps the code caching mechanism has something to do with it? In particular, I've noticed that the temporary fix that checks whether custom operators are isomorphic (line 535 in index_notation.cpp on the array_algebra branch) seems to always return false, so the code caching mechanism is essentially disabled. Additionally, the code caching mechanism only works if the input sizes are the same, so even if you're benchmarking the same kernel it likely wouldn't kick in when running with different SuiteSparse matrices. TACO might be running out of memory storing loaded code as a result.
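To make that concrete, here's a toy sketch (not TACO's actual implementation) of why a cache keyed on the input sizes essentially never hits across a SuiteSparse sweep and just keeps growing:

```python
# Toy illustration: when the input shapes are part of the cache key, every
# distinct matrix is a miss, and each miss leaves another "compiled module"
# resident for the lifetime of the process.
cache = {}

def compile_and_load(expr, shape):
    # Stand-in for generating code, compiling it, and loading the .so;
    # in the real system this object would hold a loaded shared library.
    return (expr, shape)

def get_kernel(expr, shape):
    key = (expr, shape)  # the shape is part of the key
    if key not in cache:
        cache[key] = compile_and_load(expr, shape)
    return cache[key]

# ~3000 matrices with distinct shapes -> ~3000 entries, none ever evicted.
for n in range(1, 3001):
    get_kernel("ufunc_kernel", (n, n))
print(len(cache))  # 3000
```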
I see. I can check whether a run completes if we don't store anything in the cache.
That didn't seem to help. I think we're going to have to either not use googlebench for this use case, or set the benchmarks up with an outer driver script so that each tensor in the dataset runs individually, and then aggregate the results into CSVs separately.
Having a driver script and running each tensor individually might be the safer thing to do, since it could be difficult to ensure that there isn't any memory leak in the benchmarks.
Used a driver script to do this.
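Roughly, the idea looks like the sketch below: the benchmark binary runs once per tensor, so compiled kernels and loaded tensors are freed when that process exits, and the per-run CSVs get stitched together afterwards. The binary name, the --tensor flag, and the paths here are placeholders, not the actual script; --benchmark_out and --benchmark_out_format are standard googlebench flags.

```python
# Rough sketch of the driver-script approach: one benchmark process per
# tensor, then concatenate the per-tensor CSVs into a single results file.
import glob
import os
import subprocess

BENCH_BIN = "./bin/taco-bench"      # hypothetical benchmark binary
TENSOR_DIR = "data/suitesparse"     # hypothetical dataset location
OUT_DIR = "results"
AGG_CSV = os.path.join(OUT_DIR, "ufunc-suitesparse.csv")

os.makedirs(OUT_DIR, exist_ok=True)
per_tensor_csvs = []
for tensor in sorted(glob.glob(os.path.join(TENSOR_DIR, "*.mtx"))):
    out = os.path.join(OUT_DIR, os.path.basename(tensor) + ".csv")
    # One process per tensor: an OOM or crash only loses this tensor's result.
    subprocess.run(
        [BENCH_BIN, "--tensor", tensor,   # "--tensor" is a made-up flag
         "--benchmark_out=" + out,
         "--benchmark_out_format=csv"],
        check=False,
    )
    if os.path.exists(out):
        per_tensor_csvs.append(out)

# Concatenate the per-tensor CSVs, keeping the header from the first file
# (this assumes each file starts with a single header line).
with open(AGG_CSV, "w") as agg:
    for i, path in enumerate(per_tensor_csvs):
        with open(path) as f:
            lines = f.readlines()
        agg.writelines(lines if i == 0 else lines[1:])
```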
I'm running into OOMs when attempting to run benchmarks on some of the larger FROSTT tensors (flickr and larger), as well as when running on all of the ~3000 SuiteSparse matrices in a single run. I believe that these tensors should all fit into the 128 GB of memory available on lanka, so I think that something is leaking. @stephenchouca and @weiya711 do you mind looking at the ufunc benchmarks and their setup to see if anything stands out as a potential culprit?