tensor-compiler / array-programming-benchmarks

MIT License

potential leaks in benchmarking code #20

Closed rohany closed 3 years ago

rohany commented 3 years ago

I'm running into OOMs when attempting to run benchmarks on some of the larger FROSTT tensors (flickr and larger), as well as when running on all of the ~3,000 SuiteSparse matrices in a single run. I believe these tensors should all fit into the 128 GB of memory available on lanka, so I suspect something is leaking. @stephenchouca and @weiya711, do you mind looking at the ufunc benchmarks and their setup to see if anything stands out as a potential culprit?

rohany commented 3 years ago

Some of the errors I see look like this:

terminate called after throwing an instance of 'taco::TacoException'
  what():  Error at /data/scratch/rohany/array-programming-benchmarks/taco/taco/src/codegen/module.cpp:147 in compile:
 Compilation command failed:
cc -O3 -ffast-math -std=c99 -shared -fPIC /tmp/taco_tmp_JLmuiF/5nagsmwha9c9.c  -o /tmp/taco_tmp_JLmuiF/5nagsmwha9c9.so -lm 2> /tmp/taco_tmp_JLmuiF/5nagsmwha9c9.outlog
returned -1

The compilation command fails without producing any output. I adjusted the command to redirect stderr to a file to read later, but that file is also empty.

stephenchouca commented 3 years ago

I'm not sure about the issue with the larger FROSTT tensors, but the issue involving SuiteSparse matrices could be related to the code caching mechanism. In particular, I've noticed that the temporary fix that checks whether custom operators are isomorphic (line 535 in index_notation.cpp on the array_algebra branch) seems to always return false, so the code caching mechanism is essentially disabled. Additionally, the code caching mechanism only kicks in if the input sizes are the same, so even when benchmarking the same kernel it likely won't apply across different SuiteSparse matrices. TACO might be running out of memory storing loaded code as a result.
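To illustrate the point about input sizes with a minimal sketch (not TACO's actual caching code): if the cache key includes the operands' dimensions, benchmarking the same kernel across thousands of differently shaped SuiteSparse matrices produces a distinct entry per matrix, so entries are never reused and the cache only grows.

```python
# Toy model of a code cache keyed on both kernel structure and input
# shape. This is an illustration, not TACO's implementation.
code_cache = {}

def get_kernel(expr_signature, input_shape):
    key = (expr_signature, input_shape)
    if key not in code_cache:
        # stand-in for compiling and dlopen-ing a new shared library
        code_cache[key] = f"compiled[{expr_signature}@{input_shape}]"
    return code_cache[key]

# Same kernel, three different matrix shapes -> three cache entries,
# none of which are ever hit again.
for shape in [(10, 10), (20, 30), (500, 7)]:
    get_kernel("spmv", shape)
print(len(code_cache))  # 3
```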

rohany commented 3 years ago

> Additionally, the code caching mechanism only works if the input sizes are the same, so even if you're benchmarking the same kernel it likely wouldn't kick in if you're running with different SuiteSparse matrices. TACO might be running out of memory storing loaded code as a result.

I see. I'll check whether a run completes if we don't store anything in the cache.

rohany commented 3 years ago

That didn't seem to help. I think we're going to have to either stop using googlebench for this use case, or set up the benchmarks in an outer driver script so that each tensor in the dataset runs in its own process, and then aggregate the results into CSVs separately.

stephenchouca commented 3 years ago

Having a driver script run each tensor individually is probably the safer option, since it could be difficult to ensure that the benchmarks themselves are free of memory leaks.

rohany commented 3 years ago

Used a driver script to do this.
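For reference, the driver approach can be sketched as follows. This is a hedged outline, not the script actually used in the repo; the benchmark binary's flags are up to the caller. Running one benchmark process per tensor lets the OS reclaim all memory between inputs, sidestepping any leak inside the benchmark itself.

```python
import csv
import subprocess

def run_all(bench_cmd, tensor_paths, merged_csv):
    """Run one benchmark process per tensor, then merge per-tensor CSVs.

    bench_cmd(path, out_csv) -> argv list for one benchmark run; the
    binary name and flags it builds are hypothetical stand-ins here.
    """
    rows = []
    for path in tensor_paths:
        out_csv = path + ".csv"
        # Each tensor gets a fresh process, so leaked memory dies with it.
        subprocess.run(bench_cmd(path, out_csv), check=True)
        with open(out_csv) as f:
            rows.extend(csv.DictReader(f))
    if rows:
        with open(merged_csv, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)
```

A caller might invoke it as, say, `run_all(lambda p, o: ["./bench", "--tensor", p, "--out", o], glob.glob("data/suitesparse/*.mtx"), "results.csv")`, where `./bench` and its flags are placeholders for whatever the real benchmark binary accepts.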