Open jeffhammond opened 7 years ago
I use Clang 4.0.1 from Homebrew:
$ /usr/local/Cellar/llvm/4.0.1/bin/clang++ --version
clang version 4.0.1 (tags/RELEASE_401/final)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /usr/local/Cellar/llvm/4.0.1/bin
Extended Description
Clang reports that it does not vectorize star[45] because the trip count cannot be determined, but the trip count is determined in exactly the same way in star[123], which are vectorized.
./stencil_tbb.hpp:63:7: remark: loop not vectorized: could not determine number of loop iterations [-Rpass-analysis]
      for (auto j=r.cols().begin(); j!=r.cols().end(); ++j ) {
      ^
As best I can tell, the issue is that the loop bodies for star[45] are too large. Functions with 12 or fewer terms are vectorized whereas those with 16 or more are not.
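For anyone who wants to poke at this without TBB, here is a rough sketch of the loop shape involved. It is not the code from stencil_tbb.hpp: Range1D, Range2D and this star1 are hypothetical stand-ins I made up to mimic the r.rows()/r.cols() begin()/end() pattern, and the coefficients are arbitrary. It assumes four neighbour terms per unit of stencil radius, which is consistent with the 12-versus-16 term counts above but is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a 2D blocked range (NOT the TBB types).
struct Range1D {
    std::size_t lo, hi;
    std::size_t begin() const { return lo; }
    std::size_t end() const { return hi; }
};

struct Range2D {
    Range1D r, c;
    const Range1D& rows() const { return r; }
    const Range1D& cols() const { return c; }
};

// Radius-1 star stencil on an n x n row-major grid: 4 neighbour terms.
// The caller must keep the range at least 1 cell away from the grid edge.
void star1(std::size_t n, const Range2D& r,
           const std::vector<double>& in, std::vector<double>& out) {
    assert(in.size() == n * n && out.size() == n * n);
    for (auto i = r.rows().begin(); i != r.rows().end(); ++i) {
        // Same loop form as the one named in the remark: the bounds are
        // re-read through accessor calls and compared with !=.
        for (auto j = r.cols().begin(); j != r.cols().end(); ++j) {
            out[i * n + j] += -0.5 * in[i * n + j - 1]
                            +  0.5 * in[i * n + j + 1]
                            + -0.5 * in[(i - 1) * n + j]
                            +  0.5 * in[(i + 1) * n + j];
        }
    }
}
```

Compiling a file like this with -O3 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize should show whether the inner loop is vectorized. Adding more terms to the body (radius 4 or 5) is the variation to try, though I have not confirmed that this stand-in reproduces the behavior described above.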