Open rohany opened 3 years ago
For comparisons against other Python libraries, I think it'd make sense to include the assembly time when benchmarking so that the comparison is more apples-to-apples, particularly since assembly can take a significant amount of time when the output is sparse. (Note, though, that assembly time can be significantly reduced if it is done together with the compute step in a fused kernel. With the C++ API at least, you should be able to tell TACO to generate code that simultaneously assembles and computes the output by invoking TensorBase::setAssembleWhileCompute(true) before invoking compile.)
If setAssembleWhileCompute(true) is used, then does assemble even need to be called before compute?
You shouldn't have to call assemble in that case.
Should the call to assemble() be included in benchmark timing, @stephenchouca?
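
For reference, here is a minimal sketch of what timing assembly together with compute could look like in a Python benchmark harness. It is only an illustration: `time_kernel`, `fake_assemble`, and `fake_compute` are hypothetical names invented for this example and are not part of TACO's API. The two stand-in functions would need to be replaced with whatever calls the benchmark actually makes.

```python
import time


def time_kernel(compute, assemble=None, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs.

    If `assemble` is given, it is called inside the timed region right
    before `compute`, so the result covers assembly plus compute.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        if assemble is not None:
            assemble()  # output assembly is counted in the measurement
        compute()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical stand-ins for the real assemble and compute steps.
def fake_assemble():
    time.sleep(0.02)


def fake_compute():
    time.sleep(0.001)


# Compute-only timing versus timing that also covers assembly.
compute_only = time_kernel(fake_compute)
end_to_end = time_kernel(fake_compute, assemble=fake_assemble)
print(f"compute only:       {compute_only:.6f} s")
print(f"assemble + compute: {end_to_end:.6f} s")
```

If the kernel is compiled to assemble while computing, as described above, the `assemble` argument would simply be left out and the single `compute` call would already cover both steps.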