Improve performance of result array allocation and return to R

Branch RfCWT/profile includes code to profile the full C++ implementation. More than half of the execution time is spent initializing the result array and then reformatting it.

Following the pattern in pycwt, initializing a single array (in a performant manner) that can then be modified in place and returned without changes would drastically improve performance.

Excerpts from wrapper.c:

// Allocation takes ~24% of CPU cycles
std::vector<complex<float>> tfm(n*fn);

// Calculation itself only uses 28% of CPU cycles
fcwt.cwt(&x[0], n, &tfm[0], &scs);

// Re-formatting accounts for 47% of cycles!
ComplexVector scalogram = wrap(tfm);
scalogram.attr("dim") = Dimension(n, fn);

google-pprof --pdf /home/matthew14786/R/x86_64-pc-linux-gnu-library/4.2/RfCWT/libs/RfCWT.so /tmp/profile.out > out.pdf

Full profiling results: out.pdf

Allocation

Questions like this one ("C++ vs python numpy complex arrays performance") indicate that a single compile-time flag, -ffast-math, might be sufficient to attain comparable performance. A quick test didn't yield any improvement, but it's not clear to me that the flag actually took effect.
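One way to check whether the flag reached the compiler: GCC and Clang define the `__FAST_MATH__` macro when `-ffast-math` is in effect, so a small helper compiled alongside the wrapper can report it. This is a minimal sketch; `fast_math_status` is a hypothetical name and not part of RfCWT.

```cpp
#include <string>

// Hypothetical helper (not part of RfCWT): reports whether this
// translation unit was compiled with -ffast-math.
// GCC and Clang define __FAST_MATH__ when the flag is active.
std::string fast_math_status() {
#ifdef __FAST_MATH__
    return "on";
#else
    return "off";
#endif
}
```

If this reports "off", the flag probably never made it into the build. For an R package, compiler flags are usually set through `PKG_CXXFLAGS` in `src/Makevars`, so that would be the first place to look.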

Result Coercion

On the coercion side, the outlook is less promising, since R does not natively support a complex float type (Rcpp Source). The float package for R indicates that such a type could be implemented, but all methods to operate on such a matrix would need to be newly implemented as well.
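Since the values have to be widened to double at some point, one thing to try is doing that widening in a single pass straight into the final destination buffer. The sketch below is plain C++ rather than Rcpp: `ComplexDouble` is a stand-in for R's complex element (a pair of doubles), and `widen_to_double` is a hypothetical helper. Whether this beats `wrap(tfm)` would have to be measured.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Stand-in for R's complex element (two doubles: real, imaginary).
struct ComplexDouble {
    double r;
    double i;
};

// Widen n complex<float> values into a caller-provided double buffer
// in a single pass, without building an intermediate container.
void widen_to_double(const std::complex<float>* src, ComplexDouble* dst,
                     std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        dst[k].r = static_cast<double>(src[k].real());
        dst[k].i = static_cast<double>(src[k].imag());
    }
}
```

In the package, `dst` would be the memory of the complex vector that is returned to R, so no further copy would be needed after the loop.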

One option could be to create the result object, keep it alive behind a pointer, and then execute some of the most common operations directly on the complex float matrix in memory.
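A rough sketch of what such an object might look like, in plain C++. `FloatScalogram` and its methods are hypothetical names, and the layout assumes the n samples of each scale are stored contiguously (matching an n-by-fn column-major matrix). Only derived quantities, here the modulus of one scale, are converted to double.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical container (not part of RfCWT): keeps the transform in
// single precision and exposes only derived quantities as double.
class FloatScalogram {
public:
    FloatScalogram(std::size_t n, std::size_t fn)
        : n_(n), fn_(fn), data_(n * fn) {}

    // Raw pointer that a transform routine could write into in place.
    std::complex<float>* data() { return data_.data(); }
    std::size_t n() const { return n_; }
    std::size_t fn() const { return fn_; }

    // Modulus of every coefficient at scale index `scale`, assuming
    // the n samples of each scale are stored contiguously.
    std::vector<double> modulus(std::size_t scale) const {
        std::vector<double> out(n_);
        const std::complex<float>* col = data_.data() + scale * n_;
        for (std::size_t k = 0; k < n_; ++k) {
            out[k] = std::abs(col[k]);
        }
        return out;
    }

private:
    std::size_t n_;
    std::size_t fn_;
    std::vector<std::complex<float>> data_;
};
```

On the R side, an object like this would presumably be held through an external pointer (for example `Rcpp::XPtr`), with each operation exported as its own function. Note that the vector member still zero-fills on construction, so this addresses the coercion cost only, not the allocation cost.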
