tbenthompson / cutde

Python CPU and GPU accelerated TDEs, over 100 million TDEs per second!
MIT License
54 stars, 14 forks

Return ACA outputs in a packed format. #13

Open tbenthompson opened 3 years ago

tbenthompson commented 3 years ago

See here: https://tbenthompson.com/book/tdes/hmatrix.html#faster-approximate-blocks-on-gpus-with-cutde

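import numpy as np
# approx_blocks_gpu is the list of (U, V) pairs built in the linked section.
approx_max_rank = np.max([U.shape[1] for U, _ in approx_blocks_gpu])
# Flatten each block into a single 1D array, V first and then U.
UVflattened = [np.concatenate((V.flatten(), U.flatten())) for U, V in approx_blocks_gpu]
# approx_block_starts[i] is the offset of block i; the last entry is the total size.
approx_block_starts = np.empty(len(UVflattened) + 1, dtype=np.int64)
approx_block_starts[0] = 0
approx_block_starts[1:] = np.cumsum([arr.size for arr in UVflattened])
# One contiguous array holding every block back to back.
approx_packed_blocks = np.concatenate(UVflattened)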
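
For anyone consuming the packed format, here is a minimal round-trip sketch of how a single block could be read back out. It is not part of cutde: `pack_blocks` and `unpack_block` are hypothetical helper names, and the sketch assumes each `U` has shape `(n_rows, rank)` and each `V` has shape `(rank, n_cols)`, with the dense block shape known to the caller. The packing order (V first, then U) matches the snippet above.

```python
import numpy as np


def pack_blocks(blocks):
    """Pack (U, V) pairs as in the snippet above: V first, then U (hypothetical helper)."""
    flat = [np.concatenate((V.flatten(), U.flatten())) for U, V in blocks]
    starts = np.zeros(len(flat) + 1, dtype=np.int64)
    starts[1:] = np.cumsum([arr.size for arr in flat])
    return np.concatenate(flat), starts


def unpack_block(packed, starts, i, n_rows, n_cols):
    """Recover (U, V) for block i, given the dense block shape (hypothetical helper).

    Assumes U is (n_rows, rank) and V is (rank, n_cols), so the packed
    chunk holds rank * (n_rows + n_cols) entries.
    """
    chunk = packed[starts[i]:starts[i + 1]]
    rank = chunk.size // (n_rows + n_cols)
    V = chunk[: rank * n_cols].reshape(rank, n_cols)
    U = chunk[rank * n_cols:].reshape(n_rows, rank)
    return U, V


# Two made-up low-rank blocks with different shapes and ranks.
rng = np.random.default_rng(0)
blocks = [
    (rng.random((5, 2)), rng.random((2, 7))),
    (rng.random((4, 3)), rng.random((3, 6))),
]
shapes = [(5, 7), (4, 6)]

packed, starts = pack_blocks(blocks)
for i, ((U, V), (m, n)) in enumerate(zip(blocks, shapes)):
    U2, V2 = unpack_block(packed, starts, i, m, n)
    assert np.array_equal(U, U2) and np.array_equal(V, V2)
```

Because the rank is inferred from the chunk size, the block shapes have to be stored alongside `approx_block_starts`. Storing the per-block ranks instead would work equally well.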