Closed MrRobot2211 closed 3 years ago
@leofang I do not know if setting the block dimensions like this is best practice; we should probably inspect the available number of threads on the device or something similar. But I think it should not break for matrices of a reasonable size.
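As a rough illustration of the concern above, the sketch below derives launch dimensions from a thread limit instead of hard-coding them. `launch_dims` and its parameters are hypothetical names, not part of this PR. The default of 1024 threads per block is an assumption that matches many NVIDIA GPUs; on a real device the limit should be queried (in CuPy, I believe `cupy.cuda.Device().attributes['MaxThreadsPerBlock']` exposes it).

```python
def launch_dims(n_rows, n_cols, max_threads_per_block=1024):
    """Return (grid, block) 2-D launch dimensions covering an n_rows x n_cols matrix.

    Hypothetical helper: max_threads_per_block is an assumed device limit
    and should be read from the device in real code.
    """
    # Grow a square tile (power-of-two side) while it still fits in one block.
    side = 1
    while (2 * side) ** 2 <= max_threads_per_block:
        side *= 2
    block = (side, side)

    # Ceiling division so the grid covers every row and column.
    grid = (-(-n_rows // side), -(-n_cols // side))
    return grid, block
```

For a 100 x 100 matrix with the default limit this gives 32 x 32 blocks on a 4 x 4 grid. Threads that fall outside the matrix still have to be masked by a bounds check inside the kernel.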
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Totals | |
---|---|
Change from base Build 1078504429: | 0.8% |
Covered Lines: | 197 |
Relevant Lines: | 234 |
Also, the `norm` and `conj` operations might have further optimization opportunities.
I am not sure if we have to give `tol` strict semantics. If we relax it a bit, we don't have to call `norm` (which involves a few arithmetic operations); rather, we can just compare the real parts and imaginary parts separately based on `tol`. Though my feeling is that it's a micro-optimization, and the memory access pattern matters more here (along the column, the reads from global memory are not contiguous).
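To make the two readings of `tol` concrete, here is a minimal pure-Python sketch (not the kernel from this PR). `is_herm_strict` and `is_herm_relaxed` are hypothetical names, and the `<=` comparison is an assumption about how the tolerance is applied. The strict version bounds the modulus of `a[i][j] - conj(a[j][i])`; the relaxed version bounds the real and imaginary parts separately, so it accepts differences of up to `sqrt(2) * tol` in modulus.

```python
def _is_square(matrix):
    """True if every row has as many entries as there are rows."""
    return all(len(row) == len(matrix) for row in matrix)


def _pairs(matrix):
    """Yield (a_ij, a_ji) over the upper triangle, diagonal included."""
    n = len(matrix)
    for i in range(n):
        for j in range(i, n):
            yield matrix[i][j], matrix[j][i]


def is_herm_strict(matrix, tol):
    """Strict reading: |a_ij - conj(a_ji)| <= tol for every element."""
    if not _is_square(matrix):
        return False
    return all(abs(a - complex(b).conjugate()) <= tol for a, b in _pairs(matrix))


def is_herm_relaxed(matrix, tol):
    """Relaxed reading: real and imaginary parts compared separately."""
    if not _is_square(matrix):
        return False
    for a, b in _pairs(matrix):
        diff = a - complex(b).conjugate()
        if abs(diff.real) > tol or abs(diff.imag) > tol:
            return False
    return True


tol = 1e-6
exact = [[1, 2 + 1j], [2 - 1j, 3]]
# Off-diagonal entry perturbed by 0.8 * tol in both real and imaginary parts.
nearly = [[1, 2 + 1j + 0.8e-6 * (1 + 1j)], [2 - 1j, 3]]
```

Both functions accept `exact`. For `nearly`, the difference has modulus of about `1.13 * tol`, so `is_herm_strict` rejects it while `is_herm_relaxed` accepts it, which is the behavioural change the relaxation would introduce.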
I think that, to be in keeping with what is done in QuTiP, we have to maintain the per-element checking.
Gotta run, ping me for any questions Felipe!
This looks far better
@hodgestar I added a simple, pure CuPy implementation with no custom kernel here,
This defines, dispatches and tests cupydense's is_herm.
This is implemented through a CUDA kernel. :)