isherm_dense - Githubissues

qutip / qutip-cupy

CuPy linear-algebra backend for QuTiP

BSD 3-Clause "New" or "Revised" License

isherm_dense #32

Closed MrRobot2211 closed 3 years ago

MrRobot2211 commented 3 years ago

This defines, dispatches and tests cupydense's is_herm.

This is implemented through a CUDA kernel. :)

MrRobot2211 commented 3 years ago

@leofang I do not know if setting the block dimensions like this is best practice; we should probably inspect the number of threads available on the device, or something similar. But I think it should not break for matrices of a reasonable size.
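
As a rough sketch of the idea of sizing the launch from the device limit instead of hard-coding it, the grid and block dimensions could be derived as follows. The helper name `launch_dims` and the default limit of 1024 threads per block are assumptions made for illustration, not code from this PR; on a real device the limit would be queried from the runtime rather than assumed.

```python
import math


def launch_dims(n_rows, n_cols, max_threads_per_block=1024):
    """Return (grid, block) tuples that cover an n_rows x n_cols matrix.

    `max_threads_per_block` is an assumed device limit; in practice it
    should be read from the device properties instead of hard-coded.
    """
    # Largest square block whose thread count stays within the limit.
    side = math.isqrt(max_threads_per_block)
    block = (side, side)
    # Ceiling division so partially filled blocks still cover the edges.
    # The x dimension is mapped to columns and y to rows (a convention
    # chosen here, not taken from the PR).
    grid = ((n_cols + side - 1) // side, (n_rows + side - 1) // side)
    return grid, block


print(launch_dims(100, 100))       # ((4, 4), (32, 32))
print(launch_dims(1000, 10, 256))  # ((1, 63), (16, 16))
```

Because the grid may overshoot the matrix, a kernel launched with these dimensions still needs a bounds check on the row and column index.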

coveralls commented 3 years ago

Pull Request Test Coverage Report for Build 1099080532

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

10 of 10 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.8%) to 84.188%

Totals
Change from base Build 1078504429:	0.8%
Covered Lines:	197
Relevant Lines:	234

💛 - Coveralls

leofang commented 3 years ago

Also, the norm and conj operation might have further optimization opportunities.

I am not sure if we have to give tol a strict semantics. If we relax it a bit, we don't have to call norm (which invokes a bit of arithmetic operations); rather, we can just compare the real parts and imaginary parts separately based on tol. Though my feeling is it's micro-optimization, and the memory access pattern matters more here (along the column the read from global memory is not contiguous).

MrRobot2211 commented 3 years ago

> Also, the norm and conj operation might have further optimization opportunities.
>
> I am not sure if we have to give tol a strict semantics. If we relax it a bit, we don't have to call norm (which invokes a bit of arithmetic operations); rather, we can just compare the real parts and imaginary parts separately based on tol. Though my feeling is it's micro-optimization, and the memory access pattern matters more here (along the column the read from global memory is not contiguous).

I think that, to be in keeping with what is done in QuTiP, we have to maintain the per-element checking.
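
To make the two tolerance semantics in this exchange concrete, here is a small pure-Python sketch on nested lists. The function names `isherm_modulus` and `isherm_parts` are hypothetical, and the modulus-based test is only an assumption about what the strict check amounts to; neither function is the kernel from this PR.

```python
def isherm_modulus(a, tol):
    """Strict check: |a[i][j] - conj(a[j][i])| <= tol for every element."""
    n = len(a)
    return all(
        abs(a[i][j] - a[j][i].conjugate()) <= tol
        for i in range(n)
        for j in range(i, n)  # the upper triangle covers every pair once
    )


def isherm_parts(a, tol):
    """Relaxed check: real and imaginary differences each within tol."""
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            d = a[i][j] - a[j][i].conjugate()
            if abs(d.real) > tol or abs(d.imag) > tol:
                return False
    return True


# The off-diagonal difference is 0.75 + 0.75j: each part is within
# tol = 1.0, but the modulus (about 1.06) is not.
m = [[2 + 0j, 1.75 + 1.75j],
     [1 - 1j, 3 + 0j]]
print(isherm_modulus(m, 1.0))  # False
print(isherm_parts(m, 1.0))    # True
```

The relaxed version accepts differences up to a factor of sqrt(2) larger in modulus than the strict one, which is why the two can disagree near the tolerance.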

leofang commented 3 years ago

Gotta run, ping me for any questions Felipe!

MrRobot2211 commented 3 years ago

[Figures: CPU timings for isherm; GPU timings for isherm]

This looks far better

MrRobot2211 commented 3 years ago

@hodgestar I added a simple pure-CuPy implementation (no custom kernel) here.

[Figures: GPU timings for isherm; CPU timings for isherm]