Parallelization of ``CliffordOperations`` through `numba` and `cupy`

BrunoLiegiBastonLiegi commented 12 months ago

This PR implements the CliffordOperations object introduced in qiboteam/qibo#1076 to be used specifically with the NumbaBackend and CupyBackend. Namely, each operation is rewritten as a loop over the row indices that is parallelized using, prange and compiled with @njit(parallel=True, cache=True) for numba, and @jit.rawkernel() with cupy.

Companion of https://github.com/qiboteam/qibo/pull/1150

codecov[bot] commented 10 months ago

Codecov Report

Attention: 281 lines in your changes are missing coverage. Please review.

Comparison is base (c621acf) 100.00% compared to head (7d08f2f) 79.92%. Report is 6 commits behind head on main.

:exclamation: Current head 7d08f2f differs from pull request most recent head 6971051. Consider uploading reports for the commit 6971051 to get more accurate results

Files	Patch %	Lines
src/qibojit/backends/clifford_operations_cpu.py	19.12%	148 Missing :warning:
src/qibojit/backends/clifford_operations_gpu.py	0.00%	133 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #161 +/- ## ============================================ - Coverage 100.00% 79.92% -20.08% ============================================ Files 12 14 +2 Lines 1073 1400 +327 ============================================ + Hits 1073 1119 +46 - Misses 0 281 +281 ``` | [Flag](https://app.codecov.io/gh/qiboteam/qibojit/pull/161/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/qiboteam/qibojit/pull/161/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam) | `79.92% <14.06%> (-20.08%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

renatomello commented 10 months ago

To be honest, I'd feel more comfortable if @stavros11 and @alecandido reviewed this PR, because these custom optimizations are definitely above my programming skills

alecandido commented 10 months ago

Btw, most likely it's a quite extreme optimization, and we should benchmark on a very tiny test case that there is an advantage at all, but here the dtype is consuming more memory than required, since each bool in NumPy is represented as an individual byte: https://github.com/qiboteam/qibo/blob/cf6246dd00380de80ee4cfcbe7152942ddab25a8/src/qibo/backends/clifford.py#L81-L83

However, you could use individual bits and bit-wise operations. NumPy also have helper functions for this: https://numpy.org/doc/stable/reference/generated/numpy.packbits.html https://numpy.org/doc/stable/reference/generated/numpy.unpackbits.html

But it could be arbitrarily painful, and you would get a fixed factor improvement in memory and possibly performances (like using single-precision floats, instead of double-precision).

(Of course, there are also the rest of binary operations, and in Numba and CuPy as well) https://numpy.org/doc/stable/reference/routines.bitwise.html https://numba.readthedocs.io/en/stable/developer/autogen_lower_listing.html#ufunc-bitwise-and https://docs.cupy.dev/en/stable/reference/binary.html

codecov-commenter commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (ce537c8) to head (95353c3).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #161 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 12 12 Lines 1085 1085 ========================================= Hits 1085 1085 ``` | [Flag](https://app.codecov.io/gh/qiboteam/qibojit/pull/161/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/qiboteam/qibojit/pull/161/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam) | `100.00% <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=qiboteam#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

BrunoLiegiBastonLiegi commented 10 months ago

Btw, most likely it's a quite extreme optimization, and we should benchmark on a very tiny test case that there is an advantage at all, but here the dtype is consuming more memory than required, since each bool in NumPy is represented as an individual byte: https://github.com/qiboteam/qibo/blob/cf6246dd00380de80ee4cfcbe7152942ddab25a8/src/qibo/backends/clifford.py#L81-L83

However, you could use individual bits and bit-wise operations. NumPy also have helper functions for this: https://numpy.org/doc/stable/reference/generated/numpy.packbits.html https://numpy.org/doc/stable/reference/generated/numpy.unpackbits.html

But it could be arbitrarily painful, and you would get a fixed factor improvement in memory and possibly performances (like using single-precision floats, instead of double-precision).

(Of course, there are also the rest of binary operations, and in Numba and CuPy as well) https://numpy.org/doc/stable/reference/routines.bitwise.html https://numba.readthedocs.io/en/stable/developer/autogen_lower_listing.html#ufunc-bitwise-and https://docs.cupy.dev/en/stable/reference/binary.html

Yeah, I looked into packbits at some point but I wasn't sure how to use to reconcile that with all the slicing operations I do. I could try to test it on a simple operation though.

renatomello commented 10 months ago

Yeah, I looked into packbits at some point but I wasn't sure how to use to reconcile that with all the slicing operations I do. I could try to test it on a simple operation though.

This seems like a separate PR though

alecandido commented 9 months ago

Should we silence Codecov in Qibojit?

alecandido commented 9 months ago

Should we silence Codecov in Qibojit?

I had a second thought, since tests are present in Qibojit (though in an unusual location wrt the other repos), and there is a related workflow. https://github.com/qiboteam/qibojit/tree/main/src/qibojit/tests https://github.com/qiboteam/qibojit/actions/workflows/rules.yml

But I get that you wrote the related tests already in Qibo. Eventually, we will disentangle the dependencies, but this is part of the qibo-core effort (and I'd like to merge these PRs before, if possible immediately).

Before silencing Codecov, or skip tests for this PR, it could be valuable to collect the opinions of the original Qibojit authors. @scarrazza @stavros11 @andrea-pasquale

andrea-pasquale commented 9 months ago

I know that in the past there were plans to test directly on gpu through a selfhosted workflow. I believe that it should be less complicated compare to qibolab tests. If we can do so, we should be able to recover coverage, right? If it is too complicated or if we feel like that it is a bit of an overkill I am also happy to drop coverage and eventually skip tests.

BrunoLiegiBastonLiegi commented 9 months ago

@renatomello @alecandido @scarrazza is this ready to be merged? I would merge this one first and then the corresponding one in qibo.

BrunoLiegiBastonLiegi commented 9 months ago

Is the 0% coverage because it needs the PR in qibo too?

It's because the changes here are tested in qibo, as the cupy and numba clifford operations are defined here but used inside of the CliffordBackend.

renatomello commented 9 months ago

Is the 0% coverage because it needs the PR in qibo too?

It's because the changes here are tested in qibo, as the cupy and numba clifford operations are defined here but used inside of the CliffordBackend.

So then maybe they shouldn't count to coverage here (maybe # pragma: no cover everywhere or some other way to exclude entire files?)

alecandido commented 9 months ago

To be completely honest: things implemented here should be tested here (as for every other project).

Having features spread over many repos is an issue on its own. But we are starting facing that issue with https://github.com/qiboteam/qibo/issues/1117 and https://github.com/qiboteam/qibo-core/, so I will not insist in this PR.

BrunoLiegiBastonLiegi commented 9 months ago

To be completely honest: things implemented here should be tested here (as for every other project).

Having features spread over many repos is an issue on its own. But we are starting facing that issue with qiboteam/qibo#1117 and https://github.com/qiboteam/qibo-core/, so I will not insist in this PR.

I agree, but this would involve copy pasting the tests that are currently implemented in qibo here. If you think it's better I can do that, it was just to avoid code duplication.

alecandido commented 9 months ago

I agree, but this would involve copy pasting the tests that are currently implemented in qibo here. If you think it's better I can do that, it was just to avoid code duplication.

I believe the best would be to move all this code to Qibo, possibly to the quantum_info module, where Clifford is defined in the first place. But let's merge this PR, and consider this an optimization.

qiboteam / qibojit

Parallelization of ``CliffordOperations`` through `numba` and `cupy` #161

Codecov Report

Codecov Report