gmarkall opened 2 years ago
@pentschev and I have built infrastructure similar to what is being asked for here:
These were designed to run nightly and post results to a public GH issue. We liked this model because it's public and relatively low-noise, with high impact for catching regressions with a once-a-day viewing.
A useful benchmark for kernel launch time is here: https://github.com/numba/numba/issues/3003#issuecomment-627872661
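As a rough illustration of how such a launch-time measurement can be structured (this is a sketch, not the code from the linked comment; `measure_launch_overhead` is an illustrative name, and the Numba usage shown in comments assumes a CUDA-capable GPU):

```python
import timeit

def measure_launch_overhead(launch, number=1000, repeat=5):
    """Return the best-case per-call time (seconds) of `launch`.

    Taking the minimum over several repeats reduces scheduler and
    timer noise, which matters when each launch takes microseconds.
    """
    times = timeit.repeat(launch, number=number, repeat=repeat)
    return min(times) / number

# Hypothetical usage against Numba's CUDA target (requires a GPU):
#
#   from numba import cuda
#
#   @cuda.jit
#   def empty_kernel():
#       pass
#
#   empty_kernel[1, 1]()   # warm up: triggers compilation once
#   cuda.synchronize()
#   t = measure_launch_overhead(lambda: empty_kernel[1, 1]())
#   print(f"launch overhead: {t * 1e6:.1f} us")
```

Warming up before timing matters here: without it, the first launch would fold JIT compilation cost into the launch-overhead figure.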
To add to @quasiben's comment: the thing that can't be done is running before merging, as it would need access to the repo, which we don't do today for UCX-Py. For that, maybe we could check whether we have the resources in gpuCI, similar to what has been done for Dask. What do you think, @quasiben?
The benchmark in the following comment could probably be used with tweaking for general measurement, and comparison with CuPy's JIT: https://github.com/numba/numba/issues/4647#issuecomment-537328981
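For a comparison with CuPy's JIT, the relevant split is compile cost (paid on the first call of a JIT-compiled function) versus steady-state dispatch cost. A minimal sketch of that split, under the assumption that each framework's kernel invocation is wrapped in a plain callable (`first_vs_steady` and the names in the comments are illustrative; the GPU comparison itself requires a CUDA device):

```python
import time

def first_vs_steady(fn, args=(), steady_iters=100):
    """Time the first call (which includes any JIT compilation)
    separately from the average of subsequent calls."""
    t0 = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(steady_iters):
        fn(*args)
    steady = (time.perf_counter() - t0) / steady_iters
    return first, steady

# Hypothetical comparison (requires a CUDA GPU; kernel wrappers elided):
#
#   numba_first, numba_steady = first_vs_steady(run_numba_kernel)
#   cupy_first, cupy_steady = first_vs_steady(run_cupy_kernel)
```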
> the thing that can't be done is running before merging as it would need access to the repo
Why wouldn't the repo be accessible? I'm guessing I'm missing some understanding here?
Sorry, I didn't mean it can't be done, but rather that you would need specific permissions from the GH API/GH Actions to query each new open PR/run tests on it, like gpucibot has for all RAPIDS projects. The infrastructure mentioned in https://github.com/numba/numba/issues/7612#issuecomment-984669572 has no special rights to any repos, so it won't do any of those things today.
Ah, I see - many thanks for the clarification!
There is presently no benchmark suite for Numba's CUDA target, and there is a gap between Numba's performance and the maximum achievable performance. To support performance optimization efforts, a benchmark suite is needed that: