jules-vanaret closed this pull request 4 weeks ago.
I have realized that `broadcast_to` was also used as a check that the input `symbol` parameter had the right shape if it was given as a list or an array. I have added it back (not only for unique symbols).
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.44%. Comparing base (1dcbade) to head (762426e). Report is 2 commits behind head on main.
This looks to be addressing: https://github.com/napari/napari/issues/6275 Have you checked with vispy main https://github.com/vispy/vispy/pull/2533 ?
Thanks for the fantastic PR description @jules-vanaret, and for your attention on the points layer 👀
> This looks to be addressing: https://github.com/napari/napari/issues/6275
@psobolewskiPhD yep, I'd say you're right. Based on the profiling and discussion in that thread, it looks like both `vispy.set_data` and our own `coerce_symbols` could be improved. Unless vispy/#2533 removes the need for us to `coerce_symbols`, it sounds like this PR would be an improvement regardless of the vispy PR?
Thank you both for taking a look! @psobolewskiPhD all the tests were made on a fresh environment, with Vispy 0.14.1, which I believe is the most recent released version. Do you think a new version of Vispy in the near future will make this PR obsolete?
@brisvag Do you know what is required to release vispy 0.14.2?
Nice work, I was never happy about this code ^^' @Czaki, releasing a patch version for vispy can be done whenever we like, I think.
So maybe you could make a release, and then we could check how this PR works with the new vispy?
Vispy 0.14.2 released :)
FYI I ran the script from the OP (n = 10 million) with the newest vispy and psygnal and I get:
Creating layer with 10000000 vertices: 11.0905 seconds
Creating layer with 10000000 vertices and as many symbols: 12.3066 seconds
Creating layer with 10000000 vertices and a Symbol instance: 7.7045 seconds
Virtually the same as prior to updating them. However, with this PR:
Creating layer with 10000000 vertices: 0.9560 seconds
Creating layer with 10000000 vertices and as many symbols: 1.3793 seconds
Creating layer with 10000000 vertices and a Symbol instance: 0.8674 seconds
Adding a layer using `viewer.add_points(data)` (with `data = rng.random(size=(10_000_000, 2)) * 1000`) takes <10 s, which is >10 s faster than the same on main.
Whoops, you beat me to it! :) I report the same orders of magnitude on my end with the newest Vispy.
@Czaki should I run additional benchmarks or add comments to the code?
I will try to review tomorrow. Please resolve the conflicts
@Czaki any further comments on this one?
Non-Qt benchmarks failed because one benchmark in the ImageSuite was slower (I don't know how this PR affects images)... For the Qt benchmarks, I don't understand the error. Is it due to the benchmarks taking too long?
I've re-run the non-Qt ones. Looked like a glitch in there, where suddenly a few jumped to >1s.
It looks strange. I have merged main into this PR. I have also run benchmarks in another PR to check how it looks.
Hm. The benchmark timeout problem seems to have disappeared. Running the benchmarks locally does not show a slowdown.
Let's wait for this benchmark run, but I hope I can add the `ready to merge` label in a few hours.
The Qt benchmark run requested by PR #6727 (e37f69c7bdacc85cdf683bd6978990c2e0e6e465 vs 2a2205218ac1795c6a59b06c63df6afac016bdf2) has finished with status 'failure'. See the CI logs and artifacts for further details. The non-Qt benchmark run requested by PR #6727 (e37f69c7bdacc85cdf683bd6978990c2e0e6e465 vs 2a2205218ac1795c6a59b06c63df6afac016bdf2) has finished with status 'failure'. See the CI logs and artifacts for further details.
The Qt benchmark run requested by PR #6727 (f41f4862080d7a63c5b62537225b914cf8169ec6 vs 2a2205218ac1795c6a59b06c63df6afac016bdf2) has finished with status 'failure'. See the CI logs and artifacts for further details. The non-Qt benchmark run requested by PR #6727 (f41f4862080d7a63c5b62537225b914cf8169ec6 vs 2a2205218ac1795c6a59b06c63df6afac016bdf2) has finished with status 'failure'. See the CI logs and artifacts for further details.
It looks like a global translation dict solves the performance problems:
| Change | Before [2a220521] | After [f41f4862] | Ratio | Benchmark (Parameter) |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------|
| - | 16.2±0.2ms | 9.19±0.1ms | 0.57 | benchmark_points_layer.Points3DSuite.time_create_layer(4096) |
| - | 15.5±0.3ms | 8.24±0.04ms | 0.53 | benchmark_points_layer.Points2DSuite.time_create_layer(4096) |
| - | 40.8±0.9ms | 12.5±0.06ms | 0.31 | benchmark_points_layer.Points2DSuite.time_create_layer(16384) |
| - | 40.8±0.3ms | 12.5±0.1ms | 0.31 | benchmark_points_layer.Points3DSuite.time_create_layer(16384) |
| - | 141±2ms | 30.0±0.6ms | 0.21 | benchmark_points_layer.Points2DSuite.time_create_layer(65536) |
| - | 138±0.9ms | 27.4±0.6ms | 0.2 | benchmark_points_layer.Points3DSuite.time_create_layer(65536) |
| - | 1.66±0.01s | 62.4±1ms | 0.04 | benchmark_python_layer.CoerceSymbolsSuite.time_coerce_symbols1 |
| - | 1.66±0.01s | 63.3±0.5ms | 0.04 | benchmark_python_layer.CoerceSymbolsSuite.time_coerce_symbols2 |
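The "global translation dict" change can be sketched as follows; the `Symbol` members and alias spellings below are stand-ins for illustration, not napari's actual definitions (the real enum lives in `napari/layers/points/_points_constants.py`):

```python
from enum import Enum

class Symbol(Enum):
    # stand-in for napari's Symbol enum (which has 14 members)
    DISC = 'disc'
    CROSS = 'cross'

# Built once at import time, so symbol coercion becomes a plain dict
# lookup instead of constructing/validating an enum member per call.
SYMBOL_TRANSLATION = {s.value: s for s in Symbol}
SYMBOL_TRANSLATION.update({'o': Symbol.DISC, '+': Symbol.CROSS})  # hypothetical aliases

def translate_symbol(raw):
    """Map one raw user-provided spelling to its Symbol via an O(1) lookup."""
    return SYMBOL_TRANSLATION[str(raw).lower()]
```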
@jules-vanaret are you ok with these changes?
For typing, I prefer to wait for #6866, as it will bump mypy and the current output is a false negative.
Sorry for the delay. Thank you for your work @Czaki, precomputing the dict makes much more sense! Some tests are still failing though.
Yes, but I prefer to merge #6866 first, as it will impact the failing tests.
The Qt benchmark run requested by PR #6727 (0b86e447346d74eaf37e1fbb2cb1fedab5a049ba vs 1dcbadea20149c43d49764afd3fe8b39b76998a9) has finished with status 'success'. See the CI logs and artifacts for further details. The non-Qt benchmark run requested by PR #6727 (0b86e447346d74eaf37e1fbb2cb1fedab5a049ba vs 1dcbadea20149c43d49764afd3fe8b39b76998a9) has finished with status 'success'. See the CI logs and artifacts for further details.
@jules-vanaret Could you check the current state? I have cleaned up the code a little and tweaked the documentation. For me, it is close to merging.
@psobolewskiPhD could you also check?
Everything still looks good to me. It's even faster:
Creating layer with 1000000 vertices: 0.1736 seconds
Creating layer with 1000000 vertices and as many symbols: 0.3118 seconds
Creating layer with 1000000 vertices and a Symbol instance: 0.0992 seconds
I just rechecked the code and some cases are indeed faster (the cases that are not are only marginally slower). From my point of view, everything seems OK!
Thanks @jni for merging the PR! :) And thank you @brisvag for the Vispy push, @psobolewskiPhD for the thorough benchmarking, and particularly @Czaki for all your work on optimization, clean up, and general coordination of the PR.
References and relevant issues
I work with large points datasets (~10⁶ vertices), and creating/setting the data of a Points layer is rather slow, on the order of 2 seconds on my machine. I have dug into the code and found that the culprit is the function that handles the symbols associated to points in `napari/layers/points/_points_utils.py`, which takes around 80% of the full computation time.

The purpose of `coerce_symbols` is to transform "raw" points symbols (e.g. strings like 'o') into proper `Symbol` instances.

Note that even if `symbol` is not specified when creating the Points layer (in which case it is just equal to 'o'), or is specified as a single symbol for all points, the `symbol` setter in `napari/layers/points/points.py` immediately transforms this unique symbol into a potentially huge numpy array of size `N_vertices`.

The main reason `coerce_symbols` is slow for large `N_vertices` is that it loops twice through this huge numpy array of (potentially simply duplicated) symbols: once with `(str(x).lower() for x in array)` and more or less once with `np.vectorize(Symbol, otypes=[object])(array)`. I believe this can be improved with minor changes.

Description
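To recap the bottleneck being addressed: the pre-PR `coerce_symbols` traverses the full broadcasted array twice. A simplified, self-contained sketch of that pattern (stand-in `Symbol` enum, not the exact napari code):

```python
from enum import Enum
import numpy as np

class Symbol(Enum):
    DISC = 'o'   # stand-in member; napari defines 14 symbols

def coerce_symbols_slow(array):
    """Two O(N_vertices) passes over the (mostly duplicated) symbol array."""
    # pass 1: lowercase every element
    lowered = np.array([str(x).lower() for x in array], dtype=object)
    # pass 2: construct a Symbol for every element, even exact duplicates
    return np.vectorize(Symbol, otypes=[object])(lowered)
```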
The PR stems from two ideas to reduce computation time:

1. If the user gives a single symbol (a string or a `Symbol` instance), it is much faster to first convert it to a proper `Symbol` instance and THEN to broadcast it to an array of size `N_vertices`.
2. Even if the user gives `N_vertices` symbols for the points (composed of strings or `Symbol` instances), there are not that many valid symbols (currently 14 different ones) that they can choose from. This means that instead of converting each string symbol into a `Symbol` instance individually, it is faster to first find all unique raw symbols in the `symbol` provided by the user, convert those to `Symbol`, and then simply map each raw symbol to its corresponding `Symbol` instance.

The new `coerce_symbols` function I propose looks like this: instead of operating on an already-broadcasted array of symbols, it can take as input either a single symbol or a list of symbols.
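The proposed function's code block was not preserved in this export; the following is a self-contained sketch of the unique-then-map idea described above (the `Symbol` members, `ALIASES` table, and helper names are hypothetical stand-ins):

```python
from enum import Enum
import numpy as np

class Symbol(Enum):
    # stand-in for napari's Symbol enum
    DISC = 'disc'
    CROSS = 'cross'

# hypothetical raw-spelling -> Symbol table
ALIASES = {'o': Symbol.DISC, 'disc': Symbol.DISC,
           '+': Symbol.CROSS, 'cross': Symbol.CROSS}

def _to_symbol(raw):
    return raw if isinstance(raw, Symbol) else ALIASES[str(raw).lower()]

def coerce_symbols(symbol):
    """Accept a single symbol or a sequence; convert each UNIQUE raw
    value once, then map every element through the result."""
    if isinstance(symbol, (str, Symbol)):
        # single symbol: convert once; broadcasting to N_vertices is
        # done afterwards by the caller (the `symbol` setter)
        return np.array([_to_symbol(symbol)], dtype=object)
    raw = list(symbol)
    mapping = {r: _to_symbol(r) for r in set(raw)}  # at most ~14 conversions
    return np.array([mapping[r] for r in raw], dtype=object)
```

The caller can then broadcast the returned array to `N_vertices` with `np.broadcast_to`, which also validates the shape.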
The `symbol` setter in `napari/layers/points/points.py` is modified to look like this: it now tries to broadcast symbols to the number of vertices AFTER converting them to proper `Symbol` instances.

Finally, `coerce_symbols` is also used in the `current_symbol` setter, but this new implementation does not interfere with its current behavior.

Performance changes
Qualitative performance changes can be assessed using the following script, which simulates a random points dataset with `N_vertices` vertices. For each value of `N_vertices`, I time how long it takes to define the Points layer:

- N_vertices = 10_000
- N_vertices = 1_000_000
- N_vertices = 10_000_000
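The original timing script was not preserved in this export; below is a hypothetical reconstruction of its shape. The napari calls are kept in a comment so the helper itself stays dependency-free; `time_layer_creation` is an illustrative name, not napari API:

```python
import time
import numpy as np

def time_layer_creation(make_layer, n_vertices, seed=0):
    """Time one layer-creation call on a random (n_vertices, 2) dataset."""
    rng = np.random.default_rng(seed)
    data = rng.random(size=(n_vertices, 2)) * 1000
    start = time.perf_counter()
    make_layer(data)
    return time.perf_counter() - start

if __name__ == '__main__':
    # With napari installed, the benchmark in this PR amounts to e.g.:
    #   from napari.layers import Points
    #   for n in (10_000, 1_000_000, 10_000_000):
    #       print(f'{n} vertices: {time_layer_creation(Points, n):.4f} s')
    pass
```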
Note that the proposed implementation has a much nicer scaling with the number of vertices.
Benchmarks
From what I have been able to see, the current benchmarking suite already measures the time it takes to set the data of a Points layer, so there is no need to test for anything else.
Final Checklist