Closed: lordmauve closed this issue 3 years ago
I may soon have an entrant for consideration here! ;-)
I did some benchmarks:
*** vec ***
Addition: 11.27us per op (20000 samples)
In-place addition: 11.56us per op (20000 samples)
Dot: 4.47us per op (50000 samples)
Normalized: 7.26us per op (50000 samples)

*** cythonised vec ***
Addition: 8.03us per op (50000 samples)
In-place addition: 9.36us per op (50000 samples)
Dot: 2.99us per op (100000 samples)
Normalized: 5.14us per op (50000 samples)

*** pygame Vector2 ***
Addition: 0.10us per op (2000000 samples)
In-place addition: 0.05us per op (5000000 samples)
Dot: 0.13us per op (2000000 samples)
Normalized: 0.14us per op (2000000 samples)

*** numpy ***
Addition: 0.58us per op (500000 samples)
In-place addition: 1.01us per op (200000 samples)
Dot: 1.26us per op (200000 samples)
Normalized: 4.85us per op (50000 samples)

*** tuples ***
Addition: 0.12us per op (2000000 samples)
In-place addition: 0.11us per op (2000000 samples)
Dot: 0.11us per op (2000000 samples)
Normalized: 0.23us per op (1000000 samples)
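For context, benchmark.py itself isn't shown in this thread. A microbenchmark that auto-scales the sample count until the timing is stable might look roughly like the following sketch (the `bench` helper and its defaults are assumptions, not the actual script):

```python
import timeit

def bench(label, stmt, setup="pass", target_time=0.2):
    """Time `stmt` with timeit, scaling the sample count up by 10x
    until the total run takes at least `target_time` seconds, then
    report microseconds per op in the format seen above."""
    n = 1
    while True:
        total = timeit.timeit(stmt, setup=setup, number=n)
        if total >= target_time:
            break
        n *= 10
    per_op_us = total / n * 1e6
    print(f"{label}: {per_op_us:.2f}us per op ({n} samples)")
    return per_op_us

# Tuple addition as a stand-in workload:
bench("Addition", "(a[0] + b[0], a[1] + b[1])",
      setup="a = (1.0, 2.0); b = (3.0, 4.0)")
```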
vec's implementation is completely unoptimized. It's a bit constrained by its API, which is designed first and foremost with the convenience of the user in mind, and with an eye towards eventually being rewritten as a C extension (without the API needing to change). I'm not surprised that pygame's vector class--which is a C extension--is enormously faster.
Did you try your benchmark with -O ? There are some asserts that would drop out, that aren't intended for production.
I wonder if there's any low-hanging fruit that would make vec, say, 2x faster.
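To illustrate what `-O` changes: Python compiles `assert` statements out entirely when run with `-O`, so argument checks like the sketch below (a hypothetical stand-in, not vec's actual code) cost nothing in production but add overhead to every call otherwise.

```python
class vec2:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        # These checks run on every construction without -O,
        # but are removed from the bytecode under `python -O`.
        assert isinstance(x, (int, float)), "x must be a number"
        assert isinstance(y, (int, float)), "y must be a number"
        self.x = x
        self.y = y

    def __add__(self, other):
        return vec2(self.x + other.x, self.y + other.y)
```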
Aha, with -O:
$ python -O benchmark.py
*** vec ***
Addition: 3.07us per op (100000 samples)
In-place addition: 2.95us per op (100000 samples)
Dot: 0.62us per op (500000 samples)
Normalized: 2.55us per op (100000 samples)
*** cythonised vec ***
Addition: 1.77us per op (200000 samples)
In-place addition: 1.80us per op (200000 samples)
Dot: 0.34us per op (1000000 samples)
Normalized: 1.68us per op (200000 samples)
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
*** pygame Vector2 ***
Addition: 0.10us per op (2000000 samples)
In-place addition: 0.05us per op (5000000 samples)
Dot: 0.13us per op (2000000 samples)
Normalized: 0.14us per op (2000000 samples)
*** numpy ***
Addition: 0.56us per op (500000 samples)
In-place addition: 1.01us per op (200000 samples)
Dot: 1.34us per op (200000 samples)
Normalized: 4.86us per op (50000 samples)
*** tuples ***
Addition: 0.11us per op (2000000 samples)
In-place addition: 0.10us per op (2000000 samples)
Dot: 0.11us per op (2000000 samples)
Normalized: 0.23us per op (1000000 samples)
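For reference, the tuple baseline presumably implements these operations with plain arithmetic on 2-tuples, something along these lines (an assumption, since benchmark.py isn't shown):

```python
from math import sqrt

def add(a, b):
    # Addition: build a new tuple; no method dispatch beyond indexing.
    return (a[0] + b[0], a[1] + b[1])

def dot(a, b):
    # Dot product of two 2-tuples.
    return a[0] * b[0] + a[1] * b[1]

def normalized(a):
    # Scale to unit length.
    length = sqrt(a[0] * a[0] + a[1] * a[1])
    return (a[0] / length, a[1] / length)
```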
I assume creating new vec objects is the slow part of vec. If you're interested in an experiment, try temporarily changing the last line of vec's `__add__` to `return (self.x + other.x, self.y + other.y)` and see what that does to the numbers.
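The suggested experiment, sketched against a minimal stand-in for the vec class (the real class isn't reproduced here):

```python
class vec2:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Original behaviour would be:
        #     return vec2(self.x + other.x, self.y + other.y)
        # Returning a plain tuple instead isolates the cost of
        # constructing a new vec object on every addition.
        return (self.x + other.x, self.y + other.y)
```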
It becomes:
Addition: 0.61us per op (500000 samples)
A stab at an implementation in Rust: https://github.com/lordmauve/wvec
*** wvec ***
Addition: 0.22us per op (1000000 samples)
In-place addition: 0.21us per op (1000000 samples)
Dot: 0.27us per op (1000000 samples)
Normalized: 0.23us per op (1000000 samples)
Here's a port of wasabi.geom to Cython.
Cython seems more promising than Rust for this. Firstly, it's a little faster without much effort:
*** cyvec ***
Addition: 0.17us per op (2000000 samples)
In-place addition: 0.16us per op (2000000 samples)
Dot: 0.08us per op (5000000 samples)
Normalized: 0.13us per op (2000000 samples)
Secondly, it's actually easier. Even though you need to know Cythonisms, and some Cython lore about what is fast, it's much easier to see exactly how it maps onto the C API and therefore what Python features you're paying for. PyO3 is pretty magical by comparison.
I don't read Rust, so the only thing I have to say is: if you copy the broad strokes of "vec", I'd be very pleased.
I'm settling this in favour of wasabigeom.vec2, which now has best-in-class performance, radians, and immutability.
*** wasabigeom.vec2 ***
Addition: 0.05us per op (5000000 samples)
In-place addition: 0.04us per op (5000000 samples)
Dot: 0.09us per op (5000000 samples)
Normalized: 0.08us per op (5000000 samples)
(As I mentioned offline to @larryhastings, the laziness in his vec class is only worthwhile if it's a speedup, which is true for pure Python but probably not true for wasabigeom.vec2 now.)
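A sketch of the kind of laziness being discussed, assuming it means caching a derived value such as the length on first access (an illustration, not the actual vec implementation):

```python
from math import sqrt

class vec2:
    __slots__ = ('x', 'y', '_length')

    def __init__(self, x, y):
        self.x, self.y = x, y
        self._length = None  # not computed until first requested

    @property
    def length(self):
        # Compute the sqrt once, then reuse the cached value. In
        # pure Python this can pay off; in a C extension the extra
        # attribute machinery may cost more than just recomputing.
        if self._length is None:
            self._length = sqrt(self.x * self.x + self.y * self.y)
        return self._length
```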
Although it doesn't use it internally, Wasabi2d re-exports Pygame's `pygame.math.Vector2` class. `Vector2` is easier to use than a `numpy.ndarray`: it has methods for things like magnitude and normalisation. One place where it fits poorly with Wasabi2d is that it uses degrees in various methods, while Wasabi2d takes the view that radians are the one true angle unit. There are radians alternatives for some methods, but these have longer names, and `as_polar()` and `from_polar()` do not have radians versions.

I also feel strongly that vectors should be immutable and hashable (e.g. to use as keys in a spatial hash). `pygame.math.Vector2` is not.
If we have a great candidate for a vector class, it should be exposed as the return value of `Transformable.pos`, for example; these properties currently return a mutable `numpy.ndarray`.