lordmauve / wasabi2d

Cutting-edge 2D game framework for Python
https://wasabi2d.readthedocs.io/
GNU Lesser General Public License v3.0

Choose definitive vector type #17

Closed · lordmauve closed this issue 3 years ago

lordmauve commented 4 years ago

Although it doesn't use it internally, Wasabi2D re-exports Pygame's pygame.math.Vector2 class.

Vector2 is easier to use than a numpy.ndarray: it has methods for things like magnitude and normalisation. One place where it fits poorly with Wasabi2D is that it uses degrees in various methods, whereas Wasabi2D takes the view that radians are the one true angle unit. Some methods have radians alternatives, but those have longer names, and as_polar() and from_polar() have no radians versions at all.
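To make the mismatch concrete (these are pygame's real methods; radians-based code ends up converting at every call site):

import math
from pygame.math import Vector2

v = Vector2(1, 0)
w = v.rotate(90)           # rotate() takes degrees
r, phi = v.as_polar()      # phi comes back in degrees
u = Vector2()
u.from_polar((1.0, 45.0))  # from_polar() expects degrees too

# What a radians-everywhere caller has to write:
angle = math.pi / 4
w2 = v.rotate(math.degrees(angle))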

I also feel strongly that vectors should be immutable and hashable (e.g. so they can be used as keys in a spatial hash). pygame.math.Vector2 is neither.
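A quick demonstration of both problems (Vector2 defines equality but no hash, so using it as a dict key raises TypeError):

from pygame.math import Vector2

v = Vector2(3, 4)
v.x = 99                    # mutable: attributes can be reassigned freely

try:
    spatial_hash = {v: []}  # a spatial hash needs hashable keys
except TypeError as exc:
    print(exc)              # e.g. "unhashable type: 'pygame.math.Vector2'"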

If we find a great candidate vector type, it should be exposed as the return value of properties like Transformable.pos, which currently return a mutable numpy.ndarray.
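The hazard with the current behaviour, in a hypothetical sketch (the real Transformable is more involved):

import numpy as np

class Transformable:
    def __init__(self):
        self._pos = np.zeros(2)

    @property
    def pos(self):
        return self._pos    # hands out a reference to internal state

t = Transformable()
p = t.pos
p += (1.0, 0.0)             # in-place update aliases t._pos
print(t._pos)               # [1. 0.] -- the object moved, and no setter ran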

larryhastings commented 4 years ago

I may soon have an entrant for consideration here! ;-)

lordmauve commented 4 years ago

I did some benchmarks:

vec

Addition: 11.27us per op (20000 samples)
In-place addition: 11.56us per op (20000 samples)
Dot: 4.47us per op (50000 samples)
Normalized: 7.26us per op (50000 samples)

cythonised vec

Addition: 8.03us per op (50000 samples)
In-place addition: 9.36us per op (50000 samples)
Dot: 2.99us per op (100000 samples)
Normalized: 5.14us per op (50000 samples)

pygame.math.Vector2

Addition: 0.10us per op (2000000 samples)
In-place addition: 0.05us per op (5000000 samples)
Dot: 0.13us per op (2000000 samples)
Normalized: 0.14us per op (2000000 samples)

numpy operations

Addition: 0.58us per op (500000 samples)
In-place addition: 1.01us per op (200000 samples)
Dot: 1.26us per op (200000 samples)
Normalized: 4.85us per op (50000 samples)

tuples

Addition: 0.12us per op (2000000 samples)
In-place addition: 0.11us per op (2000000 samples)
Dot: 0.11us per op (2000000 samples)
Normalized: 0.23us per op (1000000 samples)

larryhastings commented 4 years ago

vec's implementation is completely unoptimized. It's somewhat constrained by its API, which is designed first and foremost with the user's convenience in mind, and with an eye towards eventually being rewritten as a C extension (without the API needing to change). I'm not surprised that pygame's vector class, which is a C extension, is enormously faster.

Did you try your benchmark with -O? There are some asserts that aren't intended for production, and they would drop out.
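For reference, -O flips __debug__ to False and strips assert statements entirely; a minimal illustration (the function here is hypothetical, not vec's):

def dot(a, b):
    assert len(a) == 2 and len(b) == 2, "2D vectors required"
    return a[0] * b[0] + a[1] * b[1]

print(__debug__)  # True under "python", False under "python -O"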

I wonder if there's any low-hanging fruit that would make vec, say, 2x faster.

lordmauve commented 4 years ago

Aha, with -O:

$ python -O benchmark.py                             
*** vec ***
Addition: 3.07us per op (100000 samples)
In-place addition: 2.95us per op (100000 samples)
Dot: 0.62us per op (500000 samples)
Normalized: 2.55us per op (100000 samples)

*** cythonised vec ***
Addition: 1.77us per op (200000 samples)
In-place addition: 1.80us per op (200000 samples)
Dot: 0.34us per op (1000000 samples)
Normalized: 1.68us per op (200000 samples)

pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
*** pygame Vector2 ***
Addition: 0.10us per op (2000000 samples)
In-place addition: 0.05us per op (5000000 samples)
Dot: 0.13us per op (2000000 samples)
Normalized: 0.14us per op (2000000 samples)

*** numpy ***
Addition: 0.56us per op (500000 samples)
In-place addition: 1.01us per op (200000 samples)
Dot: 1.34us per op (200000 samples)
Normalized: 4.86us per op (50000 samples)

*** tuples ***
Addition: 0.11us per op (2000000 samples)
In-place addition: 0.10us per op (2000000 samples)
Dot: 0.11us per op (2000000 samples)
Normalized: 0.23us per op (1000000 samples)

lordmauve commented 4 years ago

The code: https://gist.github.com/lordmauve/344797625700aafa15dd5c039e68d7f3

larryhastings commented 4 years ago

I assume creating new vec objects is the slow part of vec. If you're interested in an experiment, try temporarily changing the last line of vec's __add__ to return (self.x + other.x, self.y + other.y) and see what that does to the numbers.
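A sketch of the suggested experiment (vec's real class has far more machinery than this):

class vec:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # was: return vec(self.x + other.x, self.y + other.y)
        return (self.x + other.x, self.y + other.y)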

lordmauve commented 4 years ago

It becomes:

Addition: 0.61us per op (500000 samples)

lordmauve commented 4 years ago

A stab at an implementation in Rust: https://github.com/lordmauve/wvec

*** wvec ***
Addition: 0.22us per op (1000000 samples)
In-place addition: 0.21us per op (1000000 samples)
Dot: 0.27us per op (1000000 samples)
Normalized: 0.23us per op (1000000 samples)

lordmauve commented 3 years ago

Here's a port of wasabi.geom to Cython.

Cython seems more promising than Rust for this. First, it's a little faster without much effort:

*** cyvec ***
Addition: 0.17us per op (2000000 samples)
In-place addition: 0.16us per op (2000000 samples)
Dot: 0.08us per op (5000000 samples)
Normalized: 0.13us per op (2000000 samples)

Second, it's actually easier: even though you need to know Cythonisms, and some Cython lore about what is fast, it's much easier to see exactly how the code maps onto the C API, and therefore which Python features you're paying for. PyO3 is pretty magical by comparison.

larryhastings commented 3 years ago

I don't read Rust, so the only thing I have to say is: if you copy the broad strokes of "vec", I'd be very pleased.

lordmauve commented 3 years ago

I'm settling this in favour of wasabigeom.vec2, which now has best-in-class performance, radians, and immutability.

*** wasabigeom.vec2 ***
Addition: 0.05us per op (5000000 samples)
In-place addition: 0.04us per op (5000000 samples)
Dot: 0.09us per op (5000000 samples)
Normalized: 0.08us per op (5000000 samples)
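For completeness, hypothetical usage, with method names assumed from the benchmark labels rather than taken from wasabigeom's documentation:

from wasabigeom import vec2

a = vec2(3.0, 4.0)
b = vec2(1.0, 0.0)

c = a + b                # immutable: returns a new vec2, a is unchanged
d = a.dot(b)             # assumed name, per the "Dot" benchmark
n = a.normalized()       # assumed name, per the "Normalized" benchmark

cells = {a: "occupied"}  # hashable, so usable as a spatial-hash key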

(As I mentioned offline to @larryhastings, the laziness in his vec class is only worthwhile if it's a speedup, which is true for pure Python but probably not for wasabigeom.vec2 now.)
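For context, "laziness" here means deriving and caching values on first access, roughly like this pure-Python sketch (not vec's actual code):

class LazyVec:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._length = None

    @property
    def length(self):
        # Computed once, then cached. A win in pure Python, where the
        # arithmetic is expensive; with a fast Cython core, the caching
        # bookkeeping can cost more than simply recomputing.
        if self._length is None:
            self._length = (self.x * self.x + self.y * self.y) ** 0.5
        return self._length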