pygame-community / pygame-ce

🐍🎮 pygame - Community Edition is a FOSS Python library for multimedia applications (like games). Built on top of the excellent SDL library.
https://pyga.me
853 stars 140 forks source link

Change/Expand `Rect.collideobjectsall` for optimization #2787

Open itzpr3d4t0r opened 6 months ago

itzpr3d4t0r commented 6 months ago

The current Rect.collideobjectsall() method is slow, like really slow, especially when compared with all the other multicollision methods: image

After analysis, it became evident that the bulk of the runtime is attributed to parsing and invoking the key argument, which only accepts a callable. Strikingly, this is also the slowest scenario:

from pygame import Rect

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(0, 0, 100, 100)
objs = [
   Obj(
            randint(-100, -100),
            randint(-100, 100),
            randint(-100, 100),
            randint(-100, 100),
        )
        for _ in range(100)
]

colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

Surprisingly, invoking a lambda function to access an object's attribute in C is markedly slower than passing the attribute's name as a string directly and retrieving the attribute.

Hence, I propose either of the following solutions:

In practice, this would allow the following alternatives:

colliding_objs = r.collideobjectsall(objs, key='xa')
# OR
colliding_objs = r.collideobjectsall(objs, from_attr='xa')

Either of these would work and make it possible to massively optimize the algorithm. From my testing this change alone brings a 190% improvement on average: image

PS. this is also true for .collideobjects() since it shares most of this function's behaviour.

ankith26 commented 6 months ago

Behaviour Change: Expand the key keyword argument to accept strings as well, enabling direct attribute retrieval in such instances.

I would be okay with this, as it wouldn't break existing code. Callables should continue to be handled the way they are, we are just adding an extra string handle path.

Starbuck5 commented 6 months ago

I'm skeptical of adding API surface like this for optimization. It's more branches, more code to test, more to maintain, and it provides no benefit to existing users. Especially since this is a niche function.

Plus, I've found a far easier solution--

Rewrite it in Python.

from pygame import Rect
import random
import time

random.seed(36)

use_python = False

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(-20, -20, 100, 100)
objs = [
   Obj(
            random.randint(-100, -100),
            random.randint(-100, 100),
            random.randint(-100, 100),
            random.randint(-100, 100),
        )
        for _ in range(5000)
]

start = time.time()

if use_python:
    for _ in range(1000):
        colliding_indices = r.collidelistall([obj.xa for obj in objs])
        colliding_objs = [objs[colliding_index] for colliding_index in colliding_indices]

else:
    for _ in range(1000):
        colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)

use_python = True does the same thing as collideobjectsall but is more than twice as fast. 0.14 seconds versus 0.34 seconds, Python 3.12.

itzpr3d4t0r commented 6 months ago

Some thoughts:

itzpr3d4t0r commented 6 months ago

use_python = True does the same thing as collideobjectsall but is more than twice as fast. 0.14 seconds versus 0.34 seconds, Python 3.12.

Your program on my machine (Python 3.12):

So my proposed change is still 77.5% faster than the python alternative while not requiring more python.

Starbuck5 commented 6 months ago

You don’t get it? We should replace our implementation of this function with a Python one. There’s no reason our implementation has to be written in C. Might be a little tricky, but we’ve done a couple similar things before.

This improves our implementation’s speed for existing code by taking a complex C function and rewriting it in 2 lines of Python. This would be a huge win.

It’s faster on 3.9 too, but it’s more faster on 3.12. And maybe even better with future versions!

itzpr3d4t0r commented 6 months ago

Alright, Hopefully someone will do this eventually. Actually your idea of using collidelistall will get even faster with my PR: https://github.com/pygame-community/pygame-ce/pull/2786 so the future is bright for a python implementation. Now that we're on this, should i remove the optimization for collideobjects/collideobjectsall in this PR or no? I guess a ltl speedup with no drawbacks isn't really an issue but i'm not rly attacched to it or anything so it's whatever.

Starbuck5 commented 6 months ago

Hmm, my example was unrealistic because I didn't show how it could be made compatible with the existing arguments. It didn't use the lambda function.

This is more realistic, but doesn't show nearly as good of a speedup:

from pygame import Rect
import random
import time

random.seed(36)

use_python = True

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(-20, -20, 100, 100)
objs = [
   Obj(
            random.randint(-100, -100),
            random.randint(-100, 100),
            random.randint(-100, 100),
            random.randint(-100, 100),
        )
        for _ in range(5000)
]

start = time.time()

def collideobjectsall(objects, key):
    colliding_indices = r.collidelistall([key(obj) for obj in objects])
    return [objects[colliding_index] for colliding_index in colliding_indices]

if use_python:
    for _ in range(1000):
        collideobjectsall(objs, key=lambda e: e.xa)

else:
    for _ in range(1000):
        colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)

It is still faster to do it in Python in Python 3.12, but not in Python 3.9.

Python 3.9 "use_python"            0.36 s
Python 3.9 C implementation        0.31 s
Python 3.12 "use_python"           0.26 s
Python 3.12 C implementation       0.34 s

Also good point these will all be sped up with your optimization PR, even the Python version.

I'm bummed I didn't test it correctly and it's not as good as I thought, but I still think it's a good opportunity to pursue.

itzpr3d4t0r commented 6 months ago

I mean I guess we could drop the python 3.12 version down to about 0.15 with the optimization but if optimizing C still gives us 0.08s of runtime AND it's python version independent then i think it's better to bite the bullet and implement the C version. I still think you're making it look worse than it actually is on the C side but these are my 2 cents anyway.

Starbuck5 commented 1 month ago

I've managed to get a good improvement inside the current status quo here by using vectorcall: https://github.com/pygame-community/pygame-ce/pull/3048