pygame-community / pygame-ce

🐍🎮 pygame - Community Edition is a FOSS Python library for multimedia applications (like games). Built on top of the excellent SDL library.
https://pyga.me
910 stars 148 forks source link

New Surface.ublits() function for blitting uniform Surface sequences (3258) #1622

Closed GalacticEmperor1 closed 1 year ago

GalacticEmperor1 commented 1 year ago

Issue №3258 opened by itzpr3d4t0r at 2022-06-25 18:25:58

Follows # 3257 .

Rationale

The Surface.blits() function is exceptional at providing freedom to the user as to how a sequence of surfaces are blit, letting them decide on a per-surface basis whether to draw just a portion of them, using a different blend-type or both. It's a very good general function, but as being so, it also pays the cost of "checking what we have and choose what to do" on a per surface basis. We need something stricter, yet more performant.

Idea

What I propose is a narrower blits() function, that I'll call ublits(), where the "u" stands for uniform, that will draw every Surface in the sequence fully (meaning area=None) while using the same blend-mode for each one, therefore having a more delimited behaviour, but with more performance overall.

Implementation

I'll start from the general to then go more in depth:

From the user perspective

The final user would implement something that follows the "signature":

ublits(((source, dest), ...), blend_flags=None) -> [Rect, ...]
or
ublits([(source, dest), ...], blend_flags=None) -> [Rect, ...]

So we want:

Current blits() slowdowns

The current blits() implementation : https://github.com/pygame/pygame/blob/d31eae2c804592a1b1875122c139e209055fa90e/src_c/surface.c# L1902 has to do the following for each and every item from the sequence:

So to summarize there's a lot of getting, setting and checking for every surface in the sequence. All this with a specialized version of blits() would not be needed, not for each surface at least.

Potential ublits() implementation

The proposed implementation is basically a stripped down version of the current blits() function:

Right now

PyObject *argrect = NULL;  //line 1906
...
while((item = PyIter_Next(iterator))) {
    argrect = NULL;  // line 1953
    ...
    if (argrect && argrect != Py_None) {  // line 2004
            if (!(src_rect = pgRect_FromObject(argrect, &temp))) {
                bliterrornum = BLITS_ERR_INVALID_RECT_STYLE;
                goto bliterror;
            }
        }
    else {
        temp.x = temp.y = 0;
        temp.w = src->w;
        temp.h = src->h;
        src_rect = &temp;
     }
  ...
  Py_XDECREF(argrect);  // line 2049
  ...
  argrect = NULL;  //  line 2054
}

Becomes

PyObject *argrect = NULL;
...
while((item = PyIter_Next(iterator))) {
    ...
    temp.x = temp.y = 0;
    temp.w = src->w;
    temp.h = src->h;
    src_rect = &temp;
  ...
}

Right now

PyObject *special_flags = NULL;
...
while((item = PyIter_Next(iterator))) {
    ...
    if (special_flags) {  // line 2022
          if (!pg_IntFromObj(special_flags, &the_args)) {
              bliterrornum = BLITS_ERR_MUST_ASSIGN_NUMERIC;
              goto bliterror;
          }
      }
  ...
}

Becomes

PyObject *special_flags = blend_flags;  // blend flags are the blend_flags passed to the func.'s kwargs params. i don't know how to get it using python funcs but that's what i mean.
...
if (special_flags) {
          if (!pg_IntFromObj(special_flags, &the_args)) {
              bliterrornum = BLITS_ERR_MUST_ASSIGN_NUMERIC;
              goto bliterror;
          }
      }
while((item = PyIter_Next(iterator))) {
    ...
}

And i'm sure there are other things that can benefit some love but i still don't know enough python C functions to say...

A hard topic

I'll throw it out there, currently if we want to return a list of changed rects we can, but this is another thing we check for each surface and it's unnecessary imo. we could further speed up the code by removing these evaluations and moving them to a separate function for getting the rects. But i understand that that would mean too many functions to maintain and possibly update.


Comments

*MyreMylar commented at 2022-06-26 09:47:09*

I'm reading these issue posts, just to let you know.

I'm planning to do another round of SIMD performance optimisations on some of the more 'core' blitting code in a future version of pygame once the current changes adding AVX2 to the special blend modes have had a chance to bed in (i.e. they've been tested in general use for a good while).

I may look into blits/ublits then as well - but if you want to try a PR yourself (with some accompanying performance testing) then you are also welcome to do that. I tend to work fairly slowly these days as I have lots keeping me busy.


*itzpr3d4t0r commented at 2022-06-26 09:51:42*

@MyreMylar Yes sure, i already tried messing up with the code and ended up with a working function, but i still have to check whether the returned list of rects is correct.


*itzpr3d4t0r commented at 2022-06-26 09:53:25*

What i intended to have is a somewhat simpler to use function that is at least equal if not a little better than blits().


*Starbuck5 commented at 2022-06-26 21:34:14*

If you're trying to do a "restricted" blit function for higher performance, you could remove keyword arguments and use METH_FASTCALL as your type of python function. (Removing keywords because there isn't a good way to parse keywords with METH_KEYWORDS and FASTCALL together.

Currently you have ublits using METH_VARARGS | METH_KEYWORDS.

See: https://docs.python.org/3/c-api/structures.html# METH_FASTCALL


*itzpr3d4t0r commented at 2022-06-27 06:13:31*

I've tried implementing it but there must be something i'm missing here...

{"ublits", (PyCFunction)surf_ublits, METH_FASTCALL,
     DOC_SURFACEUBLITS},
...
static PyObject *
surf_ublits(pgSurfaceObject *self, PyObject *args)
{
...
if (!PyArg_ParseTuple(args, "OOi", &blitsequence, &special_flags,
                          &doreturn))
        return NULL;

What am i missing?


*Starbuck5 commented at 2022-06-27 06:22:43*

METH_FASTCALL doesn't give you a pyobject args pointer. It gives you a c array of pyobjects and a py_ssize_t number of arguments in said array.


*itzpr3d4t0r commented at 2022-06-27 06:37:54*

If you're trying to do a "restricted" blit function for higher performance, you could remove keyword arguments and use METH_FASTCALL as your type of python function. (Removing keywords because there isn't a good way to parse keywords with METH_KEYWORDS and FASTCALL together.

Currently you have ublits using METH_VARARGS | METH_KEYWORDS.

See: https://docs.python.org/3/c-api/structures.html# METH_FASTCALL

Nevermind i found the issue, but i get Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, pygame.base, pygame.constants, pygame.rect, pygame.rwobject, pygame.surflock, pygame.bufferproxy, pygame.math, pygame.surface, pygame.display, pygame.draw, pygame.event, pygame.imageext, pygame.image, pygame.joystick, pygame.key, pygame.mouse, pygame.time, pygame.mask, pygame.pixelcopy, pygame.transform, pygame.font, pygame.mixer_music, pygame.mixer, pygame.scrap, pygame.context, matplotlib._image, pygame._freetype (total: 44) this when i run ublits

Ok now i have something like this:

{"ublits", (_PyCFunctionFast)surf_ublits, METH_FASTCALL,
     DOC_SURFACEUBLITS},
...
static PyObject *
surf_ublits(pgSurfaceObject *self, PyObject *const *args, Py_ssize_t nargs)
{
...
blitsequence = &((*args)[0]);
special_flags = &((*args)[1]);
pg_IntFromObj((*args)[2], &doreturn);

and it's still not working, it says:

src_c/surface.c(2139): error C2440: 'function': impossibile to convert from 'PyObject' to 'PyObject *' src_c/surface.c(2139): warning C4024: 'function through pointer': different types between formal parameter 1 and the effective one


*Starbuck5 commented at 2022-06-27 06:50:23*

blitsequence = &((*args)[0]);
special_flags = &((*args)[1]);

->

blitsequence = args[0];
special_flags = args[1];

You're trying to get pyobject pointers out of an array of pyobject pointers, index the array.

Also, this would be a great thing to discuss on discord. Better back and forth than a github comment section.


*itzpr3d4t0r commented at 2022-06-27 07:04:59*

how didn't i think about it? well i resolved it but i can't join now, i'll be very busy for the next 4 hours approx. so if you're still on we can discuss this function overall.


*dr0id commented at 2022-06-27 19:17:03*

Hi, thanks for your efforts.

Did you profile the code? Are these really the slowdowns? `has to do the following for each and every item from the sequence:

assert(itemlength >= 2); , as the items from the sequence can't be "less than" a (Surface, dest) pair, but can be more (line 1959)
Check if (itemlength >= 3) if so -> argrect = PySequence_GetItem(item, 2); (line 1965)
Check if (itemlength >= 4) if so -> special_flags= PySequence_GetItem(item, 3); (line 1969)
Do validity checks on the extracted argrect, set it to NULL at the start of the while loop, manage it and throw errors otherwise. (line 2004)
Do validity checks for the special_flags variable (line 2022)

So to summarize there's a lot of getting, setting and checking for every surface in the sequence. All this with a specialized version of blits() would not be needed, not for each surface at least.`

I see that processing a list of variable length tuples needs lots of checking. Maybe this could be avoided with some default values? If a named tuple supports default values it could be used so the sequence should always be a 4 tuple (all parameters) only needing minimal or no checks at all. This brings me to another idea: have not checks in this method and a separate validation method perhaps? So in 'debug' mode or if something goes wrong the validation method could give more information what is wrong.

I looked it up: named tuples support default values. Thinking about it again maybe an interface would be nicer to extend with other properties/attributes. If there were an interface then the Sprite classes could implement them. Or other classes and it would be clear that this object can be blitted.

Just my thoughts.


*itzpr3d4t0r commented at 2022-06-27 20:03:05*

Hi and thank you for your suggestion. First and foremost, i want to be clear, this is not meant to be an inclusive function, but its purpose is specific and ideally it would be "Just draw the sequence as fast as possible, as simple as possible, without bothering with additional parameters and flexibility". For that i think we're already fine with blits().

Thinking about it again maybe an interface would be nicer to extend with other properties/attributes. If there were an interface then the Sprite classes could implement them

Here you are overthinking the purpose of this function, i intend it as more of a resource that you can use when you want some more efficiency and readability (because smaller tuples means less mess). I plan to remove doreturn for example, but that's not something i'll decide on my own.

The function isn't that complex to see where some gains could be made. What i can only think of doing with my very basic skills, is working to get some speed back while looping through the sequence. That you can do by lowering the number of checks and "GET_ITEM" calls for example, but also optimizing the type of sequence we use aswell. These aren't really the biggest slowdown(the biggest slowdown is of course the blit), infact on my currently uncommitted version of the function, we loop through an array with PySequence_Fast_ITEMS, and we get a 13-14% better perf on average, and it's about the most i can strip down from the function whithout making it unsafe. I didn't profile the code but this was the thought behind what i wrote. What you suggest you could submit to someone who knows a bit more about python C ABI, as i'm a newbie and i may be very wrong without even knowing.


*itzpr3d4t0r commented at 2022-06-27 20:12:47*

If a named tuple supports default values it could be used so the sequence should always be a 4 tuple (all parameters) only needing minimal or no checks at all

Do you mean that we should expect to get a sequence of named tuples of 4 items each? but wouldn't that defeat the purpose anyway? because you'd have to extract all four values from the tuple for each surface and would be the same as blits. Why extract the value None and pygame.BLEND_ADD from N tuples when you can set both once?


*itzpr3d4t0r commented at 2022-06-27 20:17:09*

Generally speaking the idea is getting the fastest looping we can possibly get, with fewer checks and getitem(s) as we possibly can, while waiting for a faster implementation of the blit function. This PR would be helpful to that extent because you'd already have a fast loop to begin with. If the loop is slow and the blit is very fast then it could become a bottleneck(exaggerating here).

itzpr3d4t0r commented 1 year ago

This is resolved, became the fblits function