libretro / gpsp

gpSP for libretro.
GNU General Public License v2.0

[PS Vita] Castlevania: Aria of Sorrow slowdown #236

Open Erken22 opened 9 months ago

Erken22 commented 9 months ago

Castlevania: Aria of Sorrow slows down, with audio lag, at the beginning of the game, when the character wakes up and a special blur effect appears on the screen. In earlier builds (I don't remember exactly which ones, roughly August to October) on the PS Vita 2000, I did not notice any slowdowns. I'm currently playing on the PS Vita 1000 (fat) model, and I notice slowdowns when the blurry-squares effect appears at the beginning.

andymcca commented 9 months ago

This will probably be the Mosaic effect, which was implemented within the last year. Not sure it should be having that much of a performance impact, though. Maybe it needs a lot of RAM, and cache flushing/swapping might be happening?

Apaczer commented 9 months ago

@andymcca I've also noticed a big hit with the mosaic effect, e.g. in Super Mario Advance after game creation (https://youtu.be/pbSWJCzmCNI?feature=shared&t=125) there is a drop of almost 15 fps. This is of course on a low-end Miyoo arm32 platform with 32MB of DDR1.

Any chance of making the new video driver or the mosaic effect optional? I believe most folks would vote for speed over accuracy when it comes to gpSP.

Erken22 commented 9 months ago

It is interesting that the mosaic effect is also implemented in VBA Next, but there is no slowdown there during mosaic effects.

andymcca commented 1 month ago

I have an idea for speeding up the current per-pixel implementation of the mosaic effect. Will try it out in the coming days and update here.

@Apaczer I've almost certainly used up all my free time bonus for a while just with this weekend's activity here!!! So it may be a while before I get to this. Having looked into it, though, I think the reason for the slowdowns could be the liberal use of the modulus operator (%) in the Mosaic code, in some cases at a pixel-by-pixel level. This will likely be slow on some platforms, such as older ARM, especially as we're not dividing by a constant (which the compiler would likely optimise into a shift or similar). So I'm looking at how to replace these operations, either with faster equivalents (e.g. bitwise ops) and/or by optimising the render functions in general (we can probably loop over repeated pixels and maybe even cache object/bg rows if appropriate). When I make the changes and do a PR, I'll hit you up so you can test on your ARMv5 device, if that's ok?
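
To illustrate the kind of substitution described above, here is a minimal sketch in C. It is not gpSP's actual renderer: the function names `mosaic_row_modulo` and `mosaic_row_counter`, the flat 16-bit row buffers, and the parameter layout are all hypothetical. The assumption is that a horizontal mosaic of size N (1 to 16 on the GBA) makes every pixel in a block of N take the colour of the block's first pixel. ARMv5 has no hardware integer divide instruction, so a `%` with a divisor only known at run time typically becomes a call into a software division routine for every pixel. The second function produces the same row using only a countdown counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference version (hypothetical): one modulus per pixel, with a
 * divisor that is only known at run time. */
void mosaic_row_modulo(const uint16_t *src, uint16_t *dst,
                       size_t width, unsigned size)
{
    assert(size >= 1);
    for (size_t x = 0; x < width; x++)
        dst[x] = src[x - (x % size)];   /* first pixel of this block */
}

/* Same output without any division: latch the first pixel of each
 * block and repeat it until the block's countdown reaches zero. */
void mosaic_row_counter(const uint16_t *src, uint16_t *dst,
                        size_t width, unsigned size)
{
    assert(size >= 1);
    uint16_t held = 0;        /* colour latched for the current block */
    unsigned remaining = 0;   /* pixels left in the current block */

    for (size_t x = 0; x < width; x++) {
        if (remaining == 0) { /* a new block starts at this pixel */
            held = src[x];
            remaining = size;
        }
        dst[x] = held;
        remaining--;
    }
}
```

Calling both functions on the same 240-pixel row with any size from 1 to 16 should give identical output, which makes the modulo version a convenient reference for testing. Whether this is the right fix for gpSP depends on where the `%` operations actually sit in its render functions, which this sketch does not attempt to reproduce.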