hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

Bottleneck on Adreno #2007

Closed: jumpertwo closed this 10 years ago

jumpertwo commented 11 years ago

Maybe you already know this, but testing Dragon Ball Shin Budokai 2 (just an example; the same behaviour happens in any game) on my HTC One SV (Krait 1.2 GHz, Adreno 305), it turns out I can run PPSSPP at about 38 fps on average.

I have seen an Xperia Z (Krait 1.5 GHz, Adreno 320) and other devices with an Adreno 220 (and a different CPU core) reach exactly the same fps, and then I saw a Samsung Galaxy S2 (A9 1.2 GHz, Mali-400 MP4) outperform all of them and run the same game at full speed. Based on my other tests, Krait is not the quantum leap over A9 that it is on paper, but in the worst case it should at least be on par with it, especially at the same frequency, so what remains to blame is the GPU driver, I think. According to Adreno Profiler my GPU is only 25-30% busy, and the same goes for the clock.


I already tried disabling alpha/color tests and everything else to make it draw and calculate less, but it seems ineffective, confirming that the hardware is pretty fast. How can different platforms have the same performance if one of them is supposed to be the successor? If someone has a Nexus 4, does the situation change with 4.2.2? (It's supposed to ship with a new Adreno driver.)

solarmystic commented 11 years ago

Drivers. It all boils down to the drivers, especially on mobile devices.

Conversely, what is the CPU core usage on the mobile device when running the game?

It could easily be a case of the CPU core bottlenecking your performance. Remember that PPSSPP only uses one core on all devices (including the PC) at the moment, and a bottlenecked CPU may not be passing frames to the GPU fast enough, which means the GPU ends up underutilized and you end up with mediocre performance.

jumpertwo commented 11 years ago

Usage is between 30-35% according to Cool Tool, so it's not CPU bound, I guess. Sorry, maybe my post above was a bit misleading about Krait. It's not slow and definitely not a bad processor (it's just a bit slower on NEON ops, which aren't even implemented yet).

solarmystic commented 11 years ago

Okay then, fair enough on that point.

Have you tried OCing the components to check for any improvements? It may not help any though, if it really is up to the drivers in the end.

jumpertwo commented 11 years ago

Overclocking will not help if it's not even reaching its current max (stock) frequency.

nekotipcat commented 11 years ago

Krait is better than A9, so it's probably the drivers. Krait should sit between A9 and A15, alongside Apple's Swift architecture: http://en.wikipedia.org/wiki/Krait_(CPU)

xsacha commented 11 years ago

It's not the CPU (Krait). GPU isn't being used properly.

jumpertwo commented 11 years ago

Yeah, that's the point. If possible, I just want to hear from someone with a Qualcomm device on Android 4.2.2 (with the newest driver, released in March-April) whether there are any improvements. The Nexus 4 should have it, I guess.

nekotipcat commented 11 years ago

@jumpertwo Here's one with an HTC One (Krait 300 and Adreno 320, SoC Qualcomm Snapdragon 600 APQ8064) running at 45-63 VPS: http://www.youtube.com/watch?v=CDlI6n7fFD8 And here's a Nexus 4 (SoC Qualcomm Snapdragon S4 Pro APQ8064): http://www.youtube.com/watch?v=4LJUuRt0APk This one didn't have VPS turned on, but it looks faster than 38 VPS.

@xsacha I know, but the first post said "Krait is not a quantum leap from A9"; I just wanted to clarify that the Krait architecture is faster than the A9 architecture.

xsacha commented 11 years ago

Yeah, I have an Adreno 225 and get better performance than that, but I'm using BlackBerry, not Android. The driver version is 4.0 (the oldest of the new drivers), the same one the HTC One uses.

jumpertwo commented 11 years ago

I've seen that first video; that's why I suppose a newer driver can solve this bad behaviour. As for the second video, it's not the same game (and not a very resource-demanding one). Here's an Xperia Z (Krait and Adreno 320): http://www.youtube.com/watch?v=_1zxhzauTRc There's a difference of 20 fps from the first HTC One video (0:49).

nekotipcat commented 11 years ago

Is there any phone that can run it at 60 VPS, and what Android version do you have? Android has never been very fast: it lags in menus on a dual-core 1.5 GHz CPU, while Apple's iPhone 4S with two 800 MHz A9 cores doesn't lag at all. How does PPSSPP run on iOS? I'm just thinking that, with the drivers, perhaps there are some differences between the OSes.

jumpertwo commented 11 years ago

@xsacha How did you find out the driver version/revision? Qualcomm's site only labels the Android ROM version (e.g. [JB_VANILLA.04.02.02.060.053]).

@nekotipcat iOS and Android are very different systems. There's no use comparing them based on menu scrolling speed (the menus are written differently) or, in my opinion, even benchmarks. I think system performance comes from synergy, not raw integer power.

xsacha commented 11 years ago

Driver version is based on the GL_VERSION string. Sonic told me about it. The version numbers are a bit weird, but basically the Samsung Galaxy S4 is using the latest drivers (version 14.0?) and hence gets the best performance.

jumpertwo commented 11 years ago

glGetString(GL_VERSION) returns "OpenGL ES 2.0". Am I missing something?

unknownbrackets commented 11 years ago

Probably want GL_RENDERER, I think. Adreno Profiler seems interesting...

-[Unknown]

jumpertwo commented 11 years ago

@unknownbrackets GL_RENDERER gives "Adreno (TM) 305". Weird, on Linux GL_VERSION does the job.

How much can (bad) drivers hold back the hardware, to the point of not even letting it rev up to max frequency? Something smells fishy. It would be cool to test a newer revision.

unknownbrackets commented 11 years ago

Well, as far as I know, the Android drivers are generally quite bad and can indeed have a significant impact on performance in this way. But even so, perhaps there's some workaround...

-[Unknown]

jumpertwo commented 11 years ago

@unknownbrackets GL_VERSION works after all (it was my fault).

I've got driver v@4.1, which is pretty old now, but according to GFXbenchmark the Xperia Z ships with v@6, and it suffers the same way. If my Adreno Profiler comes back to life (it stopped working after an automatic Windows Update, which I already tried to revert with no luck), I'll try to see whether the driver fails to digest something correctly. (Unrelated, but I've also seen some shader compilation errors.)
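The driver revision discussed here is the "v@" tag that appears in the GL_VERSION string. A minimal C++ sketch of pulling it out, assuming the tag has the form V@<major>.<minor> (or lowercase v@) somewhere in the string; ParseAdrenoDriverVersion and AdrenoDriverVersion are hypothetical names, not PPSSPP code, and the exact string layout varies between drivers:

```cpp
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical result type for the driver revision, e.g. v@14.0 -> {14, 0}.
struct AdrenoDriverVersion {
    int major = 0;
    int minor = 0;
};

// Scans a GL_VERSION string for "V@<digits>[.<digits>]" (case-insensitive V).
// Returns std::nullopt when no such tag exists, e.g. a plain "OpenGL ES 2.0".
std::optional<AdrenoDriverVersion> ParseAdrenoDriverVersion(const std::string &glVersion) {
    for (size_t i = 0; i + 2 < glVersion.size(); ++i) {
        bool isTag = (glVersion[i] == 'V' || glVersion[i] == 'v') && glVersion[i + 1] == '@' &&
                     std::isdigit(static_cast<unsigned char>(glVersion[i + 2]));
        if (!isTag)
            continue;
        AdrenoDriverVersion v;
        char *end = nullptr;
        v.major = static_cast<int>(std::strtol(glVersion.c_str() + i + 2, &end, 10));
        // Minor part is optional: only parse it if a digit follows the dot.
        if (*end == '.' && std::isdigit(static_cast<unsigned char>(end[1])))
            v.minor = static_cast<int>(std::strtol(end + 1, nullptr, 10));
        return v;
    }
    return std::nullopt;
}
```

In an app you would call it once a GL context is current, e.g. with reinterpret_cast<const char *>(glGetString(GL_VERSION)) after checking that pointer for null, and log the result next to GL_RENDERER so reports like the ones in this thread include both.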

xsacha commented 11 years ago

Yeah, you want v@14, which the Galaxy S4 has. Is there a way to replace your drivers on Android?

jumpertwo commented 11 years ago

@xsacha Yeah, I could flash them, but v@14 needs Android Jelly Bean 4.2.2. My device is going to be updated to 4.2.2 soon (according to LLabTooFeR, an HTC ROM developer), but many other devices (with Adreno 2xx/3xx series GPUs) are not so lucky. This is indeed a bottleneck, and I think it would be cool to investigate what the problem could be.

unknownbrackets commented 11 years ago

So, you tried killing all the logic in the shaders already, right? You probably also tried the vertex cache and VBO options?

Another thing to try is sending less data to the GPU. For example, in TextureCache.cpp, there's bool match = ...; etc. If you set that to true, and make sure the if statement a bit further down never rehashes, it should upload textures only rarely. You can see if that helps.
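To make the experiment concrete, here is a simplified, hypothetical texture cache in C++. It is not PPSSPP's TextureCache.cpp: textures are keyed by emulated address, a content hash decides whether to re-upload, and a neverRehash flag mimics forcing match to true so a cached texture is never re-uploaded. FNV-1a is just a stand-in hash.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified cache used only to illustrate the experiment.
class SketchTextureCache {
public:
    // When true, mimics forcing "match" to true: once cached, a texture is
    // never rehashed or re-uploaded, even if its contents change.
    bool neverRehash = false;

    // Returns true if the caller would need to upload (glTexImage2D) this texture.
    bool Lookup(uint32_t address, const std::vector<uint8_t> &data) {
        auto it = entries_.find(address);
        if (it != entries_.end()) {
            if (neverRehash)
                return false;  // trust the cached copy unconditionally
            uint64_t h = Hash(data);
            if (h == it->second)
                return false;  // contents unchanged: skip the upload
            it->second = h;
            ++uploads_;
            return true;
        }
        entries_[address] = Hash(data);  // first sighting: always upload
        ++uploads_;
        return true;
    }

    int uploads() const { return uploads_; }

private:
    // 64-bit FNV-1a over the raw texel bytes.
    static uint64_t Hash(const std::vector<uint8_t> &data) {
        uint64_t h = 14695981039346656037ULL;
        for (uint8_t b : data) {
            h ^= b;
            h *= 1099511628211ULL;
        }
        return h;
    }

    std::unordered_map<uint32_t, uint64_t> entries_;
    int uploads_ = 0;
};
```

Comparing uploads() with and without neverRehash is a quick way to confirm the hack really cuts uploads before reading anything into the fps numbers; if uploads drop but fps doesn't move, texture traffic isn't the bottleneck.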

Another thing you can try is not setting things like the blend func, e.g. many of the things in StateMapping.cpp. You could also try sending fewer uniform values in ShaderManager.cpp.

Is it possible the time is all being spent on memory bandwidth?

-[Unknown]

jumpertwo commented 11 years ago

@unknownbrackets I tried the changes in TextureCache and disabled blend, alpha, color, etc.; I destroyed every kind of fancy drawing. It's no use: nothing changed, not even 1 fps gained. I'll try playing with the shaders when I come back home tonight.

jumpertwo commented 11 years ago

@unknownbrackets Sorry for my late reply. It seems like the GPU spends 70% of its "shading time" shading fragments (compared to the time spent shading everything), so I tried reducing this as much as possible (to around 40%) as you suggested, killing the fragment logic piece by piece. Nothing changes, no gain. The GPU metrics show very low vertex/texture fetch stalls, so I think the shaders can get texture/vertex data fast enough.

hrydgard commented 11 years ago

The only thing that's likely to be really expensive in our fragment shaders are alpha/color test - the stuff with "discard". If killing that does nothing, it's clear that the bottleneck is elsewhere.

unknownbrackets commented 11 years ago

Also, if it's spending 70% (that actually seems low to me but I don't know anything really...) of shading time on fragments, but it's not generally busy anyway, I think it's probably somewhere else. 70% of "not busy" isn't the problem. There's gotta be some interlock or latency somewhere...
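Rough numbers for this argument, assuming the ~25-30% busy figure from the first post and the 70% fragment share were measured over the same frames, and that shading time is part of busy time:

```latex
t_{\text{frag}} \le 0.70\, t_{\text{shade}} \le 0.70\, t_{\text{busy}}
  \approx 0.70 \times (0.25 \text{ to } 0.30)\, t_{\text{frame}}
  \approx (0.18 \text{ to } 0.21)\, t_{\text{frame}}
```

So under those assumptions fragment work fills at most about a fifth of each frame while the GPU sits idle for most of the rest, which fits the observation that stripping the fragment logic gained nothing.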

-[Unknown]

jumpertwo commented 11 years ago

After paying more attention, it seems I can reach relatively high fps for less than a second (more like an instant, but definitely visible in the graphs), exactly when the frequency goes up. Actually, running PPSSPP in power saving mode (where the CPU and GPU get capped) makes no difference in fps/VPS: in normal mode the GPU doesn't even rev up to 200 MHz (it always averages 180-190 MHz), so it never reaches the power saving cap. According to various documents on the Internet, the Adreno 305 can go up to 400 MHz (I don't know if that's the highest step): http://kyokojap.myweb.hinet.net/gpu_gflops/

Having the same frequency (and performance) in normal and power saving mode doesn't sound right. Which kind of workload triggers GPU clock increases on mobile? Maybe that can help us find the problem.

unknownbrackets commented 11 years ago

And just to be sure, this all happens even if you increase the "fixed" fps speed and switch to it, right? It's not like it's sleeping.

I wonder if something we're doing is signalling the driver to stay in low end mode or something... seems strange.

-[Unknown]

jumpertwo commented 11 years ago

@unknownbrackets Yeah, it happens. I just tried again to make sure.

jumpertwo commented 11 years ago

After spending some time reading all kinds of Adreno developer docs, I didn't find anything about the chipset's power management features or how things work at a low level.

I think Adreno's driver (v@4.1, v@6, v@14, so pretty much all versions, affecting at least the Adreno 220, Adreno 305 and Adreno 320 based on my tests) misjudges how many resources PPSSPP needs and keeps flying low to save energy. I don't know if it's possible to change the GPU's governor without root (I don't think so), but there must be a way to let it know PPSSPP needs more attention and more power, similar to a video player application.

xsacha commented 11 years ago

Maybe it is judging power saving mode based on what CPU usage is. As you say, the game runs fine when underclocked, so it likely thinks it should stick with that. Then I guess the CPU and GPU clocks are linked.

If you have root, you can run one of those overclocking apps and lock the CPU speed. Use performance governor. That will test the theory.

People have experienced a similar issue on Blackberry where some apps require a Youtube video to be run in the background in order to get maximum speed. I'll see if that affects games on my phone. Edit: Nope that's fixed on Blackberry now.

Troedellahm commented 11 years ago

There could be a bug in Adreno Profiler. Today when I tested v2.4, it displayed 100% GPU usage. After updating to 3.3, it only displayed about 30% usage. However, I have an S2, so maybe that's something completely different :)

EDIT: Xperia Play, but yeah, you're right, the Adreno 205 doesn't scale... never mind me :p

jumpertwo commented 11 years ago

@Troedellahm You've got a T-Mobile Galaxy S2 with an Adreno 220? Anyway, the busy/usage metric is not that important, since it's calculated relative to clock speed. What's more suspicious is that there's no difference between normal and power saving mode, with the frequency capped at 190 MHz. I've seen it go up to 400-410 MHz while running GFXbenchmark.

this doesn't affect Adreno 205, as it doesn't scale its clock speed (just tested a moment ago)

jumpertwo commented 11 years ago

@xsacha Unfortunately I don't have superuser on this device for now. (HTC devices are pretty stubborn about this, and the One SV is not that popular on the scene.)

I'll probably flash it after the 4.2.2 update (if my recovery still works on it), so I can't test it for now.

Still, there are many Adreno (225/305/320) users who don't want to root their phones or don't even know what that means. Maybe something can be done to find a happy medium.

Edit: It seems like the GPU and CPU are not linked in Qualcomm's throttling. Running on the interpreter (much more CPU load, the game at 10-11 fps), the GPU's clock, busy, stalls and every other metric stay the same. It must be something else.

Another test: with linear filtering enabled, antialiasing (2x resolution), 16x anisotropic filtering and texture scaling, I can finally get usage/busy to 90%, but the clock is still literally capped at 185-190 MHz.
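A back-of-the-envelope check, assuming the busy percentage is measured relative to the current clock (as suggested earlier in the thread), that the workload stays fixed, and taking the ~400 MHz maximum reported for the Adreno 305:

```latex
\frac{f_{\text{obs}}}{f_{\max}} \approx \frac{190\ \text{MHz}}{400\ \text{MHz}} \approx 0.48,
\qquad
\text{busy}_{400\,\text{MHz}} \approx 0.90 \times \frac{190}{400} \approx 0.43
```

In other words, even with the GPU nearly saturated at its current clock, there would be roughly 2x of clock headroom that the governor never uses.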

xsacha commented 11 years ago

Is there a way to check what my Adreno GPU clock is? I suppose Adreno Profiler doesn't work on Blackberry. Is there a GL extension for grabbing GL metrics (like for Broadcom)?

Performance on my device seems in-line with the performance of Adreno 225 so I would be surprised if it could go any faster.

jumpertwo commented 11 years ago

@xsacha As far as I know, Adreno Profiler is also ready for BB10, but it's up to BlackBerry to actually deploy it (that's what the Internet says). Hmm... I don't know of such an extension.

Searching the net, I found that the mupen64plus (AE) people had a similar problem with Adreno 3xx devices: weirdly low fps and low GPU usage. They also theorized strange GPU throttling, driver bugs, etc., but ended up fixing things after disabling the eglWaitNative and eglWait calls: http://www.paulscode.com/forum/index.php?PHPSESSID=ea1d3ae465f2c2156a74e74abffe0b6b&topic=1024.0 PPSSPP doesn't use them, right? Maybe there's something similar in the Java methods/frontend?

Thinking about it, maybe it's not an Adreno-only issue. I mean, maybe there is some latency somewhere in the code (Android-related), and it ends up showing on the Adreno 22x/3xx series only because they are the only GPUs that actually throttle their clock frequency on demand. Clock-locked GPUs like the Mali-400 (fixed at 270 or 400 MHz) may hide this problem (and partially solve it).

ThreeHT commented 11 years ago

4412 so fast, 8064 really slow - -!

jumpertwo commented 11 years ago

Sorry, I don't understand. What do you mean by 4412 and 8064?

Edit: Probably Exynos 4412 and Snapdragon APQ8064?

Nezarn commented 11 years ago

On my Galaxy Mini 2 (Adreno 200), the GPU is at ~45% in the Project Diva 2nd demo (in the menu, ~30-35 fps, tested with Adreno Profiler 3.3).

ThreeHT commented 11 years ago

@jumpertwo You're right! At the same 720p, the Exynos 4412 is a lot faster, about 50%! The Note 2 even beats the iPhone 5, though not by much, about 10%! Unscientific, I know! Sorry, my English is very bad!

cv47 commented 11 years ago

GPU: Adreno 320, test game: Gundam Vs Gundam

Menu and loading: 478 MHz, 30% load
In-game: <30 MHz, 100% load


jumpertwo commented 11 years ago

@cv47 Which device are you using? So the Adreno 320 runs at only ~30 MHz while in-game?

ThreeHT commented 11 years ago

@jumpertwo cv47 phone is LG Optimus G

cv47 commented 11 years ago

Older Adreno GPUs slow down with more than 1000 draw calls.

Would rendering the object in a single draw call be better?
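For a sense of scale, assuming a 60 fps target and the ~1000 draw calls per frame mentioned above, each draw call (including its state changes) would have to fit in roughly 17 µs:

```latex
\frac{1\ \text{s} / 60}{1000\ \text{draws}} \approx \frac{16.7\ \text{ms}}{1000} \approx 16.7\ \mu\text{s per draw},
\qquad
1000 \times 60 = 60\,000\ \text{draws/s}
```

On a driver where each call carries a large fixed cost, merging draws shrinks that total directly, which is why batching comes up next.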

unknownbrackets commented 11 years ago

Does anything like this help? https://github.com/unknownbrackets/native/compare/qcom-alphatest https://github.com/unknownbrackets/ppsspp/compare/qcom-alphatest
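The branches are named qcom-alphatest, which suggests Qualcomm's GL_QCOM_alpha_test extension (it adds a fixed-function alpha test via glAlphaFuncQCOM, an alternative to discard in the fragment shader); I haven't checked how the branches detect it. Whatever they do, a renderer generally has to confirm an extension is advertised before relying on it. A minimal C++ sketch of that check; HasGLExtension is a hypothetical helper, and matching whole tokens avoids false positives when one extension name is a prefix of another:

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole, space-separated token in an
// extension list such as the one returned by glGetString(GL_EXTENSIONS)
// on OpenGL ES 2.0. A plain substring search could match a longer name.
bool HasGLExtension(const std::string &extensionList, const std::string &name) {
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name)
            return true;
    }
    return false;
}
```

In practice you would pass it the string from glGetString(GL_EXTENSIONS) (after a null check) once at startup and only take the QCOM alpha test path when it returns true, falling back to the shader discard otherwise.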

As far as draw calls - this is usually when the game flips state around too much. We have to send things as a draw call whenever the state of that call changes. In some cases, we do it more often than strictly necessary when the game flips the state on and back off before actually rendering more prims.
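A minimal C++ sketch of the idea described here: state changes only record the desired state, and a pending batch is flushed only when a draw arrives with state that actually differs from the batch's, so toggling a state off and back on between draws doesn't split the batch. RenderState and DrawBatcher are hypothetical; the sketch counts vertices and draw calls instead of issuing GL calls, and ignores other things that block merging in practice (primitive type, vertex format, buffer limits).

```cpp
#include <cstdint>

// Hypothetical render state; a real emulator tracks far more
// (alpha/color test, shader, uniforms, depth state...).
struct RenderState {
    bool blendEnabled = false;
    uint32_t blendFunc = 0;
    uint32_t texture = 0;
    bool operator==(const RenderState &o) const {
        return blendEnabled == o.blendEnabled && blendFunc == o.blendFunc && texture == o.texture;
    }
    bool operator!=(const RenderState &o) const { return !(*this == o); }
};

class DrawBatcher {
public:
    // Only records the state; nothing is flushed here.
    void SetState(const RenderState &s) { current_ = s; }

    void Draw(int vertexCount) {
        // Flush only if the state at draw time differs from the pending batch.
        if (pendingVerts_ > 0 && current_ != batchState_)
            Flush();
        if (pendingVerts_ == 0)
            batchState_ = current_;
        pendingVerts_ += vertexCount;
    }

    void Flush() {
        if (pendingVerts_ == 0)
            return;
        // A real renderer would apply batchState_ and issue one glDrawElements here.
        ++drawCalls_;
        pendingVerts_ = 0;
    }

    int drawCalls() const { return drawCalls_; }

private:
    RenderState current_;
    RenderState batchState_;
    int pendingVerts_ = 0;
    int drawCalls_ = 0;
};
```

Comparing the full state at draw time, rather than flushing on every state-setting call, is what lets the toggle-and-restore pattern collapse into a single call; draws with genuinely different state still each get their own call.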

-[Unknown]

cv47 commented 11 years ago

@unknownbrackets The qcom-alphatest extension is working; it fixes the alpha issue in Super Robot Wars Z...

Before: https://f.cloud.github.com/assets/4585760/657906/6cc6913a-d59c-11e2-8482-c547b3cc3007.jpg

hrydgard commented 11 years ago

We try to combine draw calls already but it looks like we are indeed failing in that game, since there are lots of glDrawElements with nothing in between. My guess would be something like what Unknown says - it changes some state, then changes it back between each draw call. Can possibly be detected and avoided, which should give us a large speed boost in that specific game.

cv47 commented 11 years ago

@hrydgard @unknownbrackets How do you do "draw call batching"? Is there an example or tutorial? I would like to determine where the bottleneck on Adreno GPUs is :(

jumpertwo commented 11 years ago

qcom-alphatest can't help us with speed, since we've already determined the bottleneck isn't in the fragment stage, but it does fix some cases where the previous test was failing for whatever reason.

@cv47 This is a tricky one. As far as I know, Adreno struggles even when there aren't that many draw calls (see my first post about the Dragon Ball game).

dbz400 commented 11 years ago

@unknownbrackets We tested the qcom-alphatest extension and it seems to be solid. Maybe we can merge it.

jumpertwo commented 11 years ago

After reading http://forums.ppsspp.org/showthread.php?tid=7069 on the forum and https://plus.google.com/+nialldouglas/posts/YgbUb4mUXP2, I guess we can try the Snapdragon LLVM 3.3 compiler offered by Qualcomm, to see if it helps a little and stop blaming a possible Adreno driver bug. It has various optimizations for Krait and Scorpion cores (I used it not too long ago for some not very demanding apps, so let's say I didn't really test it): https://developer.qualcomm.com/mobile-development/performance-tools/snapdragon-llvm-compiler-android