hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

Android: Extremely slow battle scenes in Gundam vs Gundam Next Plus #1319

Closed dbz400 closed 6 years ago

dbz400 commented 11 years ago

I think most Android users have come across this game, whose battle scenes are extremely slow: always under 10-15 fps without frameskipping. However, iOS is different. On the iPhone 5 in particular, it almost reaches 50-60 fps.

Just wondering what the bottleneck is. I understand the GPUs are different, but I wasn't expecting such a significant performance difference...

My device is a Sony Xperia Z with a Snapdragon S4 Pro, 1.5 GHz CPU and 400 MHz GPU.

Screenshot_2013-04-18-11-43-49

ketkul commented 11 years ago

I guess the same thing happens with Full Metal Alchemist Brotherhood...

dbz400 commented 11 years ago

@ketkul, what is the FPS then? Is it a different story on iOS?

BTW, for GvGNP, it seems to spend a lot of time in sceGeEnqueue and issue about 10 times as many draw calls as other games.

Screenshot_2013-04-18-14-25-53

For other games, DL processing time is around 6-10 ms.

Screenshot_2013-04-18-14-35-45

hrydgard commented 11 years ago

It looks like the issue might be the very large number of draw calls: that game does 2200 flushes (actual OpenGL draw calls; we combine PSP draw calls into bigger ones), while Dragon Ball below does 110. The iPhone now seems to have drivers that are reasonably good at handling lots of draw calls.

It might be possible to improve this, but it might also be very difficult; it depends.

dbz400 commented 11 years ago

I see. The GvGNP battle scenes are pretty complex, with a few other robots moving around, whereas DBZ has a fairly static background.

I wonder what optimizations have been done in the iPhone PowerVR driver...

hrydgard commented 11 years ago

I obtained a frame log dump, and here's the explanation for the massive number of drawcalls, causing slowness on devices that don't like that:

http://pastebin.com/wwN8KrDe

Basically, the game flips the culling direction back and forth almost between each triangle strip. This causes us to do lots of tiny draw calls, as we can't combine the strips into a big draw call currently as we normally do.
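To illustrate why this is so costly, here is a minimal sketch (not PPSSPP's actual code; `Cull`, `Strip` and `CountFlushes` are hypothetical names made up for this example) of a batcher that can only merge consecutive strips sharing the same cull direction:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the render state that matters here.
enum class Cull { CW, CCW };

// One PSP triangle strip and the cull direction active when it was submitted.
struct Strip {
    int vertexCount;
    Cull cull;
};

// Counts how many real draw calls (flushes) are needed if strips can only be
// combined while the cull direction stays the same.
int CountFlushes(const std::vector<Strip> &strips) {
    int flushes = 0;
    bool pending = false;      // true while a batch is being accumulated
    Cull current = Cull::CW;   // cull state of the pending batch
    for (const Strip &s : strips) {
        if (pending && s.cull != current) {
            ++flushes;         // state changed: the pending batch must be drawn
            pending = false;
        }
        current = s.cull;
        pending = true;
    }
    if (pending)
        ++flushes;             // draw whatever is left at the end
    return flushes;
}
```

With this model, a list of strips that alternates cull direction on every strip needs one flush per strip, while the same strips with a constant cull direction need only one flush in total.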

But, as we do translate these to indexed triangle lists anyway, we could take the culling direction into account when doing that, and then we could skip flushing the draw list between. This would be a bit complicated though and I don't want to do it until I find more games that use the same weird drawing technique...
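A rough sketch of that idea, assuming 16-bit indices and a hypothetical helper named `AppendStrip` (this is not the real index generator): when a strip is expanded into an indexed triangle list, a strip submitted with the opposite cull direction can have its triangles emitted with reversed winding, so the whole list can be drawn with a single cull setting.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends the triangles of one strip to an indexed
// triangle list. The strip uses `count` consecutive vertices starting at
// `base`. If `flipWinding` is true, every triangle is emitted with reversed
// winding, which stands in for flipping the cull direction for this strip.
void AppendStrip(std::vector<uint16_t> &out, uint16_t base, int count, bool flipWinding) {
    assert(count >= 0);
    for (int i = 0; i + 2 < count; ++i) {
        const uint16_t a = static_cast<uint16_t>(base + i);
        const uint16_t b = static_cast<uint16_t>(base + i + 1);
        const uint16_t c = static_cast<uint16_t>(base + i + 2);
        // In a strip, every odd triangle is wound the other way around, so it
        // is swapped to keep the list consistent. A requested flip inverts
        // that decision for every triangle.
        const bool swapOrder = ((i & 1) != 0) != flipWinding;
        out.push_back(a);
        if (swapOrder) {
            out.push_back(c);
            out.push_back(b);
        } else {
            out.push_back(b);
            out.push_back(c);
        }
    }
}
```

Strips with either cull direction could then be appended to the same index buffer and flushed together. Whether this pays off would still depend on how common the pattern is, as noted above.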

unknownbrackets commented 10 years ago

If I remember right, hashing really slows this down.

In GPU/GLES/TransformPipeline.cpp:

        // Cannot cache vertex data with morph enabled.
        bool useCache = g_Config.bVertexCache && !(lastVType_ & GE_VTYPE_MORPHCOUNT_MASK);
        // Also avoid caching when software skinning.
        if (g_Config.bSoftwareSkinning && (lastVType_ & GE_VTYPE_WEIGHT_MASK))
            useCache = false;

What if you add after that:

        // We don't cache with this few verts, so skip hashing entirely.
        if (vertexCountInDrawCalls < 100)
            useCache = false;

Does this improve, or hurt performance? Also, how is performance nowadays with the vertex decoder jit?

-[Unknown]

dbz400 commented 10 years ago

Let me try this out now.

dbz400 commented 10 years ago

Hmm, with the vertexCountInDrawCalls < 100 check, performance is hurt a little bit. The vertex decoder JIT helps performance quite a lot.

dbz400 commented 10 years ago

The best speed is obtained when the texture coord speedhack is used together with multithreading.

unknownbrackets commented 10 years ago

So, how does this run on e.g. a Sony Xperia Z now?

If the performance hit is no longer massive, it may not make sense to try to optimize the culling thing (especially since such optimizations could negatively impact games that use a lot of vertices and don't swap culling, potentially.)

-[unknown]

dbz400 commented 10 years ago

I'll test it out again.

unknownbrackets commented 10 years ago

Does #6855 improve this at all?

-[Unknown]

ghost commented 10 years ago

I moved the program to the system applications section and granted it the right to change the device's settings. The sound doesn't work (it can't find the library). I don't understand why you don't use processor overclocking for root users and cycle swimming m_ram.

screenshot_2014-09-13-10-46-36

"SPECIFICS": { "gt-i9220|gt-n7000|gt-i9260|sc-02d|transformer prime tf201": {
"GPU": "1" }, "lenovo k900_row": { "GPU": "2", "CPU": "1", "RES": "5"
}, "htc butterfly|htl21|htc6435lvw": { "GPU": "2", "CPU": "0", "RES": "5", "MEM": "0" }, "nexus 4": { "RES": "6" }, "nexus 10": { "RES": "4" }, "at10le-a": { "RES": "4" }, "sm-p600": { "RES": "4" }, "sm-g900h": { "MEM": "2", "RES": "9" }, "sm-n900": { "RES": "8" }, "redhookbay|starxtrem": { "GPU": "2", "CPU": "0.5" }, "byt_t_ffrd10": { "GPU": "2", "CPU": "0.5" }, "thinkpad tablet": { "RES": "2" }, "isw16sh|sht21|lt22i|shv-e160s|lg-f160s|im-a840s": { "RES": "0", "GPU": "1" }, "htc sensation z710e|htc sensation 4g|xt910|so-02d|shv-e120s|im-a760s|lt26i|gt-i9100|im-a800s|shw-m380s|adr6425lvw|a500": { "GPU": "0", "MEM": "0", "RES": "0", "CPU": "0" }, "lt28h": { "reduceDepthFighting": true, "GPU": "0", "MEM": "0", "RES": "0", "CPU": "0" }, "gt-p5100": { "reduceDepthFighting": true, "GPU": "0", "MEM": "1", "RES": "3", "CPU": "0" }, "adr6400l": { "GPU": "0", "MEM": "0", "RES": "3", "CPU": "0" }, "b1-a71": { "GPU": "0", "MEM": "0", "RES": "0", "CPU": "0" }, "lg-d685": { "GPU": "0", "MEM": "0", "RES": "0", "CPU": "0" }, "me172v": { "GPU": "0", "MEM": "0", "RES": "3", "CPU": "0" }, "me173x": { "GPU": "2", "MEM": "1", "RES": "6", "CPU": "1" }, "sc-05d": { "GPU": "1", "MEM": "1", "RES": "0", "CPU": "1" }, "sch-i535": { "MEM": "2" }, "sol21": { "GPU": "2", "MEM": "2", "RES": "0", "CPU": "1" }, "mediapad 10 fhd": { "GPU": "1", "MEM": "2", "RES": "4", "CPU": "1" }, "gt-p3200|gt-p3210|sm-t210|sm-t210r|sm-t211|sm-t2105|gt-p5210|gt-p5200|sm-t311|gt-n5100|gt-n5110|sgh-i467m|sm-t310|sm-t312": { "reduceDepthFighting": true, "GPU": "1", "MEM": "1", "RES": "0", "CPU": "0" }, "hudl ht7s3": { "GPU": "3", "MEM": "1", "RES": "0", "CPU": "1" }, "gt-p7500": { "GPU": "0", "MEM": "1", "RES": "3", "CPU": "0.5" }, "gt-p5110": { "GPU": "0", "MEM": "0", "RES": "3", "CPU": "0" }, "samsung-sgh-i957": { "GPU": "1", "MEM": "0", "RES": "0", "CPU": "0.5" }, "gt-p1000|gt-p1010|gt-i9000|n861|lg-p920|lg-p990|htc desire|st15i|droidx|walkman|st25i": { "textureBudgetMB": 85, "useTextureStreaming": false, "GPU": 
"0", "MEM": "0", "RES": "3", "CPU": "0" screenshot_2014-09-13-11-08-38

Bigpet commented 10 years ago

@QWEmct, overclocking the host system is something the OS should do; it's not the responsibility of the emulator to tune your hardware. As for the property you underlined, it's for taking screenshots only. I don't think it's worth investigating whether taking screenshots can be made faster.

ghost commented 10 years ago

@Bigpet, if I disable it, everything on the device turns sepia when the screen rotates, independently of the screenshot program.

screenshot_2014-09-13-14-54-16

The screenshot doesn't play a role here. ro.bq.gpu_to_cpu_unsupported=0

All the same, you need to hook up the CPU and GPU functions. They don't work on Android.

Bigpet commented 10 years ago

Well it's general OS readback for either screenshots or blitting back onto the screen. Again though, if it's causing problems then maybe you should actually tell us which problems those are instead of just posting an image with something underlined.

ghost commented 10 years ago

@Bigpet, this is a suggestion: fix the settings individually for each device. The suggestion is to take the device's system settings into account (~ android.permission.MODIFY_AUDIO_SETTINGS..), and to consider each of them.

I can create a list of the errors if you don't know them:
1) Can't wake up (if I pause the program and then launch another program).
2) The GPU and CPU functions don't work (reading from buffers is impossible).
3) Multithreading doesn't work.
4) There is no vibration when driving onto the roadside.

screenshot_2014-09-12-22-33-41

5) Transport block etc....

rainercedric23 commented 9 years ago

This issue is still reproducible on the latest PPSSPP version using an HTC One M8 (quad-core 2.3 GHz Snapdragon 801, Adreno 330 at 450 MHz, 32 pipelines). Sad. It still runs at 5-15 fps during battles.

hrydgard commented 6 years ago

With the new "inner interpreter" loop, it would be relatively easy to special-case and fix this. I'll leave that for 1.7.0 though.

unknownbrackets commented 6 years ago

The special casing has been done in #10973. Combined with the performance improvements in 1.6.x, it's likely this game will perform a lot better.

I'm going to close this. If it's still performing badly in the latest git builds, please feel free to comment with the difference in speed between 1.6.3 and the latest git build, along with your settings. It might be a new issue, though, depending on what year it is when you're reading this comment.

-[Unknown]

ghost commented 6 years ago

Gundam VS Gundam Next Plus now runs smoothly even on a Mali-450 with an octa-core 1.4 GHz CPU, with frameskipping off and the emulated CPU clock speed set to 60 or 70: 25-30 FPS at 90-100% speed using the latest PPSSPP buildbot build.

ghost commented 6 years ago

https://youtu.be/ZVOAI_fhQeg just proof of what I said 😃