Slow fps on haxe target

ghost commented 7 years ago

I created this issue to work on this problem. With many polygons in a mesh (Spine), we get 2-4 fps on an iPad Air 2.

soywiz commented 7 years ago

Since I cannot see your code, I need more information. Please create a test here that runs slowly in Haxe: https://github.com/jtransc/jtransc/blob/master/benchmark/src/Benchmark.java

That way I can work on improving the performance. Or at least point out which file/class/method is slow in the project.

soywiz commented 7 years ago

Sorry didn't notice that you were self-assigning this :)

ghost commented 7 years ago

BTW: I ran many tests on different devices. My first message was about debug builds only; release builds without the fix are about 10x faster (about 40 fps). In testing we have already found problems and improved release by 5% (debug by 100%!). We need more time; maybe we can improve release by 10%, and then I will create a PR, because right now our fix would break the Haxe tests in jtransc.

soywiz commented 7 years ago

Cool. Let me know if I can do something about this. Also, if you can tell me what you changed or how you are using it, maybe I can come up with something to make it compatible with current JTransc or adapt JTransc for it.

Also, Haxe debug is usually pretty slow as far as I remember. It adds a lot of boilerplate for debugging + backtraces. I added @:noStack for better performance on critical parts.

ghost commented 7 years ago

patch.zip: you can look at this patch. We use a profiler to find problems; after each fix the profiler finds the next problem, and the fixed one no longer shows up. The idea is to eliminate all copying in arrays and vectors and work with pointers to memory. The big problem is polygons in meshes. We now see a problem with Matrix, but that fix is not ready yet. The last problem is LimeFiles: we lazily read files when opening a new window in the game, which causes critical lag; some methods drop arrays and work with a stream from Haxe instead.

P.S. This patch was tested on spine-demo with no fps limit: from 700 up to 750 fps on the test device (debug: from 300 up to 700). That is a very simple animation; with many animations on one screen we get a bigger improvement (from 4 up to 12 fps in debug).

soywiz commented 7 years ago

Ok. One thing I see here is that I should optimize Haxe's System.arraycopy.

This looks slow:

class JA_B {
{{ HAXE_METHOD_ANNOTATIONS }}
static public function copy(from:JA_B, to:JA_B, fromPos:Int, toPos:Int, length:Int) {
    if (from == to && toPos > fromPos) {
        var n = length;
        while (--n >= 0) to.set(toPos + n, from.get(fromPos + n));
    } else {
        for (n in 0 ... length) to.set(toPos + n, from.get(fromPos + n));
    }
}

{{ HAXE_METHOD_ANNOTATIONS }} override public function copyTo(srcPos: Int, dst: JA_0, dstPos: Int, length: Int) { copy(this, cast(dst, JA_B), srcPos, dstPos, length); }
}

In C++ we have a pointer, so we can convert it to a plain memcpy, which should be much faster.
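The direction check in the snippet above only matters when source and destination are the same array and the regions overlap. As a rough illustration, here is a minimal Java sketch that compares a naive front-to-back loop with the direction-aware loop and with `System.arraycopy`. The class and method names are hypothetical and not part of JTransc. Note that in C the overlap-safe call is `memmove`; `memcpy` has undefined behavior when the regions overlap, so a native fast path would presumably need to keep a check like this one.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of JTransc.
final class OverlapCopy {
    // Always copies front to back; corrupts data when src == dst and dstPos > srcPos.
    static void forwardCopy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        for (int n = 0; n < length; n++) dst[dstPos + n] = src[srcPos + n];
    }

    // Same direction choice as the Haxe snippet: copy back to front
    // when shifting data to the right inside a single array.
    static void overlapSafeCopy(byte[] src, int srcPos, byte[] dst, int dstPos, int length) {
        if (src == dst && dstPos > srcPos) {
            for (int n = length - 1; n >= 0; n--) dst[dstPos + n] = src[srcPos + n];
        } else {
            for (int n = 0; n < length; n++) dst[dstPos + n] = src[srcPos + n];
        }
    }

    public static void main(String[] args) {
        byte[] naive = {1, 2, 3, 4, 5, 0, 0};
        byte[] safe = naive.clone();
        byte[] builtin = naive.clone();

        forwardCopy(naive, 0, naive, 2, 5);       // overwrites elements before reading them
        overlapSafeCopy(safe, 0, safe, 2, 5);     // preserves the source range
        System.arraycopy(builtin, 0, builtin, 2, 5); // specified to behave as if via a temp copy

        System.out.println("naive:   " + Arrays.toString(naive));
        System.out.println("safe:    " + Arrays.toString(safe));
        System.out.println("builtin: " + Arrays.toString(builtin));
    }
}
```

The naive loop ends up repeating the first two elements, while the other two calls produce the shifted data intact.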

The other thing I see is that we have to avoid copying where possible when converting Buffer to Lime typed arrays with things like fastConvertIntBuffer. I have to check whether Lime allows a way to create typed buffers from pointers in C++ or something like that. I will investigate a bit after the width+height issue for splash images.

Also, we can try things like specialized direct buffers (which are required in libgdx) that end up using typed arrays while allowing the rest of the application to use better types for arrays.
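To make the copy-versus-view distinction concrete on the Java side, here is a small sketch using only standard `java.nio` calls. It is not JTransc or Lime code, and the class and method names are hypothetical; whether the Haxe/C++ side can wrap the same memory without copying is exactly the open question above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical illustration class; uses only standard java.nio APIs.
final class BufferViews {
    // Copying conversion: allocates a new array and moves every element.
    static int[] copyToArray(IntBuffer src) {
        int[] out = new int[src.remaining()];
        src.duplicate().get(out); // duplicate() leaves the caller's position untouched
        return out;
    }

    // Zero-copy conversion: the IntBuffer is a view over the same bytes.
    static IntBuffer viewAsInts(ByteBuffer bytes) {
        return bytes.asIntBuffer();
    }

    public static void main(String[] args) {
        // Direct buffer in native byte order, as libgdx-style code typically allocates.
        ByteBuffer bytes = ByteBuffer.allocateDirect(16).order(ByteOrder.nativeOrder());
        IntBuffer ints = viewAsInts(bytes);

        ints.put(1, 42);                      // write through the int view
        System.out.println(bytes.getInt(4));  // same memory, read as bytes 4..7

        int[] snapshot = copyToArray(ints);   // independent copy
        ints.put(1, 7);                       // later writes do not reach the copy
        System.out.println(snapshot[1] + " vs " + ints.get(1));
    }
}
```

A view costs nothing per element, whereas the copying path scales with buffer size on every conversion.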

The problem with using typed arrays here is that it prevented normal arrays from being as fast as possible on the Haxe target; that's why I changed it (I used the benchmark to confirm this). If I cannot find a way to make your use case faster while keeping it, I will accept using typed arrays again for arrays so you can advance with this.

soywiz commented 7 years ago

For the record: I have updated the benchmark (https://github.com/jtransc/jtransc-benchmark) to test several arraycopy types. These are the results on my machine:

WIN-JAVA 1.8:

arraycopy byte...8.964731991291046
arraycopy short...25.391018986701965
arraycopy char...25.71083700656891
arraycopy int...28.533034026622772
arraycopy float...27.600202977657318

/////////////////////////////////

WIN-JTRANSC-CPP 0.6.4:

arraycopy byte...1.0
arraycopy short...3.0
arraycopy char...3.0
arraycopy int...9.0
arraycopy float...10.0

/////////////////////////////////

WIN-JTRANSC-HAXE-CPP 0.6.4:

arraycopy byte...107.00048828125
arraycopy short...87.999267578125
arraycopy char...85.0009765625
arraycopy int...10.000244140625
arraycopy float...89.000732421875

So except for int, haxe-cpp arraycopy is pretty slow, especially compared with jtransc-pure-cpp, which is blazing fast. So I will work on this as a first step for this issue.
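For readers who want to reproduce this kind of per-type comparison, a rough sketch of an arraycopy timing loop follows. This is not the code from jtransc-benchmark; the class name, method names, sizes, and iteration counts are all assumptions chosen for illustration, and the absolute numbers will differ from the ones listed above.

```java
// Hypothetical micro-benchmark sketch; not the jtransc-benchmark implementation.
final class ArrayCopyBench {
    // Elapsed milliseconds for `iterations` copies of `length` bytes.
    static double timeByteCopy(int length, int iterations) {
        byte[] src = new byte[length];
        byte[] dst = new byte[length];
        for (int i = 0; i < length; i++) src[i] = (byte) i;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) System.arraycopy(src, 0, dst, 0, length);
        long elapsed = System.nanoTime() - start;
        // Read the result so the copy cannot be treated as dead code.
        if (dst[length - 1] != src[length - 1]) throw new IllegalStateException("copy failed");
        return elapsed / 1_000_000.0;
    }

    // Elapsed milliseconds for `iterations` copies of `length` ints.
    static double timeIntCopy(int length, int iterations) {
        int[] src = new int[length];
        int[] dst = new int[length];
        for (int i = 0; i < length; i++) src[i] = i;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) System.arraycopy(src, 0, dst, 0, length);
        long elapsed = System.nanoTime() - start;
        if (dst[length - 1] != src[length - 1]) throw new IllegalStateException("copy failed");
        return elapsed / 1_000_000.0;
    }

    public static void main(String[] args) {
        int length = 8192;       // assumed array size
        int iterations = 100_000; // assumed repeat count
        timeByteCopy(length, iterations); // warm-up pass so the JIT can compile the loop
        timeIntCopy(length, iterations);
        System.out.println("arraycopy byte..." + timeByteCopy(length, iterations));
        System.out.println("arraycopy int..." + timeIntCopy(length, iterations));
    }
}
```

Running the same class through each target would give comparable rows, as long as the sizes and iteration counts are held fixed across targets.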

ghost commented 7 years ago

Update: after #189 we have good results. I used the spine demo on Windows, but all platforms show the same result in percentage terms: 300 fps before, 435 fps after; my old result was 600-700 fps. I think in about 8 hours I will make a new branch with the new code. I need to finish the current bugs first, and then I will move to branches from my fork.

ghost commented 7 years ago

I finished the tests: my code makes no difference in real projects, only in spine-demo. But that is a very simple project, so I am closing this.

jtransc / gdx-backend-jtransc

Slow fps on haxe target #59