haxenme / nme

A cross-platform native backend for Haxe projects
MIT License
480 stars 123 forks source link

"08 - Filters" sample on android #331

Closed codeservice closed 6 years ago

codeservice commented 8 years ago

I checked Filters sample on android. For some reason GlowFilter extremely slowing down system. DropShadowFilter works ok. On Win, Mac, iOS I dont see this problem.

hughsando commented 8 years ago

Strange just one filter, one one platform. It might be some bad bounds calculation and is doing too much work.

On Sat, Jun 25, 2016 at 10:52 AM, codeservice notifications@github.com wrote:

I checked Filters sample on android. For some reason GlowFilter extremely slowing down system. DropShadowFilter works ok. On Win, Mac, iOS I dont see this problem.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haxenme/nme/issues/331, or mute the thread https://github.com/notifications/unsubscribe/ABlp1pD8fuCFSEfU3WL-vVvoPkhETnyqks5qPJfwgaJpZM4I-P8I .

hughsando commented 8 years ago

I notice the size of the filter is depends on the absolute mouse position. For a very HDPI display, this might add up pretty quickly.

codeservice commented 8 years ago

It works well on retina old ios tablet (3-rd gen) and slows immediately on Nexus 7 as soon as I touch screen. Something different on android.

hughsando commented 8 years ago

I wonder if it is using software floating point fpu/armv5. I've got some time off, and will have to look at it later.

codeservice commented 8 years ago

Thanks!. My concern if this issue specific for this filter only or used in common public method for other stuff.

On Tue, Jun 28, 2016 at 11:45 AM, Hugh Sanderson notifications@github.com wrote:

I wonder if it is using software floating point fpu/armv5. I've got some time off, and will have to look at it later.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/haxenme/nme/issues/331#issuecomment-229091010, or mute the thread https://github.com/notifications/unsubscribe/ABsIG8YGWPf2Ao7Vj_12rmoJEcEag2H4ks5qQUGKgaJpZM4I-P8I .

codeservice commented 8 years ago

Recompiled sample project with <arch value="armv7" if="android"/> instead of default armv5. Result the same. Doesn't look like problem with FPU.

I can see correlation between circle radius and rendering speed. What strange if radius bigger 30 ..32 pixels, speed slows down dramatically. maybe some kind of power of 2 related issue. Out of ideas now ...

thomasuster commented 6 years ago

@codeservice is this still an issue?

codeservice commented 6 years ago

Yes, filter very slow only on Android, I dont know why. Maybe its hardware specific issue. If someone can try Filters sample on different device and confirm this issue.

thomasuster commented 6 years ago

Reproduced on my Nexus 5, does seem quite a bit slower. Also did an addChild(new FPS()) to see an fps counter. I'll try to dig in more.

codeservice commented 6 years ago

On nexus 7 when this filter works, CPU goes up to 100% and app become unresponsive.

thomasuster commented 6 years ago

I think it's a problem in nme/project/src/common/Display.cpp or lower. Still narrowing down...

Like if you replace const FilterList &filters = obj->getFilters(); with an empty FilterList there doesn't seem to be problems.

codeservice commented 6 years ago

I am not sure how code can process filters if this line return empty list.

Maybe this empty list force Power of 2 operation and it works well in this case: if (inState.mRoundSizeToPOW2 && filters.size()==0) { w = UpToPower2(w); h = UpToPower2(h); }

thomasuster commented 6 years ago

@codeservice Oh no, to clarify I just meant there's some extra heap allocations that are made in haxe side cloning the filters passed in instead of just using them directly. Also instead of updating the filters used when moving the mouse brand new filters are set every single frame. I thought perhaps problems were there, but it does seem to be in processing the filters themselves, not in those places I hoped it would be. Next step would be to see which part of processing the filters is especially slow.

codeservice commented 6 years ago

I saw many “for” inside “for” cycles processing this filter. This logic slow by defenition, as bigger object size as longer cycle time. Question only, why this different on android. Maybe it’s result of power of 2 which makes object size even bigger only on this platform and not on others. You can try make circles radius twice smaller and it will be reasonably fast on Android too. Wandering how this works on OpenFL, is it using same logic as NME and has the same issue?

thomasuster commented 6 years ago

Hey @codeservice ! I made an exciting discovery, would you join me in gitter? https://gitter.im/haxenme/nme

codeservice commented 6 years ago

I am in glitter

thomasuster commented 6 years ago

This fixes the issue... https://github.com/thomasuster/NME/commit/b934cc4b1a8df32758626bd970d5c0d2e931fed3

Although that should not be the fix. I will need C++ expertise help on how we can get compiler to do that for us. Or some other optimization.

codeservice commented 6 years ago

Maybe it should be:

If(inFilterSize<=255) *dest = sa/inFilterSize;

Your switch/case code simply limit to 255 filter size.

On Oct 4, 2018, at 23:46, Thomas Uster notifications@github.com wrote:

This fixes the issue... thomasuster@b934cc4

Although that should not be the fix. I will need C++ expertise help on how we can get compiler to do that for us. Or some other optimization.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

thomasuster commented 6 years ago

So the reason why I limit to 255 in case statement because if you look at the params passed in it's either blurX or blurY, which is bounded.

mBlurX = std::max(0, std::min(256, inBlurX-1) );
mBlurY = std::max(0, std::min(256, inBlurY-1) );
thomasuster commented 6 years ago

@hughsando So it cannot be divide by zero problem. Could it be some compiler optimization flag we're missing from the android toolchain? @madrazo any ideas?

hughsando commented 6 years ago

So maybe there are a few things going on here.

  1. optimization flags may not be great - android might still have "armv5" if HXCPP_ARMV7 is not defined. This will also use software floating-point. But I think this is an integer problem, not floating point.
  2. Texture uploads. Changing filters requires you to change textures, which then must be uploaded. This can be an expensive operation on certain devices, and this might be what we are seeing here. But this should be constant no matter what the loop count, so we should be able to work out if this is the issue.
  3. power of 2 is generally not required these days, unless you are doing repeating textures. src/opengl/OGLTexture.cpp does a check to see if is really required.
  4. It might be possible to use hardware filters, but this has its own problems with render-target-switching taking way too long on android for phone with less specs than the samsung 6.

Generally, it is be to avoid filters for objects that change every frame. There are a few techniques to help here.

  1. Explicitly cache the filter in a bitmap and just use a bitmap instead.
  2. If it for something like a textfield, create a "background sibling" for the textfield - eg, a Sprite with a drawRect added to the display list immediately below the textfield, and add the filter to this object. Since this object does not change even if the text changes, you should only need to do the filter once.
codeservice commented 6 years ago

I can only confirm it’s ARMV7. Flag set and all libs v7. Didn’t try arm64 yet with this filter. Workaround is not a solution for me, because I am providing access to NME for other developers as part of API. If something doesn’t work well simply not included.

thomasuster commented 6 years ago

@codeservice @hughsando Again, my diff above 100% fixes it (it is no a divide by zero bug). I spent more time trying different & less ugly implementations but was unsuccessful. If either of you have other ideas please schedule a conversation with me via twitter so we can debug together, otherwise I'm blocked.

hughsando commented 6 years ago

Yeah, that is really strange. I had 1 more idea - maybe the divisor is a power of 2, and your switch allows it to get transformed into a bit-shift instead of a divide call, or "/1" can be ignored (is it 1?). Not sure how to test this, apart from removing the non power-of-2 values from the switch, can falling back to "/" for these. If the pow2 values (1, 2, 4, 8, ...) are fast, then I think I can have 1 code path for pow2 filter size, and use fixed-point integer maths on the other path. Actually, maybe I should do this anyhow.

hughsando commented 6 years ago

Ok, I have made a change based on the idea that integer division is expensive on some android devices. I have not checked the performance on android. I may also need to think about rounding, but I'm not sure.

thomasuster commented 6 years ago

Good job @hughsando , works like a charm. Is the shift left 20 just so it's high enough to handle large inFilterSize? How did you decide the shift amount?

hughsando commented 6 years ago

I could have used 23 I think - it should be as big as possible without overflowing the sum. Basically, the maximum sum is going to be less than 256 filterElements scale. By setting scale to be 2^20 / filterElements, I know the sum will be less than 2^28 - plenty of room in 32 bit ints. But there still might be a bug in the rounding. If the filterElements=3, then (2^20/filterElements)*filterElements will be less than 2^20 so the result will round down when perhaps it should not have. Usually I make alpha of 254 the same as 255, so this should be ok. However if the filter is applied multiple times, it is possible this round down will accumulate. I should perhaps add scale/2 before shifting, but would need some kind of performance/quality information before doing this.

thomasuster commented 6 years ago

That makes sense, thanks for the information.