Closed codeservice closed 6 years ago
Strange just one filter, one one platform. It might be some bad bounds calculation and is doing too much work.
On Sat, Jun 25, 2016 at 10:52 AM, codeservice notifications@github.com wrote:
I checked Filters sample on android. For some reason GlowFilter extremely slowing down system. DropShadowFilter works ok. On Win, Mac, iOS I dont see this problem.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haxenme/nme/issues/331, or mute the thread https://github.com/notifications/unsubscribe/ABlp1pD8fuCFSEfU3WL-vVvoPkhETnyqks5qPJfwgaJpZM4I-P8I .
I notice the size of the filter is depends on the absolute mouse position. For a very HDPI display, this might add up pretty quickly.
It works well on retina old ios tablet (3-rd gen) and slows immediately on Nexus 7 as soon as I touch screen. Something different on android.
I wonder if it is using software floating point fpu/armv5. I've got some time off, and will have to look at it later.
Thanks!. My concern if this issue specific for this filter only or used in common public method for other stuff.
On Tue, Jun 28, 2016 at 11:45 AM, Hugh Sanderson notifications@github.com wrote:
I wonder if it is using software floating point fpu/armv5. I've got some time off, and will have to look at it later.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/haxenme/nme/issues/331#issuecomment-229091010, or mute the thread https://github.com/notifications/unsubscribe/ABsIG8YGWPf2Ao7Vj_12rmoJEcEag2H4ks5qQUGKgaJpZM4I-P8I .
Recompiled sample project with <arch value="armv7" if="android"/> instead of default armv5. Result the same. Doesn't look like problem with FPU.
I can see correlation between circle radius and rendering speed. What strange if radius bigger 30 ..32 pixels, speed slows down dramatically. maybe some kind of power of 2 related issue. Out of ideas now ...
@codeservice is this still an issue?
Yes, filter very slow only on Android, I dont know why. Maybe its hardware specific issue. If someone can try Filters sample on different device and confirm this issue.
Reproduced on my Nexus 5, does seem quite a bit slower. Also did an addChild(new FPS()) to see an fps counter. I'll try to dig in more.
On nexus 7 when this filter works, CPU goes up to 100% and app become unresponsive.
I think it's a problem in nme/project/src/common/Display.cpp
or lower. Still narrowing down...
Like if you replace const FilterList &filters = obj->getFilters();
with an empty FilterList there doesn't seem to be problems.
I am not sure how code can process filters if this line return empty list.
Maybe this empty list force Power of 2 operation and it works well in this case: if (inState.mRoundSizeToPOW2 && filters.size()==0) { w = UpToPower2(w); h = UpToPower2(h); }
@codeservice Oh no, to clarify I just meant there's some extra heap allocations that are made in haxe side cloning the filters passed in instead of just using them directly. Also instead of updating the filters used when moving the mouse brand new filters are set every single frame. I thought perhaps problems were there, but it does seem to be in processing the filters themselves, not in those places I hoped it would be. Next step would be to see which part of processing the filters is especially slow.
I saw many “for” inside “for” cycles processing this filter. This logic slow by defenition, as bigger object size as longer cycle time. Question only, why this different on android. Maybe it’s result of power of 2 which makes object size even bigger only on this platform and not on others. You can try make circles radius twice smaller and it will be reasonably fast on Android too. Wandering how this works on OpenFL, is it using same logic as NME and has the same issue?
Hey @codeservice ! I made an exciting discovery, would you join me in gitter? https://gitter.im/haxenme/nme
I am in glitter
This fixes the issue... https://github.com/thomasuster/NME/commit/b934cc4b1a8df32758626bd970d5c0d2e931fed3
Although that should not be the fix. I will need C++ expertise help on how we can get compiler to do that for us. Or some other optimization.
Maybe it should be:
If(inFilterSize<=255) *dest = sa/inFilterSize;
Your switch/case code simply limit to 255 filter size.
On Oct 4, 2018, at 23:46, Thomas Uster notifications@github.com wrote:
This fixes the issue... thomasuster@b934cc4
Although that should not be the fix. I will need C++ expertise help on how we can get compiler to do that for us. Or some other optimization.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
if(inFilterSize < 1 || inFilterSize > 255)
inFilterSize = 1;
*dest = sa / inFilterSize;
``
So the reason why I limit to 255 in case statement because if you look at the params passed in it's either blurX or blurY, which is bounded.
mBlurX = std::max(0, std::min(256, inBlurX-1) );
mBlurY = std::max(0, std::min(256, inBlurY-1) );
@hughsando So it cannot be divide by zero problem. Could it be some compiler optimization flag we're missing from the android toolchain? @madrazo any ideas?
So maybe there are a few things going on here.
Generally, it is be to avoid filters for objects that change every frame. There are a few techniques to help here.
I can only confirm it’s ARMV7. Flag set and all libs v7. Didn’t try arm64 yet with this filter. Workaround is not a solution for me, because I am providing access to NME for other developers as part of API. If something doesn’t work well simply not included.
@codeservice @hughsando Again, my diff above 100% fixes it (it is no a divide by zero bug). I spent more time trying different & less ugly implementations but was unsuccessful. If either of you have other ideas please schedule a conversation with me via twitter so we can debug together, otherwise I'm blocked.
Yeah, that is really strange. I had 1 more idea - maybe the divisor is a power of 2, and your switch allows it to get transformed into a bit-shift instead of a divide call, or "/1" can be ignored (is it 1?). Not sure how to test this, apart from removing the non power-of-2 values from the switch, can falling back to "/" for these. If the pow2 values (1, 2, 4, 8, ...) are fast, then I think I can have 1 code path for pow2 filter size, and use fixed-point integer maths on the other path. Actually, maybe I should do this anyhow.
Ok, I have made a change based on the idea that integer division is expensive on some android devices. I have not checked the performance on android. I may also need to think about rounding, but I'm not sure.
Good job @hughsando , works like a charm. Is the shift left 20 just so it's high enough to handle large inFilterSize? How did you decide the shift amount?
I could have used 23 I think - it should be as big as possible without overflowing the sum. Basically, the maximum sum is going to be less than 256 filterElements scale. By setting scale to be 2^20 / filterElements, I know the sum will be less than 2^28 - plenty of room in 32 bit ints. But there still might be a bug in the rounding. If the filterElements=3, then (2^20/filterElements)*filterElements will be less than 2^20 so the result will round down when perhaps it should not have. Usually I make alpha of 254 the same as 255, so this should be ok. However if the filter is applied multiple times, it is possible this round down will accumulate. I should perhaps add scale/2 before shifting, but would need some kind of performance/quality information before doing this.
That makes sense, thanks for the information.
I checked Filters sample on android. For some reason GlowFilter extremely slowing down system. DropShadowFilter works ok. On Win, Mac, iOS I dont see this problem.