Hello Cesar,
It would be nice to get some more comments about the suggested patch. As far as I can
see, the only code change is this one:
- for ( int i = 0; i < inputsCount; i++ )
+ for ( int i = 0; i < input.Length; i++ )
By making a small test and inspecting its IL code with ILDASM, we can see that
the second approach just adds two extra instructions (see attachment), which do not
look like they would make the code any faster. Could you please comment on your suggestion?
>> I would also suggest exposing a Layer's Neurons property as
>> public get / protected set,
This seems to be doable.
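For illustration, such a property could look roughly like this (a minimal sketch with a stub Neuron type; the actual AForge.NET signatures may differ):

public class Neuron
{
    // Stub type, just for illustration.
}

public class Layer
{
    // Backing field holding the layer's neurons.
    protected Neuron[] neurons;

    // Readable by anyone, but only the class and its subclasses may replace the array.
    public Neuron[] Neurons
    {
        get { return neurons; }
        protected set { neurons = value; }
    }
}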
Original comment by andrew.k...@gmail.com
on 3 Jan 2012 at 10:49
Attachments:
Hi Andrew,
Almost no optimizations are done at the IL level; the real optimizations are almost
always done at the JIT level. Consider, for example, the following snippets of
code (please ignore the Debugger.Break call; it is there only to enable
debugging of the optimized native x86 code):
static void loop1(int[] array)
{
    System.Diagnostics.Debugger.Break();

    for (int i = 0; i < array.Length; i++)
        array[i] += i;
}
Now consider the following code:
static void loop2(int[] array)
{
    System.Diagnostics.Debugger.Break();

    int n = array.Length;
    for (int i = 0; i < n; i++)
        array[i] += i;
}
Intuition would say the second snippet is faster, since we are caching the array
length in a local variable. However, an inspection of the generated x86 code reveals
why this is not the case:
Here is the code generated for the cached-length version:
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 mov esi,ecx
00000007 call 5FB2D5A4
0000000c mov edi,dword ptr [esi+4]
0000000f xor edx,edx
00000011 test edi,edi
00000013 jle 00000027
00000015 mov ecx,dword ptr [esi+4]
00000018 cmp edx,ecx
0000001a jae 0000002B
0000001c lea eax,[esi+edx*4+8]
00000020 add dword ptr [eax],edx
00000022 inc edx
00000023 cmp edx,edi
00000025 jl 00000018
00000027 pop esi
00000028 pop edi
00000029 pop ebp
0000002a ret
0000002b call 60182EF4
00000030 int 3
Note the block starting at 0000002b. It is devoted solely to throwing an
out-of-range exception in case the jae instruction at 0000001a detects that
the index is outside the allowed range.
Now, consider the code generated for the direct version, with no intermediate
caching:
00000000 push ebp
00000001 mov ebp,esp
00000003 push esi
00000004 mov esi,ecx
00000006 call 5FB2D5DC
0000000b xor edx,edx
0000000d mov ecx,dword ptr [esi+4]
00000010 test ecx,ecx
00000012 jle 0000001F
00000014 lea eax,[esi+edx*4+8]
00000018 add dword ptr [eax],edx
0000001a inc edx
0000001b cmp ecx,edx
0000001d jg 00000014
0000001f pop esi
00000020 pop ebp
00000021 ret
As you can see, the native code does not even include an exception-throwing
block, and therefore no jae instruction. It does not need to do any array
bounds checking, because the compiler can trust array.Length to be the actual
length of the array, so the loop index can never go out of range.
For more information, we could take a look at the CLR code generation blog post
on array bounds check elimination:
http://blogs.msdn.com/b/clrcodegeneration/archive/2009/08/13/array-bounds-check-elimination-in-the-clr.aspx
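For a rough sanity check outside the disassembler, the two loops above could also be timed with a small harness along these lines (a minimal sketch with the Debugger.Break calls removed; the array size and iteration count are arbitrary, and it should be run as an optimized Release build without a debugger attached for the JIT difference to show):

using System;
using System.Diagnostics;

static class LoopBenchmark
{
    static void loop1(int[] array)
    {
        for (int i = 0; i < array.Length; i++)
            array[i] += i;
    }

    static void loop2(int[] array)
    {
        int n = array.Length;
        for (int i = 0; i < n; i++)
            array[i] += i;
    }

    static void Main()
    {
        int[] data = new int[100000];
        const int iterations = 10000;

        // Warm up so both methods are JIT-compiled before timing.
        loop1(data);
        loop2(data);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            loop1(data);
        sw.Stop();
        Console.WriteLine("array.Length version: " + sw.Elapsed);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            loop2(data);
        sw.Stop();
        Console.WriteLine("cached length version: " + sw.Elapsed);
    }
}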
Well, I hope the suggestion makes sense now!
Regards,
Cesar
Original comment by cesarso...@gmail.com
on 3 Jan 2012 at 3:19
Hi Andrew,
I actually took the time to update the entire Neuro project to use direct array
computations. I loaded the two projects (the original/current and the
updated/optimized version) side by side and did some comparisons. Here are the
numbers for the particular test I did:
Optimized: 00:00:01.5514587
Optimized: 00:00:01.1486776
Optimized: 00:00:01.4274900
Optimized: 00:00:01.1395174
Optimized: 00:00:01.4854211
Current : 00:00:02.6056515
Current : 00:00:02.1727088
Current : 00:00:02.5613240
Current : 00:00:01.9834494
Current : 00:00:01.9183204
Of course the difference is not large, but we are talking about a very simple
test problem here; for very large problems it could make a noticeable
difference. And I suppose that optimizations which also improve code clarity
and readability wouldn't hurt after all! :)
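To illustrate what the direct array computations look like, a neuron's weighted sum can iterate against input.Length instead of a cached count field, roughly like this (a simplified sketch, not the actual AForge.NET Neuro code):

public class SimpleNeuron
{
    private readonly double[] weights;
    private readonly double threshold;

    public SimpleNeuron(double[] weights, double threshold)
    {
        this.weights = weights;
        this.threshold = threshold;
    }

    public double Compute(double[] input)
    {
        double sum = 0.0;

        // Comparing the index directly against input.Length lets the JIT
        // drop the bounds check on input[i], as discussed above.
        for (int i = 0; i < input.Length; i++)
            sum += weights[i] * input[i];

        return sum + threshold;
    }
}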
I just haven't attached a patch right now because I am having problems after a
TortoiseSVN update. If you wish, I can send the test project and a compressed
Neuro folder by mail. I will also try to generate the patch on another computer
and update this issue later, if you can wait.
Regards,
Cesar
Original comment by cesarso...@gmail.com
on 3 Jan 2012 at 7:30
Here it is.
Original comment by cesarso...@gmail.com
on 3 Jan 2012 at 8:17
Attachments:
OK, accepted.
Original comment by andrew.k...@gmail.com
on 4 Jan 2012 at 9:51
1) Exposed the Neuron.Weights, Layer.Neurons and Network.Layers properties instead of
using [] accessors.
2) Optimized access to arrays to eliminate bounds checking.
Committed in revision 1655. Will be released in 2.2.4.
P.S. The provided patch is a sample of a good and a bad fix at the same time. Good:
an improvement was introduced. Bad: the sample applications stopped building.
Original comment by andrew.k...@gmail.com
on 12 Jan 2012 at 10:35
Sorry about that, completely forgot about the sample applications :)
Original comment by cesarso...@gmail.com
on 12 Jan 2012 at 10:55
NP, fixed those anyway. Just a note for the future ;)
Original comment by andrew.k...@gmail.com
on 12 Jan 2012 at 11:00
Original comment by andrew.k...@gmail.com
on 23 Feb 2012 at 9:11
Original issue reported on code.google.com by
cesarso...@gmail.com
on 28 Dec 2011 at 5:39
Attachments: