Open ghost opened 9 years ago
Hi,
I'm not sure offhand why you'd experience such a performance regression. I haven't seen one before and there aren't really that many significant changes between those versions.
Zik Saleeba zik@zikzak.net
On Mon, Jun 22, 2015, at 10:15 PM, mtrif wrote:
I've made a superficial timing testing of the captioned versions using an ad-hoc long empty for loop.
It resulted that ver. 2.2 is approximately 3 times slower than ver. 2.1 !!!
Could you tell me what am I doing wrong ?
Step-by-step:
I've got ver. 2.1 from Google Code and ver. 2.2 form GitHub.
I made the necessary cosmetic changes to unistd in order to get compiled by MinGW32 toolkit.
Got both complied (no optimisations) with TDM-GCC 4.8.1 32bit release on Win 7.
Run a very simple script containing an empty for and looped 3 million times.
Thank you in advance.
— Reply to this email directly or view it on GitHub[1].
Links:
Hi Zik, I've done something more, this time the test script was an empty while loop, as follows:
int Count; int CountMax1 = 3000; int CountMax2 = 3000000; clock_t begin, end; double seconds1, seconds2; printf("\nwhile loop with print\n"); begin = clock(); Count = 1; while (Count <= CountMax1) { printf("%d\n", Count); Count++; } end = clock(); seconds1 = (double)(end - begin) / CLOCKS_PER_SEC; printf("\nwhile loop without print\n"); begin = clock(); Count = 1; while (Count <= CountMax2) { Count++; } end = clock(); seconds2 = (double)(end - begin) / CLOCKS_PER_SEC;
printf("\nwhile loop with printf done %d times (took %f secs)\n", CountMax1, seconds1); printf("\nwhile loop without printf done %d times (took %f secs)\n", CountMax2, seconds2);
void main() {}
The result was marginally better, but version 2.2 is still much slower.
I think I found something, with the help of gprof: in version 2.2 the script spent a lot of time inside VariableScopeBegin and VariableScopeEnd (approx. 50% the whole time). I couldn't attach the output from gprof, hence I had to paste an excerpt (I kept only down to 1%).
I'm sure you will find the proper medicine soon. For the time being I'll keep on playing with version 2.1.
VERSION 2.2. Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
25.67 2.02 2.02 3003003 0.00 0.00 VariableScopeBegin
24.90 3.98 1.96 _mcount_private
23.13 5.80 1.82 3003003 0.00 0.00 VariableScopeEnd
3.81 6.10 0.30 45086681 0.00 0.00 LexGetRawToken
2.60 6.30 0.20 6015030 0.00 0.00 ExpressionParse
1.78 6.45 0.14 45086681 0.00 0.00 LexGetToken
1.78 6.59 0.14 12021050 0.00 0.00 ExpressionStackCollapse
1.59 6.71 0.13 36060297 0.00 0.00 HeapAllocStack
1.33 6.82 0.11 36057734 0.00 0.00 VariableAlloc
1.21 6.91 0.10 9012034 0.00 0.00 ParseStatement
1.14 7.00 0.09 3003016 0.00 0.00 ExpressionInfixOperator
1.14 7.09 0.09 TableSearch
0.89 7.16 0.07 36067717 0.00 0.00 ParserCopy
VERSION 2.1 Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
54.66 1.79 1.79 _mcount_private
6.41 2.00 0.21 48090152 0.00 0.00 LexGetRawToken
6.11 2.20 0.20 6015033 0.00 0.00 ExpressionParse
4.73 2.36 0.16 TableSearch
3.21 2.46 0.11 48090152 0.00 0.00 LexGetToken
3.05 2.56 0.10 9012048 0.00 0.00 ParseStatement
2.44 2.64 0.08 36057773 0.00 0.00 VariableAlloc
2.14 2.71 0.07 12021053 0.00 0.00 ExpressionStackCollapse
1.83 2.77 0.06 3003016 0.00 0.00 ExpressionInfixOperator
1.68 2.83 0.06 36068029 0.00 0.00 ParserCopy
1.37 2.87 0.04 _fentry__
1.22 2.91 0.04 36060327 0.00 0.00 HeapAllocStack
1.22 2.95 0.04 15024063 0.00 0.00 ExpressionStackPushValueNode
1.22 2.99 0.04 9012022 0.00 0.00 ExpressionStackPushLValue
1.22 3.03 0.04 9012022 0.00 0.00 VariableAllocValueFromExistingData
1.22 3.07 0.04 3003004 0.00 0.00 ParserCopyPos
0.61 3.09 0.02 27042272 0.00 0.00 HeapPopStack
I've made a superficial timing testing of the captioned versions using an ad-hoc long empty for loop. It resulted that ver. 2.2 is approximately 3 times slower than ver. 2.1 !!! Could you tell me what am I doing wrong ? Step-by-step: I've got ver. 2.1 from Google Code and ver. 2.2 form GitHub. I made the necessary cosmetic changes to unistd in order to get compiled by MinGW32 toolkit. Got both complied (no optimisations) with TDM-GCC 4.8.1 32bit release on Win 7. Run a very simple script containing an empty for and looped 3 million times.
Thank you in advance.