sorlok / waitzar

Automatically exported from code.google.com/p/waitzar
Other
0 stars 0 forks source link

WaitZar loads slowly. #128

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
For some reason, WZ is loading slowly. Now, some slowdown is fine, but +3 
seconds from the previous release is a bit silly. We're at the point now where 
WZ is nearly release-ready, so performance tuning makes sense.

Some of the slowdown is definitely due to the config files. We may have to 
cache them like we do with keyboards. 

But I think some Windows API calls are just bogging the system down for no 
reason. When I added a "Settings" dialog, the load time increased by ~1sec, and 
this was without actually doing anything!

So, first I'm going to test all our custom menu code. My first goal is to see 
if the resource file is faster or slower than loading menus programmatically.

Original issue reported on code.google.com by seth.h...@gmail.com on 5 Jul 2010 at 4:16

GoogleCodeExporter commented 9 years ago
Attached: program which generates 1000*4000 menus programmatically, taking ~2.4 
seconds.

I'll modify this to generate resource.h and Menu.rc files automatically, so we 
can easily compile the alternate method and compare the two.

This file uses AppendMenu() & InsertMenuItem(); using InsertMenu caused no 
noticeable difference. 

Generating 4000*1000 menus or 2000*2000 menus were both around the same speed, 
~2.1 seconds. I used 1000*4000 because it's more representative to have a few 
main menus with a lot of submenu options.

Original comment by seth.h...@gmail.com on 5 Jul 2010 at 4:19

Attachments:

GoogleCodeExporter commented 9 years ago
Compiling 500 * 500 menus as a resource file takes forever, so that's our upper 
bound. The 100 * 100 test compiled in almost no time.

500 * 500:
  program:  225 ms
  resource:  30 ms

100 * 100:
  program:   30 ms
  resource:  21 ms

20 * 60:
  program:      20 ms
  resource:  18~26 ms

Attached are the files for the 20x60 build. The menu is loaded like so:
    HMENU defaultMenu = LoadMenu(hInstance, MAKEINTRESOURCE(MY_MENU));
    SetMenu(hWnd, defaultMenu);

Original comment by seth.h...@gmail.com on 6 Jul 2010 at 2:15

Attachments:

GoogleCodeExporter commented 9 years ago
One more:

250 * 250: /*Takes about 20s to build*/
  program:  ~100 ms
  resource:  ~23 ms

Both take up about the same amount of memory, maybe +20kb for the programmatic 
one. 

Note that the resource-created menus don't work properly after ~100 or so; I 
think the consistent ~25 ms is observed because they cut off after 100 menus.

Original comment by seth.h...@gmail.com on 6 Jul 2010 at 2:18

GoogleCodeExporter commented 9 years ago
Conclusion: Menus aren't causing any noticeable slowdown. 
Observation: Adding items programmatically (up to reasonable limits) has the 
same performance as resources, and almost always compiles faster.
Observation: There's some fluctuation; the first time the program is run it 
seems to take up to 10X longer.

Finally, the effect of resources on executable SIZE is enormous. For the 20 * 
60 resource bundle:
  Program EXE size:  17 KB  ( 9 KB after UPX)
  Resource EXE size: 54 MB  (12 KB after UPX)

So, it's 3x bigger (1.3 x for UPX). Note that MainFile.rc is roughly 59 KB of 
text. If you replace MENUITEM with M and remove redundant letters from the 
identifiers, you get 28 KB. 28+17 = 45 KB, so I'm not quite sure where VS is 
getting the extra ~10kb. My guess is that using Resource files to store Win32 
data structures is inherently inefficient.

Either way, this sub-issue can be closed. We'll at least know that inserting 
Menu Items programmatically is unlikely to be slow in and of itself.

Original comment by seth.h...@gmail.com on 6 Jul 2010 at 2:32

GoogleCodeExporter commented 9 years ago
I now want to investigate dialog creation. Let's create the Settings dialog 
(which has a lot of Combo boxes) both programmatically and using a combination 
RC file + WM_CREATE logic.

(Basically, I don't want to waste time optimizing if there's an "easy" 
solution, like a flaw in Win32 colors, or something. Also, we already learned 
something useful about RC files taking up lots of EXE space.)

Original comment by seth.h...@gmail.com on 6 Jul 2010 at 2:35

GoogleCodeExporter commented 9 years ago
This program takes 70~90ms to run. It produces 20 Label+DropDown pairs, each 
with a handful of items. It does so programmatically, although it _does_ pull a 
basic dialog box from a RC file.

Screenshot attached. The "resource" version of this will load/layout the labels 
using the resource file, and will only change the combo boxes when the dialog 
loads.

Original comment by seth.h...@gmail.com on 6 Jul 2010 at 10:22

Attachments:

GoogleCodeExporter commented 9 years ago
Added a RC version; this loads 80~100 ms, and sometimes slows down on startup.
Attached is code & screenshot; note that control STYLES are different when 
loading from resource bundles. Ugh... why...

In terms of EXE size:
   Program:  28 KB
   Resource: 30 KB
Be aware that the "Resource" MainFile contains the LTEXT strings, so they're 
actually both about 28 KB.

So "program" slightly outperforms "resource" in this case. And, "program" tends 
to vary less on startup, from what I can see. 

Since we have to play with styles anyway for background colors, etc., I'm 
inclined to load dialogs from code & leave only a "WZ_DLG" shell inside 
WaitZar.rc. The only problem is that bizarre styling differences bug....

To-do for this Issue:
   * Figure out how to have "nice" styles in program-generated items.
   * Clean up system-generated garbage from resource.h & WaitZarRes.rc, since we won't be using the VS editor anyway...
   * If this applies to menus, too, then replace WZ's menu resources with code-generated ones.
   * Replace WZ's dialogs with program-generated dialogs.
   * Finish display code (no saving/loading yet) for Settings dialog.
   * Profile startup process in WZ using inline code segments. Produce detailed log.
   * Resume optimization from here.

Original comment by seth.h...@gmail.com on 6 Jul 2010 at 11:06

Attachments:

GoogleCodeExporter commented 9 years ago
Fixed (1): It was actually just the font. DS_SETFONT apparently applies only 
within the context of the resource file itself, even though DIALOGEX saves the 
font resource (sigh). See attached.

Used:
  HFONT dlgFont = (HFONT)SendMessage(dlgHWND, WM_GETFONT, 0, 0);
...and then:
  SendMessage(lblHWND, WM_SETFONT, (WPARAM)dlgFont, MAKELPARAM(FALSE, 0));
  SendMessage(hwComboBoxHWND, WM_SETFONT, (WPARAM)dlgFont, MAKELPARAM(FALSE, 0));

Now, to try it on menus...

Original comment by seth.h...@gmail.com on 6 Jul 2010 at 11:20

Attachments:

GoogleCodeExporter commented 9 years ago
Finished 1, 2, 4. Looking into 3.

Original comment by seth.h...@gmail.com on 8 Jul 2010 at 4:09

GoogleCodeExporter commented 9 years ago
(3) is done. Remaining items (+1 I've added):

   5) Replace IDR_PG(DOWN|UP)_(COLOR|SEPIA) with a single item. Flip it and sepia-ize it after loading to dynamically create the other 3.
   6) Finish display code (no saving/loading yet) for Settings dialog.
   7) Profile startup process in WZ using inline code segments. Produce detailed log.
   8) Resume optimization from here.

Original comment by seth.h...@gmail.com on 8 Jul 2010 at 5:21

GoogleCodeExporter commented 9 years ago
Note to self:

Grayscale = (R+G+B)/3
... OR:
Grayscale = Luminance = (2126*R + 7152*G + 722*B)/10000

Sepia:
R = Grayscale * SepiaR/255
G = Grayscale * SepiaG/255
B = Grayscale * SepiaB/255 

Sepia  = [112, 66, 20]

Try with "luminance" first.

Original comment by seth.h...@gmail.com on 8 Jul 2010 at 5:33

GoogleCodeExporter commented 9 years ago
Java-ish version that seems to work well:

public Color filterRGB(int r, int g, int b){
    int depth = 30; //30 should be good, but 10~40 works too.

    int gray = (r + g + b) / 3;  //Can also try w/ luminance.
    r = g = b = gray;

    r = r + (depth * 2);
    g = g + depth;
    if (r > 255)
      r = 255;
    if (g > 255)
      g = 255;

    return new Color(r, g, b);
}

Original comment by seth.h...@gmail.com on 8 Jul 2010 at 5:41

GoogleCodeExporter commented 9 years ago
Used a combinatorial algorithm; added a sliding scale for black (to preserve 
lines).
 (5) is done. 

(6) is the last easy one, then we get to (7) and (8), were we actually identify 
problem areas.

Original comment by seth.h...@gmail.com on 8 Jul 2010 at 6:38

GoogleCodeExporter commented 9 years ago
Some hints on making a tab control:

//Make the top-level control
hwndTab = CreateWindow(WC_TABCONTROL,"",WS_CHILD|WS_VISIBLE,
          0,0,rect.right,rect.bottom,mathtablehwnd,NULL,
          (HINSTANCE)GetWindowLong(hwnd,GWL_HINSTANCE),NULL);

//Add a panel
TCITEM tcitem;
tcitem.pszText = "Tab 1"; //Set other members in TCITEM
int tabctrlPg = TabCtrl_InsertItem(hwndTab, 0, &tcitem);

//Add a static item TO that panel
hwSt= CreateWindow(WC_STATIC,"Static Control~",
      WS_CHILD|WS_VISIBLE,
      rect.left+3,rect.top+23,rect.right-7,rect.bottom-27,hwndTab,NULL,
      (HINSTANCE)GetWindowLong(hwndTab,GWL_HINSTANCE),NULL);

//Everything else should be easy. And if we save the value of "tabctrlPg" into 
//  a struct, we can save other things (like language ID) and get a FULL
//  language-aware tab dialog for WZ 1.8.

//I'm thinking of having "N/A" and "Auto" settings for the "defaultXXX" items.
//  N/A means "don't save this value --use the one in config.json.txt"
//  Auto means "restore the most-recently-used XXXX upon reloading WZ.
//For Auto, we'll have to constantly save the current InputMethod, etc. 
//  Perhaps a tiny file that only saves this information? We don't want
//  to re-save the local config file each time a language is changed. 
//  Something like this:
//    [version]
//    [state-string]
//  ...where "version" is an integer, like "1", in case we change the format,
//  and "state-string" is, e.g., "myanmar-waitzar-unicode-zawgyi", 
//  or "shan-test-unicode-unicode". Perhaps we can have multiple state 
//  strings --one per language-- and just load the relevant one. That makes 
sense...

Anyway, moving on...

Original comment by seth.h...@gmail.com on 9 Jul 2010 at 12:54

GoogleCodeExporter commented 9 years ago
Yay, tabbed panes (kind of) work~

Original comment by seth.h...@gmail.com on 9 Jul 2010 at 3:44

Attachments:

GoogleCodeExporter commented 9 years ago
Interesting article:
   http://arstechnica.com/civis/viewtopic.php?f=20&t=169886

One point in particular:
   Specifying a menu as "owner-drawn" disables themeing for the entire menu! 

So we can re-enable nice cursors for the "Typing" menu if we look into this.

Of course, we already have a nice cursor, so it's not really necessary at this 
point.

Original comment by seth.h...@gmail.com on 13 Jul 2010 at 1:47

GoogleCodeExporter commented 9 years ago
Opening and closing of the popup menu works, tracking of the "current" item 
we're providing help for also works.

Besides adding real help text (and re-enabling the combo boxes, which are 
mysteriously missing text and default entries) I'd like to add a "pointer" 
overlay that points to the current help icon, just so people don't get 
confused. Maybe a dark triangle? I want it to be seamless.

Original comment by seth.h...@gmail.com on 13 Jul 2010 at 3:28

Attachments:

GoogleCodeExporter commented 9 years ago
Here's the current version; all text has been entered, and all combo boxes load 
correctly.

There's no interaction, and the "arrow" idea is being troublesome (alpha 
blending). I think I'll just move on to performance profiling; we can tidy the 
graphics up in 1.9.

Original comment by seth.h...@gmail.com on 29 Jul 2010 at 9:28

Attachments:

GoogleCodeExporter commented 9 years ago
After a great deal of re-writing and beating around the bush, I've just logged 
a full trace of the program's startup:

Starting WaitZar {
  Trace
    1 ms - Command Line Parsed
    46 ms - Windows Initialized
    427 ms - Config files located
    Generating context menu
      0 ms - Myanmar font created
      1417 ms - Context menu created
    1419 ms - Context menu created
    0 ms - Icon resources loaded
    1 ms - Initial hotkey registered
    18 ms - Tray icon added
    0 ms - Keyboard input struct initialized (is this needed?)
    Starting default language
      438 ms - LangInOut is set
      0 ms - Cached entries saved
      0 ms - Help keyboard hack enabled.
      1 ms - Input reset
      1455 ms - Context menus rebuilt
      6 ms - Windows resized
      1 ms - Windows re-rasterized
      292 ms - Current LangInOut flashed to disk
    2193 ms - Default language set
    1 ms - Key highlight thread created
    1 ms - System tray icon set to "ready"
  4107 ms - WaitZar is now running
}

So, in analysis:
  * The ~500ms for ConfigLoad and LangInOutSet are expected, and can only be marginally improved. 
  * 1.5s for rebuilding the context menus is unexpected. I bet I could eliminate most of it if I can figure out what's causing the real slowdown.
  * If I can improve both 500ms's a little, and the 1.5s's a lot, then I can get startup time down to 1~1.5s

My new goals, in order:
  * See if we can simply remove the first call to build the context menu.
  * See how to reduce the 1.5s time for building the context menus.
  * Look into reducing the time to load config files, etc. Perhaps we can cache some config settings? 
  * Anything else needed to get startup time down to 1.0s

Yay, progress~

Original comment by seth.h...@gmail.com on 10 Aug 2010 at 4:15

GoogleCodeExporter commented 9 years ago
Goal 1 complete; we can remove the first call --however, there's still some 
"scaffolding" in place which can be cleaned up later.

Loading time down to 2969 ms, that's a savings of ~1.1s, which is roughly what 
we expected.

Next, on to item 2!

Original comment by seth.h...@gmail.com on 11 Aug 2010 at 5:10

GoogleCodeExporter commented 9 years ago
Removed scaffolding; the startup sequence is far less likely to contain 
crash-worthy bugs now.

The breakdown of menu creation is as follows:
      Creating the context menu
        0 ms - Created top-level menu
        0 ms - Appended top-level menu items
        1400 ms - Re-built each sub-menu
        0 ms - Added each cached sub-menu.
      1402 ms - Context menus rebuilt

This is fairly good news; it means that Win32 is unlikely to contain our 
slowdown.

Original comment by seth.h...@gmail.com on 17 Aug 2010 at 2:11

GoogleCodeExporter commented 9 years ago
Interesting side-note: locating and loading the various config files takes 
almost no time at all. Validating them takes up the majority of the time. 

So, caching config files into valid binaries should help.

...however, it seems that the "get all languages" code (and related for 
encoding, etc.) is far slower than it should be. Speeding that up might 
eliminate the slowdown in our validation code.

Original comment by seth.h...@gmail.com on 17 Aug 2010 at 2:23

GoogleCodeExporter commented 9 years ago
Added a fake iterative loop in the menu loading:
        310 ms - Re-built sub-menu: languages
        586 ms - Re-built sub-menu: input methods
        1398 ms - faux-iterated through all components: 
        514 ms - Re-built sub-menu: encodings

Important things to note: 
  1) Running "encodings" still takes .5s after the faux loop (so perhaps caching isn't working)
  2) The total time is DEFININTELY caused by the iterator.

Compared to a vector index lookup:
        313 ms - Re-built sub-menu: languages
        593 ms - Re-built sub-menu: input methods
        1171 ms - faux-iterated through all components: 
        570 ms - Re-built sub-menu: encodings

So, it's not iterators, but rather "getting the set" itself. This is further 
evinced by the fact that our vector index code doesn't use 
"getLangauges().begin()."

Final test: loop 5 times each, calling getLanguages(), etc:
        347 ms - Re-built sub-menu: languages
        607 ms - Re-built sub-menu: input methods
        1097 ms - faux-iterated through all components: A
        540 ms - Re-built sub-menu: encodings

Slightly faster, but there's definitely a problem with our functions. More 
research after jogging.

Original comment by seth.h...@gmail.com on 17 Aug 2010 at 2:32

GoogleCodeExporter commented 9 years ago
Found the bug: we'd forgotten to set "subFilesLoaded" (actually, we set 
"mainFilesLoaded" twice). 

NOTICEABLY FASTER:

Starting WaitZar {
  Trace
    0 ms - Command Line Parsed
    42 ms - Windows Initialized
    Detecting & loading config files
      2 ms - Config files loaded
      426 ms - Config files validated
    429 ms - Config files located, loaded, and validated.
    Generating context menu font
      0 ms - Myanmar font created
    0 ms - Context menu font created
    1 ms - Icon resources loaded
    0 ms - Initial hotkey registered
    11 ms - Tray icon added
    1 ms - Keyboard input struct initialized (is this needed?)
    Starting default language
      0 ms - LangInOut is set
      1 ms - Cached entries saved
      0 ms - Help keyboard hack enabled.
      0 ms - Input reset
      Creating the context menu
        53 ms - Created top-level menu
        0 ms - Appended top-level menu items
        0 ms - Re-built sub-menu: languages
        0 ms - Re-built sub-menu: input methods
        0 ms - Re-built sub-menu: encodings
        1 ms - Copied built sub-menus to pointer-controlled memory
        0 ms - Added each cached sub-menu.
      54 ms - Context menus rebuilt
      5 ms - Windows resized
      0 ms - Windows re-rasterized
      1 ms - Current LangInOut flashed to disk
    61 ms - Default language set
    0 ms - Key highlight thread created
    2 ms - System tray icon set to "ready"
  547 ms - WaitZar is now running
}

The majority of time is now taken up by validating the config files. A 
breakdown:
      Reading & validating config files
        228 ms - Read physical files, parsed JSON
        9 ms - Initialized static classes with relevant information.
        177 ms - Resolved partial settings.
        1 ms - Generated list of input/output/display/encodings.
        0 ms - Set "active" input/output/display/encodings.
      415 ms - Config files validated

So, most of the time is taken up in:
  1) Reading and parsing the JSON.
  2) Resolving "partial" settings (those unattached or out-of-order at parse time).

The first could be easily solved with a binary JSON form --perhaps one exists, 
or we could write our own. The second could be part of this too, but it would 
be painful each time we updated our format.

Regardless, load time is now back to the way it was in 1.7 --basically 
non-noticeable. So I'm closing this issue unless we get complaints from our 
beta testers.

Original comment by seth.h...@gmail.com on 17 Aug 2010 at 3:48