robertknight / tesseract-wasm

JS/WebAssembly build of the Tesseract OCR engine for use in browsers and Node
https://robertknight.github.io/tesseract-wasm/
BSD 2-Clause "Simplified" License

Find binary size optimizations (that don't hurt performance) #7

Open robertknight opened 2 years ago

robertknight commented 2 years ago

The WASM binary is currently ~1.6MB in size, ~550KB after Brotli compression. It would be nice to find ways to reduce this, as long as they don't significantly hurt performance.

I tried compiling Leptonica and Tesseract with -DCMAKE_BUILD_TYPE=MinSizeRel. This significantly reduced the binary size (1.6MB => 1.0MB) but also hurt performance: OCR-ing the test/test-page.jpg image went from ~3.1s to 5.0s. There might be other compile flag combinations that reduce binary size without hurting performance as much.

robertknight commented 2 years ago

Another option that occurs to me is to use -DCMAKE_BUILD_TYPE=MinSizeRel but use different flags for the small number of source files that are performance critical.
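
A rough sketch of what per-file flags could look like in CMake, assuming the library is configured with MinSizeRel and that profiling has identified the hot files. The file names below are hypothetical placeholders, not actual Tesseract or Leptonica sources, and nothing here has been measured.

```cmake
# Sketch only: assumes the build is configured with
#   -DCMAKE_BUILD_TYPE=MinSizeRel
# so that most files are compiled for size.

# Hypothetical list of performance-critical files; replace with whatever
# profiling actually identifies.
set(HOT_SOURCES
  src/hot_path_a.cpp
  src/hot_path_b.cpp
)

# Append -O2 for just these files. With Clang/GCC the last -O flag on the
# command line takes effect, and per-source options are expected to come
# after the build-type flags. Worth confirming in the generated compile
# commands (e.g. with a verbose build).
set_source_files_properties(${HOT_SOURCES}
  PROPERTIES COMPILE_OPTIONS "-O2"
)
```

The COMPILE_OPTIONS source file property needs CMake 3.11 or newer, and by default set_source_files_properties only affects targets defined in the same directory's CMakeLists.txt, so the snippet would have to live next to the library's own target definition.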

wydengyre commented 2 years ago

How does performance with the current EMCC -Os flag compare to -O2? Emscripten docs indicate there can be a real size/performance tradeoff between those two.

robertknight commented 2 years ago

The -Os flag is used when compiling the wrapper code and linking the binaries, but Tesseract and Leptonica, which do the heavy lifting, are compiled with -O2 (this is not specified explicitly in the Makefile, but cmake creates a Release build by default, which uses -O2). From what I recall, altering the optimization level in EMCC_FLAGS didn't make that much of a difference, since it only applies to a small proportion of the code, but I don't have actual numbers to hand.
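
One way to check which optimization level each build type actually contributes, rather than relying on memory, is to print the relevant CMake variables at configure time. This is a minimal sketch that assumes the lines are temporarily added to the library's CMakeLists.txt after its project() call. The values printed depend on the toolchain file in use.

```cmake
# Print the selected build type and the flags each configuration would add,
# so the effective -O level can be checked rather than assumed.
message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
message(STATUS "Release flags: ${CMAKE_CXX_FLAGS_RELEASE}")
message(STATUS "MinSizeRel flags: ${CMAKE_CXX_FLAGS_MINSIZEREL}")
```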