Note: As of 8/24 there are no immediate plans to drop the non-SIMD version; this issue is for planning purposes.
The Tesseract.js-core package currently includes 4 different versions of the Tesseract.js WebAssembly build: Legacy+LSTM vs. LSTM-only and SIMD-support vs. non-SIMD support. This causes build times to be long, and has bloated the Tesseract.js-core npm package to ~31MB. While the separate Legacy+LSTM and LSTM-only builds will always need to exist, we will eventually be able to drop support for the non-SIMD version, which will reduce the total number of builds from 4 to 2.
The latest versions of every major browser on every major platform now support WebAssembly SIMD. Therefore, this is simply a question of waiting for user adoption of the latest browsers/devices to become sufficiently high. There are 2 sources of data that can inform this decision: (1) general stats on browser adoption, and (2) our own stats on Tesseract.js usage from the JSDelivr CDN.
Regarding the former, according to caniuse.com, as of 8/24, 92.23% of users have browsers that support WebAssembly SIMD. However, this is misleadingly low (as it relates to WebAssembly SIMD specifically), as it includes ancient browsers such as Internet Explorer that would not be supported by any version of Tesseract.js. When we use the 96.78% of browsers that support WebAssembly as the denominator, the percentage that support SIMD is 95.3%. The largest group of browsers that supports WebAssembly but not WebAssembly SIMD is Safari for iOS, which accounts for ~2% of total users.
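The change of denominator above is a single division. A minimal Python sketch, using only the two caniuse.com figures quoted in this issue (the variable names are illustrative, not from any Tesseract.js tooling):

```python
# Figures quoted above from caniuse.com (as of 8/24), as percentages of all users.
simd_support = 92.23   # browsers that support WebAssembly SIMD
wasm_support = 96.78   # browsers that support WebAssembly at all

# Share of WebAssembly-capable browsers that also support SIMD.
simd_among_wasm = simd_support / wasm_support * 100

# Share of all users whose browser supports WebAssembly but not SIMD.
wasm_without_simd = wasm_support - simd_support

print(f"{simd_among_wasm:.1f}% of WebAssembly-capable browsers support SIMD")
print(f"{wasm_without_simd:.2f}% of all users have WebAssembly without SIMD")
```

The second number is the group a SIMD-only package would stop serving; the ~2% for Safari on iOS mentioned above is the largest single part of it.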
Regarding the second data source, according to JSDelivr, the CDN used by default, in Q2 2024 the two most commonly used versions of Tesseract.js-core were 5.0.0 and 4.0.4. These two versions had a combined 11,256,421 hits for the SIMD builds and 86,202 hits for the non-SIMD builds, which means the SIMD builds accounted for 99.2% of the total.
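For anyone revisiting these stats later, the share can be recomputed from the two hit counts. A small Python sketch, assuming the counts are copied by hand from the JSDelivr stats page (the numbers below are the Q2 2024 figures quoted above; the variable names are illustrative):

```python
# Combined Q2 2024 JSDelivr hits for Tesseract.js-core 5.0.0 and 4.0.4, as quoted above.
simd_hits = 11_256_421
non_simd_hits = 86_202

total_hits = simd_hits + non_simd_hits

# Percentage of all hits served by each kind of build.
simd_share = simd_hits / total_hits * 100
non_simd_share = non_simd_hits / total_hits * 100

print(f"SIMD builds: {simd_share:.1f}% of {total_hits:,} hits")
print(f"non-SIMD builds: {non_simd_share:.2f}%")
```

Swapping in the counts for a later quarter gives a like-for-like comparison against the figure above.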
I have no plans to drop the non-SIMD versions for now. However, we can revisit these stats down the line, and hopefully the share of devices that support WebAssembly but not WebAssembly SIMD will drop close to 0%.