Closed by SheetJSDev 6 months ago
Very interesting, thank you for doing this benchmark. There is something strange happening.
Thanks for digging into this! JSC must not be handling some string concatenation well, which gives a good starting point for further investigation.
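As a starting point for that investigation, here is a minimal sketch (hypothetical code, not taken from the library) contrasting two common ways to build a large string; V8 and JSC optimize repeated `+=` concatenation quite differently:

```javascript
// Hypothetical micro-benchmark shapes: repeated `+=` builds a rope or
// reallocates depending on the engine, while push/join batches the copy.
function buildWithConcat(n) {
  let s = "";
  for (let i = 0; i < n; i++) s += "chunk";
  return s;
}

function buildWithJoin(n) {
  const parts = [];
  for (let i = 0; i < n; i++) parts.push("chunk");
  return parts.join("");
}

console.log(buildWithConcat(1000) === buildWithJoin(1000)); // true
```

Timing both shapes under `bun` and `node` would help narrow down whether the slowdown is in string concatenation itself.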
How did you generate the profile? Is there a bun option like node `--prof` / `--prof-process`?
> How did you generate the profile? Is there a bun option like node `--prof` / `--prof-process`?
I wish. There is some way to use JSC's sampling profiler, but I haven't quite figured it out yet. This is using a build of bun codesigned with the debugging entitlement; then I ran it in Instruments on macOS. On Linux, you could use `sudo perf trace`, but the debug symbols are stripped.
In the days after the issue was raised, 0.1.3 and 0.1.4 cleaned up some of the `Buffer` and `Uint8Array` issues. XLS read performance is still bottlenecked by #455. The library falls back to a pure-JS encoder since `Buffer#toString("utf16le")` is not correct, but it's definitely fixable.
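For reference, a pure-JS UTF-16LE fallback can be quite small; the sketch below is illustrative (the function name and chunk size are assumptions, not the library's actual code):

```javascript
// Decode UTF-16LE bytes into a JS string. JS strings are UTF-16 code
// units internally, so surrogate pairs pass through without special handling.
function utf16leDecode(bytes) {
  const units = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    units.push(bytes[i] | (bytes[i + 1] << 8)); // little-endian code unit
  }
  // Apply in chunks to avoid engine limits on argument counts.
  let out = "";
  for (let i = 0; i < units.length; i += 0x8000) {
    out += String.fromCharCode.apply(null, units.slice(i, i + 0x8000));
  }
  return out;
}

// "Hi" as UTF-16LE bytes:
console.log(utf16leDecode(new Uint8Array([0x48, 0x00, 0x69, 0x00]))); // Hi
```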
In testing, we found that `Uint8Array#slice` has copy-on-write semantics in V8 but not in JavaScriptCore. In context, when reading a file, the slices are read but never mutated, so switching to `Uint8Array#subarray` is safe. `subarray` was a marginal improvement in Node / Deno but was significant in Bun.
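The semantic difference between the two methods is easy to demonstrate in isolation (a minimal sketch, not library code):

```javascript
// slice() returns an independent copy; subarray() returns a view over
// the same underlying ArrayBuffer, so it is cheap but aliases the source.
const bytes = new Uint8Array([10, 20, 30, 40, 50]);

const copy = bytes.slice(1, 4);    // [20, 30, 40], separate storage
const view = bytes.subarray(1, 4); // [20, 30, 40], shared storage

bytes[1] = 99;

console.log(copy[0]); // 20 (copy is unaffected by the write)
console.log(view[0]); // 99 (view observes the write)
console.log(view.buffer === bytes.buffer); // true
```

Because the reader only ever reads through these slices, the aliasing introduced by `subarray` is unobservable, which is what makes the substitution safe.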
To best understand Bun's potential, we first looked at the `.numbers` file format since it exclusively uses UTF-8 encoding for strings. Current performance compares bun to node/deno as well as a pure Python script (`cat-numbers` from python `numbers-parser`) and a native C++ program (`numbers2csv` from `libetonyek`):
- `large_strings.numbers` reflects JSC vs V8 performance
- `test.numbers` reflects Bun's startup and transpilation performance

Deno's relatively poor performance stems from SWC:

> Deno additionally parses all sources with swc even if they aren't typescript
Without any special optimizations (`bun read.js` from above), Bun's performance is exciting. `bun readbun.js` uses `bun bun` to bundle, and `env BUN_DISABLE_TRANSPILER=1` skips the transpilation step. Both are electrifying for our use case!
SheetJS is typically used in short-lived data munging tasks like GitHub's "Flat Data" project (https://githubnext.com/projects/flat-data/). Even though it's early days, we're really excited by the results!
I think this is fixed. ARM mac:

```
./doit.sh
v21.5.0
deno 1.41.1 (release, aarch64-apple-darwin)
v8 12.1.285.27
typescript 5.3.3
1.0.30
large_strings.numbers
Benchmark 1: bun read.js large_strings.numbers
  Time (mean ± σ):     776.0 ms ±  19.5 ms    [User: 799.0 ms, System: 127.0 ms]
  Range (min … max):   761.4 ms … 829.7 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: node read.mjs large_strings.numbers
  Time (mean ± σ):     942.2 ms ±   8.7 ms    [User: 994.0 ms, System: 156.2 ms]
  Range (min … max):   933.4 ms … 964.5 ms    10 runs

Benchmark 3: deno run --allow-read read.ts large_strings.numbers
  Time (mean ± σ):     978.7 ms ±   9.8 ms    [User: 1231.9 ms, System: 284.5 ms]
  Range (min … max):   967.5 ms … 992.4 ms    10 runs

Summary
  bun read.js large_strings.numbers ran
    1.21 ± 0.03 times faster than node read.mjs large_strings.numbers
    1.26 ± 0.03 times faster than deno run --allow-read read.ts large_strings.numbers
```
Linux x64:
```
./doit.sh
v16.13.2
deno 1.41.1 (release, x86_64-unknown-linux-gnu)
v8 12.1.285.27
typescript 5.3.3
1.0.30
large_strings.numbers
Benchmark 1: bun read.js large_strings.numbers
  Time (mean ± σ):      1.448 s ±  0.051 s    [User: 1.319 s, System: 0.246 s]
  Range (min … max):    1.400 s …  1.543 s    10 runs

Benchmark 2: node read.mjs large_strings.numbers
  Time (mean ± σ):      1.761 s ±  0.062 s    [User: 1.640 s, System: 0.220 s]
  Range (min … max):    1.667 s …  1.838 s    10 runs

Benchmark 3: deno run --allow-read read.ts large_strings.numbers
  Time (mean ± σ):      2.110 s ±  0.060 s    [User: 2.219 s, System: 0.210 s]
  Range (min … max):    2.034 s …  2.233 s    10 runs

Summary
  'bun read.js large_strings.numbers' ran
    1.22 ± 0.06 times faster than 'node read.mjs large_strings.numbers'
    1.46 ± 0.07 times faster than 'deno run --allow-read read.ts large_strings.numbers'
```
The benchmark shown today is different from the initial one (a .numbers file vs. an .xls file). I still think it's resolved; here is my run of the modified script on .xls:
```
nodeno(master) $ bash doit.sh
v21.5.0
deno 1.41.1 (release, aarch64-apple-darwin)
v8 12.1.285.27
typescript 5.3.3
1.0.30
large_strings.xls
Benchmark 1: bun read.js large_strings.xls
  Time (mean ± σ):     769.4 ms ±  12.3 ms    [User: 887.9 ms, System: 121.3 ms]
  Range (min … max):   756.9 ms … 798.3 ms    10 runs

Benchmark 2: node read.mjs large_strings.xls
  Time (mean ± σ):      1.249 s ±  0.009 s    [User: 1.351 s, System: 0.102 s]
  Range (min … max):    1.237 s …  1.265 s    10 runs

Benchmark 3: deno run --allow-read read.ts large_strings.xls
  Time (mean ± σ):      2.354 s ±  0.020 s    [User: 2.339 s, System: 0.376 s]
  Range (min … max):    2.314 s …  2.389 s    10 runs

Summary
  bun read.js large_strings.xls ran
    1.62 ± 0.03 times faster than node read.mjs large_strings.xls
    3.06 ± 0.06 times faster than deno run --allow-read read.ts large_strings.xls
```
Here are the results for Linux x64, which are similar:
```
./doit.sh
v16.13.2
deno 1.41.1 (release, x86_64-unknown-linux-gnu)
v8 12.1.285.27
typescript 5.3.3
1.0.30
large_strings.xls
Benchmark 1: bun read.js large_strings.xls
  Time (mean ± σ):      1.613 s ±  0.169 s    [User: 1.502 s, System: 0.220 s]
  Range (min … max):    1.429 s …  1.842 s    10 runs

Benchmark 2: node read.mjs large_strings.xls
  Time (mean ± σ):      2.606 s ±  0.114 s    [User: 2.414 s, System: 0.170 s]
  Range (min … max):    2.481 s …  2.852 s    10 runs

Benchmark 3: deno run --allow-read read.ts large_strings.xls
  Time (mean ± σ):      4.957 s ±  0.366 s    [User: 4.291 s, System: 0.663 s]
  Range (min … max):    4.680 s …  5.882 s    10 runs

Summary
  'bun read.js large_strings.xls' ran
    1.62 ± 0.18 times faster than 'node read.mjs large_strings.xls'
    3.07 ± 0.39 times faster than 'deno run --allow-read read.ts large_strings.xls'
```
First off, congratulations on the public release!
Testing https://github.com/sheetjs/sheetjs against Bun reveals some strange performance issues compared to NodeJS and Deno (which can be partially explained by V8 vs JavaScriptCore) as well as Safari and Chrome.
Repro: https://github.com/sheetjs/nodeno (versions are printed at the beginning of the script).
Testing on a 16" 2019 MBP, the first test (parsing a 56MB XLS file) shows a significant performance gap.
Browser performance can be tested by dragging and dropping a file into https://oss.sheetjs.com/. This is a slanted test because the browser actually does more work (FileReader callback -> ArrayBuffer -> workbook object -> CSV generation -> populate canvas). Even with the extra work, it is roughly 6 seconds in Chrome and roughly 9 seconds in Safari, both still noticeably faster than Bun.