Open p0358 opened 1 year ago
This is a case where it is 100% on Bun to address -- not the fault of JavaScriptCore or the library.
Fair, I respect this approach. I just wanted to point out that it also happens in a full WebKit-based browser, in case it matters (I feel like some changes might need to be made in either JSC or the library; I'm somewhat curious what could cause it to slow down this much).
That being said...looks like the Regex implementation in JSC is the cause.
Hi! I don't suppose it's possible to use https://github.com/google/re2 as a replacement for JSC's regex engine until the performance problems can be addressed?
I'm not sure if it is the same underlying issue, but I've run into similar problems when combining capturing groups with quantifiers in RegExps. See the following small bench script:
import { run, bench } from 'mitata';

const TEST_STRING =
  'Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

const REGEXES = [
  /A{1,3}B{1,3}/g,
  /A{1,3}(B){1,3}/g,
  /(A)(B){1,3}/g,
  /(A){1,3}(B){1,3}/g
];

for (const regex of REGEXES) {
  bench(String(regex), () => {
    TEST_STRING.match(regex);
  });
}

await run();
The last regexp is roughly 25x slower on Bun than on Node.js (6.81 µs vs. 276.53 ns per iteration):
cpu: AMD Ryzen 5 2600X Six-Core Processor
runtime: node v20.6.1 (x64-linux)
benchmark                time (avg)              (min … max)        p75        p99       p995
------------------------------------------------------------  -------------------------------
/A{1,3}B{1,3}/g      184.57 ns/iter  (158.55 ns … 262.88 ns)  193.61 ns  257.91 ns  258.08 ns
/A{1,3}(B){1,3}/g    309.29 ns/iter  (244.18 ns … 463.61 ns)  349.64 ns  459.74 ns  463.61 ns
/(A)(B){1,3}/g       273.13 ns/iter  (244.92 ns … 465.79 ns)   279.6 ns  409.03 ns  465.79 ns
/(A){1,3}(B){1,3}/g  276.53 ns/iter   (254.7 ns … 488.82 ns)  270.53 ns  487.69 ns  488.82 ns
runtime: bun 1.0.7 (x64-linux)
benchmark                time (avg)              (min … max)        p75        p99       p995
------------------------------------------------------------  -------------------------------
/A{1,3}B{1,3}/g      147.16 ns/iter  (130.07 ns … 269.99 ns)  143.71 ns  266.04 ns  268.78 ns
/A{1,3}(B){1,3}/g    312.42 ns/iter  (282.64 ns … 515.22 ns)  309.78 ns  493.53 ns  515.22 ns
/(A)(B){1,3}/g       201.18 ns/iter  (177.85 ns … 343.12 ns)  199.57 ns  329.35 ns  332.58 ns
/(A){1,3}(B){1,3}/g    6.81 µs/iter      (6.53 µs … 7.69 µs)    6.94 µs    7.69 µs    7.69 µs
Do you want me to create a new issue or is this the same problem?
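For anyone hitting this in their own code, here is a minimal sketch of one possible mitigation, assuming the capture groups are not actually needed. With the `g` flag, `String.prototype.match` returns only the full matches, so the capturing pattern from the benchmark can be swapped for a group-free one without changing the result. The helper `sameGlobalMatches` and the `sample` string are made up for illustration. Whether the `(?:...)` variant avoids the slow path in JSC is untested here; only the group-free `/A{1,3}B{1,3}/g` is shown as fast in the numbers above.

```javascript
// Hypothetical helper (not from the thread): true when two global regexes
// return the same list of full matches for the given input.
function sameGlobalMatches(a, b, input) {
  const left = input.match(a) ?? [];
  const right = input.match(b) ?? [];
  return left.length === right.length && left.every((m, i) => m === right[i]);
}

const capturing = /(A){1,3}(B){1,3}/g;        // slow case from the benchmark
const nonCapturing = /(?:A){1,3}(?:B){1,3}/g; // same language, no captures (speed untested)
const plain = /A{1,3}B{1,3}/g;                // fast case from the benchmark

// Illustrative input only; the benchmark used a Lorem ipsum string.
const sample = 'xxABxxAAABBBBxxAAAABxxBAxx';

console.log(sample.match(capturing));
console.log(sameGlobalMatches(capturing, nonCapturing, sample));
console.log(sameGlobalMatches(capturing, plain, sample));
```

This only applies when the code never reads the captured groups (for example via `matchAll` or `exec`); if it does, the rewrite changes behavior.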
@knowhatamine What are you talking about? This has nothing to do with this issue, where a poorly optimized RegExp code path in JavaScriptCore (affecting not only Bun but other JSC-based projects too) causes the slowdown. We tested and observed the exact same problem on native Linux and macOS, so it has nothing to do with WSL or file reading. Otherwise the bug report would have been about file reading alone, without bringing Markdown parsing into it!
You're right, wrong tab.
But interesting indeed. I just started using both Bun and WSL2, and I use a LOT of regex, so I will watch out for that. I haven't noticed anything so far, despite being extremely performance-obsessed.
There are (apparently) a few conditions that cause a RegExp not to be JIT'ed, as one of the WebKit devs explained here: https://bugs.webkit.org/show_bug.cgi?id=258706#c2 Sadly, there has been no activity there since September.
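To probe which patterns fall off the fast path on a given runtime, a dependency-free timing loop may be enough. The following is a rough sketch, not a rigorous benchmark: `nsPerMatch`, the iteration counts, and the `input` string are all assumptions made for illustration, and the `(?:...)` candidate is included only as something to try, not as a known fix.

```javascript
// Hypothetical helper: average time per input.match(regex) call, in nanoseconds.
function nsPerMatch(regex, input, iterations = 20000) {
  // Warm-up so the engine has a chance to compile the pattern before timing.
  for (let i = 0; i < 1000; i++) input.match(regex);

  const start = performance.now();
  for (let i = 0; i < iterations; i++) input.match(regex);
  const elapsedMs = performance.now() - start;

  return (elapsedMs * 1e6) / iterations;
}

// Illustrative input; any reasonably long string without matches works.
const input = 'Lorem ipsum dolor sit amet, '.repeat(16);

const candidates = [
  /A{1,3}B{1,3}/g,
  /(A){1,3}(B){1,3}/g,
  /(?:A){1,3}(?:B){1,3}/g,
];

const results = candidates.map((regex) => ({
  pattern: String(regex),
  ns: nsPerMatch(regex, input),
}));

for (const { pattern, ns } of results) {
  console.log(`${pattern}: ${ns.toFixed(1)} ns/iter`);
}
```

Running the same file with `node` and with `bun` and comparing the ratios between patterns is more informative than the absolute numbers, which depend on the machine.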
And yeah, I agree WSL is junk on its own; that thing refuses to even start up for me 80% of the time (sometimes both WSL2 and WSL1, and that's while regular Hyper-V VMs keep working, wonders of Windows).
What version of Bun is running?
0.6.11
What platform is your computer?
WSL | Microsoft Windows NT 10.0.19045.0 x64 | Linux 5.15.90.1-microsoft-standard-WSL2 x86_64 unknown
What steps can reproduce the bug?
I posted steps to reproduce the issue over there (includes screenshot from WebKit's profiler too): https://github.com/markedjs/marked/issues/2863
But TL;DR is that with this code:
and this markdown file: test.md
it takes 1.5s to run with Bun, and 0.07s to run with Node.
It's worth mentioning that this might be a JavaScriptCore vs. V8 issue: I was also able to reproduce it with Ultralight (an embedded WebKit for applications), and there the issue was even more pronounced.
What is the expected behavior?
The code should have roughly similar performance in JSC and V8, as other Markdown parsers do.
What do you see instead?
Unexpectedly poor performance: about 20x slower, taking seconds instead of milliseconds.
Additional information
I know this is generally an issue with Marked and JavaScriptCore rather than Bun itself. But I noticed that Bun's issue tracker follows a lot of issues with popular external libraries that break in Bun but work in Node, the goal being that the same code that runs on Node also runs on Bun. So I thought the team here, given their experience with JSC, might want to take a peek at profiling this to find out where the issue is...