oven-sh / bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
https://bun.sh
Other
73.6k stars 2.72k forks source link

setImmediate / setTimeout with delay 0 seems blocking the main thread #3333

Closed andrasat closed 11 months ago

andrasat commented 1 year ago

What version of Bun is running?

0.6.9

What platform is your computer?

Linux 6.1.31-2-MANJARO x86_64 unknown

What steps can reproduce the bug?

So ive been playing with setImmediate and setTimeout in this repo, and try make a simple server with endpoints:

/blocking and /not-blocking has the same cpu intensive task, with difference that /blocking is sync, and /not-blocking is async.

Steps:

bun start:node // node npm run start:node // node

- hit `/ping` every second with this command
```bash
while true; do date; curl 127.0.0.1:3030/ping; sleep 1; echo; done;

In node, they are working as expected, when /not-blocking is still running, i can still receive response from /ping endpoint. But in bun, /not-blocking is blocking and /ping response is returned after the task is finished.

What is the expected behavior?

What do you see instead?

Additional information

No response

simylein commented 1 year ago

For me this actually segfaults at the moment with

curl: (52) Empty reply from server

and

BUN:GET:/ping:2023-07-15T18:21:44.126Z
BUN:GET:/not-blocking:2023-07-15T18:21:44.298Z
Segmentation fault: 11

I am running bun 0.6.14 on Darwin 22.5.0 arm64 arm

andrasat commented 1 year ago

UPDATED: @simylein I got segfaulted to in my m1 pro lol, Darwin 22.5.0 arm64 arm, and bun version 0.6.14, even in /blocking endpoint, it seems there is some problem with crypto module?

-- Screenshot --

Screenshot 2023-07-16 at 18 17 05

Hmm that's interesting @simylein , I haven't had any issues with segfaults tho, with 0.6.14 and running bun-debug in 0.6.15-debug (in my WSL 2 desktop)

Anyway I tried again today with the latest commit in main (0.6.15-debug), it might be related to issue #3226, but when I tried using other than 0 delay time in setTimeout, /not-blocking actually not blocking other endpoints, although it finished slower, i adjusted the iteration to be lower so it can finished quickly.

In bun-debug, there are some differences in logs when running setTimeout 1 and setTImeout 0 or setImmediate

-- Screenshot -- setTimeout0

-- Screenshot -- setTimeout1

fubhy commented 1 year ago

effect-ts is also affected by this (https://github.com/Effect-TS/io/pull/662)

ThePrimeagen commented 1 year ago

Here is a simple code example to see it happen


function test() {
    console.log('11111');
    setTimeout(test, 0);
}

function test2() {
    console.log('22222');
    setTimeout(test2, 1);
}

test();
test2();

run bun test.js or node test.js to see the difference

orangeswim commented 1 year ago

Troubleshooting the issue.

I modified the test code to

function repeat(log, delay, count) {
  if (count <= 0) return;
  console.log(log);
  setTimeout(() => repeat(log, delay, count - 1), delay);
}

const timesToRun = 5;
repeat("1", 0, timesToRun);
repeat("2", 1, timesToRun);

I added debug statements in fifo.zig and the events are getting queued in an unexpected order, instead of the expected alternating messages. There seems to be some race condition since adding the extra commands above is changing the order of the queue. In the original test code by @ThePrimeagen , one function always beat the other.

Output

/home/orange/bun/bun/packages/debug-bun-linux-x64/bun-debug test.js
[SYS] read(3, 4096) = 4096 (0.052ms)
[SYS] close(3)
[fs] openat(10, /home/orange/bun/bun/package.json) = 11
[SYS] close(11)
[fs] openat(10, /home/orange/bun/bun/tsconfig.json) = 11
[SYS] close(11)
[fs] openat(0, /home/orange/bun/bun/tsconfig.base.json) = 11
[SYS] close(11)
[SYS] close(6)
[SYS] close(7)
[SYS] close(8)
[SYS] close(9)
[SYS] close(10)
[fs] openat(0, /home/orange/bun/bun/test.js) = 12
[SYS] close(12)
11111
setTimeout
set
writeItem
  Count: 0
22222
setTimeout
set
[Loop] ref
11111
setTimeout
set
writeItem
  Count: 0
11111
setTimeout
set
writeItem
  Count: 0
11111
setTimeout
set
writeItem
  Count: 0
11111
setTimeout
set
writeItem
  Count: 0
[Loop] unref
[uws] Timer.deinit()
writeItem
  Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
  Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
  Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
  Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
  Count: 0

In bun.zig Timer.set there is a condition specifically for 0 interval setTimeout. Removing this logic doesn't fix things as the extern call to us_timer_set in Timer.set doesn't work with 0 ms. The timer never calls the callback.

My understanding of the events: setTimeout( x, 0) places the cb task on the FIFO queue setTImeout(x,y) goes on a uws.Timer. The uws.Timer callback places the cb task on the FIFO queue. The uws.Timer happens outside of the event_loop.zig tick function.

orangeswim commented 1 year ago

Ok after sleeping on it. I have the correct order of events.

So this introduces an infinite loop when a setTimeout with 0 interval adds another setTimeout with 0 interval. I haven't dug into too much of the nodejs code, but it appears they have a global queue, or at least an intermediate queue for everything that should happen on the nextTick.

Any one have any thoughts on the appropriate flow for setTimeout 0 tasks? I'm thinking it should add the CallbackJob / task on one of the other queues so it doesn't get stuck in that loop.

Updated test code

function repeat(log, delay, count) {
  if (count <= 0) return;

  var i = 0;
  while (i < 10000) {
    i++;
  }
  console.log(log, i);
  setTimeout(() => repeat(log, delay, count - 1), delay);
}

const timesToRun = 5;
repeat("11111", 0, timesToRun);
repeat("22222", 1, timesToRun);

Bun Output

```bash /home/orange/bun/bun/packages/debug-bun-linux-x64/bun-debug test.js [SYS] read(3, 4096) = 4096 (0.085ms) [SYS] close(3) [fs] openat(10, /home/orange/bun/bun/package.json) = 11 [SYS] close(11) [fs] openat(10, /home/orange/bun/bun/tsconfig.json) = 11 [SYS] close(11) [fs] openat(0, /home/orange/bun/bun/tsconfig.base.json) = 11 [SYS] close(11) [SYS] close(6) [SYS] close(7) [SYS] close(8) [SYS] close(9) [SYS] close(10) EventLoop.tick tickWithCount [fs] openat(0, /home/orange/bun/bun/test.js) = 12 [SYS] close(12) 11111 10000 setTimeout set 0 id:1 22222 10000 setTimeout set 1 id:2 [Loop] ref tickWithCount tickWithCount.while tasks.readItem 11111 10000 setTimeout set 0 id:3 tickWithCount.while tasks.readItem 11111 10000 setTimeout set 0 id:4 tickWithCount.while tasks.readItem 11111 10000 setTimeout set 0 id:5 tickWithCount.while tasks.readItem 11111 10000 setTimeout set 0 id:6 tickWithCount.while tasks.readItem tickWithCount tickWithCount Loop.tick [Loop] unref [uws] Timer.deinit() EventLoop.tick tickWithCount tickWithCount.while tasks.readItem 22222 10000 setTimeout set 1 id:7 [Loop] ref tickWithCount tickWithCount EventLoop.tick tickWithCount tickWithCount Loop.tick [Loop] unref [uws] Timer.deinit() EventLoop.tick tickWithCount tickWithCount.while tasks.readItem 22222 10000 setTimeout set 1 id:8 [Loop] ref tickWithCount tickWithCount Loop.tick [Loop] unref [uws] Timer.deinit() EventLoop.tick tickWithCount tickWithCount.while tasks.readItem 22222 10000 setTimeout set 1 id:9 [Loop] ref tickWithCount tickWithCount Loop.tick [Loop] unref [uws] Timer.deinit() EventLoop.tick tickWithCount tickWithCount.while tasks.readItem 22222 10000 setTimeout set 1 id:10 [Loop] ref tickWithCount tickWithCount Loop.tick [Loop] unref [uws] Timer.deinit() EventLoop.tick tickWithCount tickWithCount.while tasks.readItem tickWithCount tickWithCount ```

Node Output (exepected)

11111 10000
22222 10000
11111 10000
22222 10000
11111 10000
22222 10000
11111 10000
22222 10000
11111 10000
22222 10000
ericklt commented 1 year ago

Expanding on @orangeswim 's comment.

The core of the event_loop's tick() function is:

while (this.tickWithCount() > 0) : (this.global.handleRejectedPromises()) {
    this.tickConcurrent();
}

where there's a loop that execute the tasks:

// event_loop.zig > EventLoop > tickWithCount (L:581)
while (this.tasks.readItem()) |task| {...}

Due to setTimeout(_, 0) being a direct call to the Loop function enqueTask:

// bun.zig > Timer > set (L:3644)
vm.enqueueTask(JSC.Task.init(&job.task));

// event_loop.zig > EventLoop > enqueTask (L:902)
pub fn enqueueTask(this: *EventLoop, task: Task) void {
    this.tasks.writeItem(task) catch unreachable;
}

when a js functon is executed and calls setTimeout(_, 0), it adds to the this.tasks queue before the tickWithCount loop resolves. If the function is recursive or calls multiple setTimeout(_, 0), it keeps the bun event_loop from ticking, possibly indefinetly.

This is the first problem, and can be solved by using an aux. queue as @orangeswim said, or some variation of this, for example:

var executable_tasks: Queue = EventLoop.Queue.init(this.virtual_machine.allocator);
while (this.tasks.readItem()) |task| {
    executable_tasks.writeItem(task) catch unreachable;
}
while (executable_tasks.readItem()) |task| {...}

But the problem goes deeper, doing this unlocks the ticking of the event loop, but the loop still doesn't listen for uws events triggered by setTimeouts with delay greater than 0, since it gets stuck in this loop:

// event_loop.zig > EventLoop > tick (L:841)
while (this.tickWithCount() > 0) : (this.global.handleRejectedPromises()) {
    this.tickConcurrent();
}

By doing something like this:

while (this.tickWithCount() > 0) : (this.global.handleRejectedPromises()) {
    this.tickConcurrent();
    this.autoTick(); // <-- added
}

the loop is able to insert tasks inbetween ticks, but this is probably not the right solution since in some tests that I did, it made the setTimeout(_, 0) calls seem slow (apparently taking 1s between each execution, which is odd).

The test I'm using:

const timesToRun = 10;

var runCount = 0
const obj = {}

function repeat(log, delay) {
  if (runCount >= timesToRun) {
    console.log(`${log}: ${obj[log]}`)
    return
  }

  runCount += 1
  obj[log] = (obj[log] || 0) + 1
  // console.log(`${log}: ${obj[log]}`)

  setTimeout(() => repeat(log, delay), delay);
}

repeat("1", 0)
repeat("2", 1)
ericklt commented 1 year ago

One interesting thing to keep in mind, running

repeat("1", 0)
repeat("2", 1)

in nodejs gives the exact same output as

repeat("1", 1)
repeat("2", 1)

regardless of the amount of repetitions, so it apparently treats setTimeout(_, 0) the same as setTimeout(_, 1).

When running

repeat("1", 1)
repeat("2", 2)

it gives the expected 2/1 ratio.

Jarred-Sumner commented 1 year ago

I love the investigative work here

I think making setTimeout have a minimum of 1 is a good option and leaving setImmediate to be the one with special behavior that appends to the task queue. Maybe in the future we will implement requestIdleCallback.

ericklt commented 1 year ago

That's probably a good temporary solution.

Something that I just tried, was running setTimeout(_, 0) as any other setTimeout by removing the condition that puts it directly on the loop, and adding this to the _epollkqueue.c us_timer_set function:

if (ms == 0) {
    timer_spec.it_value.tv_sec = 0;
    timer_spec.it_value.tv_nsec = 1L;
}

this'll only work on linux since it uses the epoll_kqueue event loop, but it seemed to work flawlessly


test 1 (100_000 runs):

repeat("1", 0) // (key, timeout[ms])
repeat("2", 1)

result:

1: 98554
2: 1446

test 2 (100_000 runs):

repeat("1", 0)
repeat("2", 2)

result:

1: 99245
2: 755

I don't know if it can be implemented like this for other OSs, nor the exact impact on compared performance when using timeout 0, but it seems like a interesting temporary solution as well.

ewrogers commented 1 year ago

On macOS I believe you could use Grand Central Dispatch (GCD) via dispatch_after:

dispatch_time_t delay = dispatch_time(DISPATCH_TIME_NOW, (int64_t)(yourDelayInSeconds * NSEC_PER_SEC));

dispatch_after(delay, dispatch_get_main_queue(), ^{
    // Your code to be executed after the specified delay
});

As for Windows you can use CreateTimerQueue, CreateTimerQueueTimer, and DeleteTimerQueue APIs:

void CALLBACK TimerCallback(PVOID lpParameter, BOOLEAN TimerOrWaitFired) {
    // Your code to be executed after the specified delay
    printf("Timeout complete!\n");
}

void setTimeout(void (*callback)(), DWORD delayMs) {
    HANDLE hTimerQueue = CreateTimerQueue();
    if (hTimerQueue == NULL) {
        // Handle error
        return;
    }

    if (!CreateTimerQueueTimer(&hTimer, hTimerQueue, TimerCallback, NULL, delayMs, 0, 0)) {
        // Handle error
    }

    // Clean up the timer queue when done (optional)
    // DeleteTimerQueue(hTimerQueue);
}

I suppose more platform-specific native code could be added to packages/bun-usockets/src/eventing/ for this.

Jarred-Sumner commented 11 months ago

This will be fixed in the next version of Bun (once #6674 is merged)

The script from @ThePrimeagen below:

function test() {
    console.log('11111');
    setTimeout(test, 0);
}

function test2() {
    console.log('22222');
    setTimeout(test2, 1);
}

test();
test2();

Now logs the following infinitely repeating in bun and node:

11111
22222
11111
22222
11111
22222
11111
22222
11111
22222