Closed andrasat closed 11 months ago
For me this actually segfaults at the moment with
curl: (52) Empty reply from server
and
BUN:GET:/ping:2023-07-15T18:21:44.126Z
BUN:GET:/not-blocking:2023-07-15T18:21:44.298Z
Segmentation fault: 11
I am running bun 0.6.14
on Darwin 22.5.0 arm64 arm
UPDATED:
@simylein I got segfaulted to in my m1 pro lol, Darwin 22.5.0 arm64 arm
, and bun version 0.6.14
, even in /blocking
endpoint, it seems there is some problem with crypto module?
-- Screenshot --
Hmm that's interesting @simylein , I haven't had any issues with segfaults tho, with 0.6.14
and running bun-debug in 0.6.15-debug
(in my WSL 2 desktop)
Anyway I tried again today with the latest commit in main (0.6.15-debug
), it might be related to issue #3226, but when I tried using other than 0 delay time in setTimeout, /not-blocking
actually not blocking other endpoints, although it finished slower, i adjusted the iteration to be lower so it can finished quickly.
In bun-debug, there are some differences in logs when running setTimeout 1
and setTImeout 0 or setImmediate
setTimeout 0
or setImmediate
BUN:GET:/not-blocking:2023-07-16T09:47:00.906Z
[SYS] write(1, 47) = 47
[FileSink] Wrote 47 bytes (fd: 1, head: 0, 0/913)
[RequestContext] onResolve
[RequestContext] render
[RequestContext] render
[RequestContext] finalize (src.bun.js.api.server.NewRequestContext(false,true,src.bun.js.api.server.NewServer(false,true))@58b50550000)
[RequestContext] finalizeWithoutDeinit (src.bun.js.api.server.NewRequestContext(false,true,src.bun.js.api.server.NewServer(false,true))@58b50550000)
[RequestContext] finalizeWithoutDeinit: response_jsvalue != .zero
[RequestContext] finalizeWithoutDeinit: request_body != null
[RequestContext] finalizeWithoutDeinit: this.pathname.len 0 null
[Server] deinitIfWeCan
[RequestContext] deinit (src.bun.js.api.server.NewRequestContext(false,true,src.bun.js.api.server.NewServer(false,true))@58b50550000)
[RequestContext] create (src.bun.js.api.server.NewRequestContext(false,true,src.bun.js.api.server.NewServer(false,true))@58b50550000)
[bun] poll(1) writable: true (4)
-- Screenshot --
BUN:GET:/not-blocking:2023-07-16T09:51:03.473Z
[SYS] write(1, 47) = 47
[FileSink] Wrote 47 bytes (fd: 1, head: 0, 0/78)
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
.....
.....
.....
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
[RequestContext] onResolve
[RequestContext] render
[RequestContext] render
[RequestContext] finalize (src.bun.js.api.server.NewRequestContext(false,true,src.bun.js.api.server.NewServer(false,true))@45d2e550000)
[RequestContext] finalizeWithoutDeinit (src.bun.js.api.server.NewRequestContext(false,true,src.bun.js.api.server.NewServer(false,true))@45d2e550000)
[RequestContext] finalizeWithoutDeinit: response_jsvalue != .zero
[RequestContext] finalizeWithoutDeinit: request_body != null
[RequestContext] finalizeWithoutDeinit: this.pathname.len 0 null
[Server] deinitIfWeCan
[RequestContext] deinit (src.bun.js.api.server.NewRequestContext(false,true,src.bun.js.api.server.NewServer(false,true))@45d2e550000)
-- Screenshot --
effect-ts
is also affected by this (https://github.com/Effect-TS/io/pull/662)
Here is a simple code example to see it happen
function test() {
console.log('11111');
setTimeout(test, 0);
}
function test2() {
console.log('22222');
setTimeout(test2, 1);
}
test();
test2();
run bun test.js
or node test.js
to see the difference
Troubleshooting the issue.
I modified the test code to
function repeat(log, delay, count) {
if (count <= 0) return;
console.log(log);
setTimeout(() => repeat(log, delay, count - 1), delay);
}
const timesToRun = 5;
repeat("1", 0, timesToRun);
repeat("2", 1, timesToRun);
I added debug statements in fifo.zig
and the events are getting queued in an unexpected order, instead of the expected alternating messages. There seems to be some race condition since adding the extra commands above is changing the order of the queue. In the original test code by @ThePrimeagen , one function always beat the other.
Output
/home/orange/bun/bun/packages/debug-bun-linux-x64/bun-debug test.js
[SYS] read(3, 4096) = 4096 (0.052ms)
[SYS] close(3)
[fs] openat(10, /home/orange/bun/bun/package.json) = 11
[SYS] close(11)
[fs] openat(10, /home/orange/bun/bun/tsconfig.json) = 11
[SYS] close(11)
[fs] openat(0, /home/orange/bun/bun/tsconfig.base.json) = 11
[SYS] close(11)
[SYS] close(6)
[SYS] close(7)
[SYS] close(8)
[SYS] close(9)
[SYS] close(10)
[fs] openat(0, /home/orange/bun/bun/test.js) = 12
[SYS] close(12)
11111
setTimeout
set
writeItem
Count: 0
22222
setTimeout
set
[Loop] ref
11111
setTimeout
set
writeItem
Count: 0
11111
setTimeout
set
writeItem
Count: 0
11111
setTimeout
set
writeItem
Count: 0
11111
setTimeout
set
writeItem
Count: 0
[Loop] unref
[uws] Timer.deinit()
writeItem
Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
Count: 0
22222
setTimeout
set
[Loop] ref
[Loop] unref
[uws] Timer.deinit()
writeItem
Count: 0
In bun.zig
Timer.set there is a condition specifically for 0 interval setTimeout. Removing this logic doesn't fix things as the extern call to us_timer_set
in Timer.set
doesn't work with 0 ms. The timer never calls the callback.
My understanding of the events: setTimeout( x, 0) places the cb task on the FIFO queue setTImeout(x,y) goes on a uws.Timer. The uws.Timer callback places the cb task on the FIFO queue. The uws.Timer happens outside of the event_loop.zig tick function.
Ok after sleeping on it. I have the correct order of events.
event_loop.zig
EventLoop.tick()
EventLoop.tickWithCount()
will run an loop to run all tasks in this.tasks.So this introduces an infinite loop when a setTimeout with 0 interval adds another setTimeout with 0 interval. I haven't dug into too much of the nodejs code, but it appears they have a global queue, or at least an intermediate queue for everything that should happen on the nextTick.
Any one have any thoughts on the appropriate flow for setTimeout 0 tasks? I'm thinking it should add the CallbackJob / task on one of the other queues so it doesn't get stuck in that loop.
Updated test code
function repeat(log, delay, count) {
if (count <= 0) return;
var i = 0;
while (i < 10000) {
i++;
}
console.log(log, i);
setTimeout(() => repeat(log, delay, count - 1), delay);
}
const timesToRun = 5;
repeat("11111", 0, timesToRun);
repeat("22222", 1, timesToRun);
Bun Output
Node Output (exepected)
11111 10000
22222 10000
11111 10000
22222 10000
11111 10000
22222 10000
11111 10000
22222 10000
11111 10000
22222 10000
Expanding on @orangeswim 's comment.
The core of the event_loop's tick()
function is:
while (this.tickWithCount() > 0) : (this.global.handleRejectedPromises()) {
this.tickConcurrent();
}
where there's a loop that execute the tasks:
// event_loop.zig > EventLoop > tickWithCount (L:581)
while (this.tasks.readItem()) |task| {...}
Due to setTimeout(_, 0)
being a direct call to the Loop function enqueTask
:
// bun.zig > Timer > set (L:3644)
vm.enqueueTask(JSC.Task.init(&job.task));
// event_loop.zig > EventLoop > enqueTask (L:902)
pub fn enqueueTask(this: *EventLoop, task: Task) void {
this.tasks.writeItem(task) catch unreachable;
}
when a js functon is executed and calls setTimeout(_, 0)
, it adds to the this.tasks
queue before the tickWithCount
loop resolves. If the function is recursive or calls multiple setTimeout(_, 0)
, it keeps the bun event_loop from ticking, possibly indefinetly.
This is the first problem, and can be solved by using an aux. queue as @orangeswim said, or some variation of this, for example:
var executable_tasks: Queue = EventLoop.Queue.init(this.virtual_machine.allocator);
while (this.tasks.readItem()) |task| {
executable_tasks.writeItem(task) catch unreachable;
}
while (executable_tasks.readItem()) |task| {...}
But the problem goes deeper, doing this unlocks the ticking of the event loop, but the loop still doesn't listen for uws events triggered by setTimeouts with delay greater than 0, since it gets stuck in this loop:
// event_loop.zig > EventLoop > tick (L:841)
while (this.tickWithCount() > 0) : (this.global.handleRejectedPromises()) {
this.tickConcurrent();
}
By doing something like this:
while (this.tickWithCount() > 0) : (this.global.handleRejectedPromises()) {
this.tickConcurrent();
this.autoTick(); // <-- added
}
the loop is able to insert tasks inbetween ticks, but this is probably not the right solution since in some tests that I did, it made the setTimeout(_, 0)
calls seem slow (apparently taking 1s between each execution, which is odd).
The test I'm using:
const timesToRun = 10;
var runCount = 0
const obj = {}
function repeat(log, delay) {
if (runCount >= timesToRun) {
console.log(`${log}: ${obj[log]}`)
return
}
runCount += 1
obj[log] = (obj[log] || 0) + 1
// console.log(`${log}: ${obj[log]}`)
setTimeout(() => repeat(log, delay), delay);
}
repeat("1", 0)
repeat("2", 1)
One interesting thing to keep in mind, running
repeat("1", 0)
repeat("2", 1)
in nodejs gives the exact same output as
repeat("1", 1)
repeat("2", 1)
regardless of the amount of repetitions, so it apparently treats setTimeout(_, 0)
the same as setTimeout(_, 1)
.
When running
repeat("1", 1)
repeat("2", 2)
it gives the expected 2/1 ratio.
I love the investigative work here
I think making setTimeout have a minimum of 1 is a good option and leaving setImmediate to be the one with special behavior that appends to the task queue. Maybe in the future we will implement requestIdleCallback.
That's probably a good temporary solution.
Something that I just tried, was running setTimeout(_, 0)
as any other setTimeout
by removing the condition that puts it directly on the loop, and adding this to the _epollkqueue.c us_timer_set
function:
if (ms == 0) {
timer_spec.it_value.tv_sec = 0;
timer_spec.it_value.tv_nsec = 1L;
}
this'll only work on linux since it uses the epoll_kqueue event loop, but it seemed to work flawlessly
test 1 (100_000 runs):
repeat("1", 0) // (key, timeout[ms])
repeat("2", 1)
result:
1: 98554
2: 1446
test 2 (100_000 runs):
repeat("1", 0)
repeat("2", 2)
result:
1: 99245
2: 755
I don't know if it can be implemented like this for other OSs, nor the exact impact on compared performance when using timeout 0, but it seems like a interesting temporary solution as well.
On macOS I believe you could use Grand Central Dispatch (GCD) via dispatch_after:
dispatch_time_t delay = dispatch_time(DISPATCH_TIME_NOW, (int64_t)(yourDelayInSeconds * NSEC_PER_SEC));
dispatch_after(delay, dispatch_get_main_queue(), ^{
// Your code to be executed after the specified delay
});
As for Windows you can use CreateTimerQueue, CreateTimerQueueTimer, and DeleteTimerQueue APIs:
void CALLBACK TimerCallback(PVOID lpParameter, BOOLEAN TimerOrWaitFired) {
// Your code to be executed after the specified delay
printf("Timeout complete!\n");
}
void setTimeout(void (*callback)(), DWORD delayMs) {
HANDLE hTimerQueue = CreateTimerQueue();
if (hTimerQueue == NULL) {
// Handle error
return;
}
if (!CreateTimerQueueTimer(&hTimer, hTimerQueue, TimerCallback, NULL, delayMs, 0, 0)) {
// Handle error
}
// Clean up the timer queue when done (optional)
// DeleteTimerQueue(hTimerQueue);
}
I suppose more platform-specific native code could be added to packages/bun-usockets/src/eventing/
for this.
This will be fixed in the next version of Bun (once #6674 is merged)
The script from @ThePrimeagen below:
function test() {
console.log('11111');
setTimeout(test, 0);
}
function test2() {
console.log('22222');
setTimeout(test2, 1);
}
test();
test2();
Now logs the following infinitely repeating in bun and node:
11111
22222
11111
22222
11111
22222
11111
22222
11111
22222
What version of Bun is running?
0.6.9
What platform is your computer?
Linux 6.1.31-2-MANJARO x86_64 unknown
What steps can reproduce the bug?
So ive been playing with setImmediate and setTimeout in this repo, and try make a simple server with endpoints:
/ping
/blocking
/not-blocking
/blocking
and/not-blocking
has the same cpu intensive task, with difference that/blocking
is sync, and/not-blocking
is async.Steps:
bun start:node // node npm run start:node // node
/not-blocking
In node, they are working as expected, when
/not-blocking
is still running, i can still receive response from/ping
endpoint. But in bun,/not-blocking
is blocking and/ping
response is returned after the task is finished.What is the expected behavior?
/not-blocking
first, then/ping
/ping
should return{ "data": "PONG" }
immediately without waiting for/not-blocking
to finishWhat do you see instead?
/not-blocking
first, then/ping
/ping
is waiting for/not-blocking
finish before returning{ "data": "PONG" }
Additional information
No response