bmteller opened this issue 2 weeks ago
@bmteller, the latest version of Bun is v1.1.25, but this crash was reported on Bun v1.1.24.
Are you able to reproduce this crash on the latest version of Bun?
bun upgrade
Pretty sure this crash was fixed in Bun v1.1.25. Please leave a comment if you still run into it.
Had the same issue with 1.1.26:

Bun v1.1.26 (0a37423) on linux aarch64 [RunCommand]
Segmentation fault at address 0x00000A72
uws.zig:117: TCPSocketClass__finalize
uws.zig:117: TCPSocketClass__finalize
ZigGeneratedClasses.cpp:40060: WebCore::JSTCPSocket::destroy
ld-temp.o:0: JSC::PreciseAllocation::sweep
ld-temp.o:0: JSC::Heap::handleNeedFinalize
ld-temp.o:0: JSC::Heap::finishChangingPhase
ld-temp.o:0: JSC::Heap::runEndPhase
ld-temp.o:0: JSC::Heap::runCurrentPhase
Features: jsc, Bun.stdin, fetch, http_server, spawn, tsconfig_paths, tsconfig
The finalize code in src/bun.js/api/bun/socket.zig looks a bit suspicious. If you trace the code for the case where finalize is called while the socket is still open, it looks like there is a use-after-free (freed memory is read). But maybe I'm reading the code incorrectly, because I assume that would mean any kind of socket leak, where the GC ends up cleaning up the socket instead of it being explicitly closed, would cause a crash, and this crash doesn't look that common.
finalize:
    this.flags.finalizing = true;
    if (!this.socket.isClosed()) {
        this.closeAndDetach(.failure);
closeAndDetach:
    const socket = this.socket;
    this.socket.detach();
    socket.close(code);
onClose:
    this.socket.detach();
    defer this.deref();
    defer this.markInactive();
    if (this.flags.finalizing) {
        return;
    }
markInactive():
    (ignored in this analysis)
deref:
    deinit: (if the refcount reaches zero?)
        this.destroy()
        (I assume memory must not be accessed after this point)
back in finalize():
    this.deref();
    (accesses memory that is now undefined)
In the title I said it might be related to Bun.serve, but it could also be related to Bun.fetch. I think it's hitting the happy eyeballs code path, because it's trying to fetch() a localhost URL, and in the container localhost resolves to both ::1 and 127.0.0.1.
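For context, the dual-stack resolution that sends fetch() down the happy-eyeballs path can be observed with a small standalone C program using plain POSIX getaddrinfo. This is illustrative only, not Bun's code; whether you see both IPv4 and IPv6 entries for localhost depends on the machine's resolver and /etc/hosts:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int main(void) {
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* ask for both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo("localhost", "80", &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return 1;
    }
    /* Print one line per candidate address family, in resolver order. */
    for (p = res; p != NULL; p = p->ai_next)
        printf("%s\n", p->ai_family == AF_INET6 ? "IPv6" : "IPv4");
    freeaddrinfo(res);
    return 0;
}
```

If this prints both `IPv6` and `IPv4`, a connecting client has two candidate addresses and a happy-eyeballs implementation will race them, which is the scenario suspected above.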
It has to be related to TCP sockets in JS, which usually means a database connection of some kind. And yeah, happy eyeballs is plausible; it could be another edge case involving connection errors with TCP.
So this was originally happening in production, but I was not easily able to reproduce it locally. I have now reproduced it locally, but I'm not sure if I'll be able to create a minimal reproduction.
So the Bun app receives 'real' HTTP requests, which involve it making fetch requests to another app and also interacting with a complicated Node library that uses a pipe to communicate with another process. I guess, similar to Node.js, the pipe is not really a pipe but a Unix domain socket. The Bun app also receives health-check HTTP requests from a load balancer. Locally, I was able to reproduce it by continually making the load balancer health requests every 5 seconds in the background while also making a number of 'real' requests that were spaced out. I think I made 4 requests in total; the last request, which caused the segmentation fault, came about 70 minutes after the previous one.
Running locally, the hosts are IPv4-only, so there is no happy eyeballs. It also excludes some weirdness: in production there is no internet access, so if something was crashing because it could not connect(), that would not happen locally.
We are also using brotliDecompress, so I guess that could potentially be an issue as well if it's bugged, because it could be trashing the heap. But I guess a lot of the edge cases in a brotli decompression implementation would only show up if you were streaming the decompression, and we just decompress a single Buffer.
We are not able to share the Docker container that reproduces this because it contains a lot of proprietary code. But if you have any tips on creating a debug build of Bun in order to trace the issue, I can try to trace it.
Can you share what libraries/packages you are using?
How can we reproduce the crash?
I'm going to try to generate a minimal reproduction, but I don't have one at the moment.
Relevant log output
Stack Trace (bun.report)
Bun v1.1.24 (85a3299) on linux aarch64 [RunCommand]
Segmentation fault at address 0x00005345
uws.zig:117: TCPSocketClass__finalize
uws.zig:117: TCPSocketClass__finalize
ZigGeneratedClasses.cpp:38416: WebCore::JSTCPSocket::destroy
ld-temp.o:0: JSC::PreciseAllocation::sweep
ld-temp.o:0: JSC::Heap::handleNeedFinalize
ld-temp.o:0: JSC::Heap::finishChangingPhase
ld-temp.o:0: JSC::Heap::runEndPhase
ld-temp.o:0: JSC::Heap::runCurrentPhase