Closed xgillard closed 1 year ago
Hey @xgillard! Thanks for reporting this. We have had similar reports of memory issues recently, especially from Mac users with actix-web projects. When we've had issues like this in the past, the culprit has been our FFI, but we have yet to figure out why it's only happening on macOS in this case.
I was able to do a local run on a Mac (x86_64) by using the pre-built binary from our latest release, building the project with Rust 1.65 (`cargo +1.65 build`, or setting the default toolchain to 1.65), and then doing a `cargo shuttle run`. I have not tried this on an arm64 Mac, however.
I am taking a look at this bug, FYI. What I noticed is that installing cargo-shuttle through `cargo install cargo-shuttle` installs version 0.9.0, and trying to run a project with `actix-web` locally (a simple one, right after running `cargo shuttle init`) results in the bus error each time. However, if I compile the release version of the 0.9.0 cargo-shuttle binary locally and run it against the very same project, the `actix-web` server runs without the bus error.
@oddgrd, do you have any idea why this might happen?
Thanks for looking into this! Hmm, did you compile locally from the 0.9 tag or the latest main? Or do you mean to say you tried the pre-built binary from the release assets? In any case, which version of rustc did you use?
Hey @oddgrd! I compiled the source code from the 0.9 tag. I used rustc 1.67.1 on an M1 MacBook Pro with macOS Ventura 13.2. Also, I noticed that 0.10.0 was released and it fails the same way.
The latest updates on this topic from my side are:
1) I'm having trouble debugging this without debug info: `cargo build` doesn't generate it, or at least I haven't found it yet, for the generated `cargo-shuttle` Mach-O binary.
2) I find it a bit strange that the installed binary (the one obtained from `crates.io`, which downloads the source code and compiles it locally) differs in size from the binary obtained through `cargo build --release` on the 0.9 tag. I do not think it's a stripping issue, because I've tried stripping both binaries: each is almost halved after the operation, and the delta still remains.
3) I am not used to debugging on an arm64 Darwin machine. After trying to reproduce the issue a few more times, I noticed that starting an actix-web HTTP server with the 0.9 version (installed from `crates.io`) fails intermittently when running everything inside `lldb`, which makes me think there might be a race condition. However, I would need to look at a backtrace to pinpoint the place where the bus error is triggered. LE: the bad access stack trace looks like this, but is more or less obfuscated:
(lldb) bt
* thread #3, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1031048d0)
* frame #0: 0x00000001031048d0
frame #1: 0x0000000100055bd0 cargo-shuttle`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h1936596fa66aec54 + 160
frame #2: 0x00000001000c6960 cargo-shuttle`_$LT$futures_util..future..future..catch_unwind..CatchUnwind$LT$Fut$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h5a1a5ed4ceee3073 + 36
frame #3: 0x000000010005ccd0 cargo-shuttle`cargo_shuttle::Shuttle::run::_$u7b$$u7b$closure$u7d$$u7d$::h3bbdeee18d6a05c4 + 4396
frame #4: 0x00000001000769e0 cargo-shuttle`tokio::runtime::park::CachedParkThread::block_on::h143bc585254942a5 + 512
frame #5: 0x00000001000c5d10 cargo-shuttle`tokio::runtime::scheduler::multi_thread::MultiThread::block_on::h2dde6dff0ecf86ac + 104
frame #6: 0x00000001000d2810 cargo-shuttle`cargo_shuttle::main::h95dae9533139c17d + 288
frame #7: 0x00000001000aa6dc cargo-shuttle`std::sys_common::backtrace::__rust_begin_short_backtrace::h5e548e71a67487db + 12
frame #8: 0x00000001000c9128 cargo-shuttle`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::hbb9535e5e1976eb9 + 24
frame #9: 0x0000000100ad8c30 cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::ha1c2447b9b665e13 at function.rs:606:13 [opt]
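For readers trying to follow the frames above: names such as `_$LT$core..panic..` use the escape sequences of Rust's legacy symbol mangling, which lldb leaves undecoded here. The following is a minimal sketch that decodes only the escapes visible in this backtrace (`$LT$`, `$GT$`, `$LP$`, `$RP$`, `$RF$`, `$C$`, `$uXX$`, and `..`). The function names `decode_legacy_escapes` and `decode_escape` are made up for illustration; this is not a full demangler and is not part of cargo-shuttle.

```rust
/// Decode the legacy Rust symbol escapes seen in the lldb frames above.
/// Only a subset is handled; anything unrecognised is copied through as-is.
fn decode_legacy_escapes(symbol: &str) -> String {
    let mut rest = symbol;
    let mut out = String::with_capacity(symbol.len());

    while let Some(ch) = rest.chars().next() {
        // A path component that begins with an escape is prefixed with `_`;
        // drop that underscore at the start of the symbol or after `::`.
        if rest.starts_with("_$") && (out.is_empty() || out.ends_with("::")) {
            rest = &rest[1..];
            continue;
        }
        if rest.starts_with("..") {
            // `..` stands in for the path separator `::`.
            out.push_str("::");
            rest = &rest[2..];
        } else if ch == '$' {
            // Escapes have the form `$NAME$`; look for the closing `$`.
            let decoded = rest[1..]
                .find('$')
                .and_then(|end| decode_escape(&rest[1..1 + end]).map(|c| (c, end)));
            match decoded {
                Some((c, end)) => {
                    out.push(c);
                    rest = &rest[end + 2..];
                }
                None => {
                    // Not an escape we know: keep the `$` literally.
                    out.push('$');
                    rest = &rest[1..];
                }
            }
        } else {
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}

/// Map one escape name (the text between the `$` signs) to a character.
fn decode_escape(name: &str) -> Option<char> {
    match name {
        "LT" => Some('<'),
        "GT" => Some('>'),
        "LP" => Some('('),
        "RP" => Some(')'),
        "RF" => Some('&'),
        "C" => Some(','),
        _ => {
            // `$uXX$` encodes a character by its hexadecimal code point.
            let hex = name.strip_prefix('u')?;
            u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
        }
    }
}
```

Passing the symbol from frame #1 through `decode_legacy_escapes` yields a readable `<... as core::ops::function::FnOnce<()>>::call_once` path, which makes it easier to see that the crash happens inside a `catch_unwind` wrapper.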
Meanwhile, do you have any ideas why the `crates.io`-installed binary is different in size from my local release build? I am not sure what the difference is between running `cargo install` from crates.io and `cargo build --release` on the same tag as the one present on `crates.io`.
I dug a bit more and managed to get some useful information. Below is a more meaningful backtrace. To help a bit, what I found useful can be found in frames 10-13.
(lldb) bt
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10700a2f0)
* frame #0: 0x000000010700a2f0
frame #1: 0x00000001000e0580 cargo-shuttle`shuttle_service::Bootstrapper::bootstrap::_$u7b$$u7b$closure$u7d$$u7d$::hb764c6a56431fb33((null)=ResumeTy @ 0x000000016fde2ef0) at lib.rs:455:27
frame #2: 0x0000000100008cdc cargo-shuttle`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hbf0b89ecb50007ec(self=Pin<&mut core::panic::unwind_safe::AssertUnwindSafe<shuttle_service::{impl#0}::bootstrap::{async_fn_env#0}>> @ 0x000000016fde2f78, cx=0x000000016fdf40d0) at unwind_safe.rs:296:9
frame #3: 0x00000001000d8580 cargo-shuttle`_$LT$futures_util..future..future..catch_unwind..CatchUnwind$LT$Fut$GT$$u20$as$u20$core..future..future..Future$GT$::poll::_$u7b$$u7b$closure$u7d$$u7d$::h69ad8721f3c6f557 at catch_unwind.rs:36:42
frame #4: 0x0000000100008f6c cargo-shuttle`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h31f1ae34f14b68e8(self=AssertUnwindSafe<futures_util::future::future::catch_unwind::{impl#1}::poll::{closure_env#0}<core::panic::unwind_safe::AssertUnwindSafe<shuttle_service::{impl#0}::bootstrap::{async_fn_env#0}>>> @ 0x000000016fde2fd0, _args=<unavailable>) at unwind_safe.rs:271:9
frame #5: 0x00000001000bdd64 cargo-shuttle`std::panicking::try::do_call::h74901a545fee1ecf(data="hF\xdfo\U00000001") at panicking.rs:483:40
frame #6: 0x00000001000c14f8 cargo-shuttle`__rust_try + 32
frame #7: 0x00000001000ba564 cargo-shuttle`std::panicking::try::h59d01eac51d130dd(f=AssertUnwindSafe<futures_util::future::future::catch_unwind::{impl#1}::poll::{closure_env#0}<core::panic::unwind_safe::AssertUnwindSafe<shuttle_service::{impl#0}::bootstrap::{async_fn_env#0}>>> @ 0x000000016fde3180) at panicking.rs:447:19
frame #8: 0x0000000100055bb4 cargo-shuttle`std::panic::catch_unwind::ha960c18ababb2db6(f=AssertUnwindSafe<futures_util::future::future::catch_unwind::{impl#1}::poll::{closure_env#0}<core::panic::unwind_safe::AssertUnwindSafe<shuttle_service::{impl#0}::bootstrap::{async_fn_env#0}>>> @ 0x000000016fde31d0) at panic.rs:137:14
frame #9: 0x00000001000d84c4 cargo-shuttle`_$LT$futures_util..future..future..catch_unwind..CatchUnwind$LT$Fut$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hc1c633e231e26562(self=Pin<&mut futures_util::future::future::catch_unwind::CatchUnwind<core::panic::unwind_safe::AssertUnwindSafe<shuttle_service::{impl#0}::bootstrap::{async_fn_env#0}>>> @ 0x000000016fde32c0, cx=0x000000016fdf40d0) at catch_unwind.rs:36:9
frame #10: 0x0000000100093eec cargo-shuttle`shuttle_service::loader::Loader::load::_$u7b$$u7b$closure$u7d$$u7d$::hdb8964ccc5d408bc((null)=ResumeTy @ 0x000000016fde3c10) at loader.rs:79:13
frame #11: 0x000000010007dce8 cargo-shuttle`cargo_shuttle::Shuttle::local_run::_$u7b$$u7b$closure$u7d$$u7d$::hf474781a53833706((null)=ResumeTy @ 0x000000016fde5d08) at lib.rs:442:67
frame #12: 0x00000001000758bc cargo-shuttle`cargo_shuttle::Shuttle::run::_$u7b$$u7b$closure$u7d$$u7d$::h83fc8cdfc70b325d((null)=ResumeTy @ 0x000000016fdf10a8) at lib.rs:77:63
frame #13: 0x0000000100083604 cargo-shuttle`cargo_shuttle::main::_$u7b$$u7b$closure$u7d$$u7d$::ha519b8e4a453dfce((null)=ResumeTy @ 0x000000016fdf3f60) at main.rs:9:52
frame #14: 0x0000000100044370 cargo-shuttle`tokio::runtime::park::CachedParkThread::block_on::_$u7b$$u7b$closure$u7d$$u7d$::hc0f9571483ced3f7 at park.rs:272:63
frame #15: 0x0000000100043e64 cargo-shuttle`tokio::runtime::park::CachedParkThread::block_on::h40c86347832f7b42 at coop.rs:102:5
frame #16: 0x0000000100043dfc cargo-shuttle`tokio::runtime::park::CachedParkThread::block_on::h40c86347832f7b42 [inlined] tokio::runtime::coop::budget::h99a2fce96aa0dfe8(f={closure_env#0}<cargo_shuttle::main::{async_block_env#0}> @ 0x000000016fdf5648) at coop.rs:68:5
frame #17: 0x0000000100043d80 cargo-shuttle`tokio::runtime::park::CachedParkThread::block_on::h40c86347832f7b42(self=0x000000016fdf56ef, f={async_block_env#0} @ 0x000000016fdf56f0) at park.rs:272:31
frame #18: 0x00000001000c9b80 cargo-shuttle`tokio::runtime::context::BlockingRegionGuard::block_on::h428d4a9cc4bb21af(self=0x000000016fdf6c50, f={async_block_env#0} @ 0x000000016fdf6c68) at context.rs:255:13
frame #19: 0x00000001000c269c cargo-shuttle`tokio::runtime::scheduler::multi_thread::MultiThread::block_on::he9e2d355f3c74c46(self=0x000000016fdfd680, handle=0x000000016fdfd668, future={async_block_env#0} @ 0x000000016fdf9710) at mod.rs:66:9
frame #20: 0x000000010008c1a8 cargo-shuttle`tokio::runtime::runtime::Runtime::block_on::h6bd7ac9891e754bb(self=0x000000016fdfd658, future={async_block_env#0} @ 0x000000016fdfd7b8) at runtime.rs:281:45
frame #21: 0x00000001000c2544 cargo-shuttle`cargo_shuttle::main::h947536bd3b729aca at main.rs:17:5
frame #22: 0x00000001000461f4 cargo-shuttle`core::ops::function::FnOnce::call_once::h66f29fa036c63bee((null)=(cargo-shuttle`cargo_shuttle::main::h947536bd3b729aca at main.rs:6), (null)=<unavailable>) at function.rs:507:5
frame #23: 0x00000001000576d0 cargo-shuttle`std::sys_common::backtrace::__rust_begin_short_backtrace::hfee08ea92bea8ce1(f=(cargo-shuttle`cargo_shuttle::main::h947536bd3b729aca at main.rs:6)) at backtrace.rs:121:18
frame #24: 0x0000000100043910 cargo-shuttle`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h688bc618c1ea3f44 at rt.rs:166:18
frame #25: 0x0000000102126764 cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::ha1c2447b9b665e13 at function.rs:606:13 [opt]
frame #26: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] std::panicking::try::do_call::ha57d6d1e9532dc1f at panicking.rs:483:40 [opt]
frame #27: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] std::panicking::try::hca0526f287961ecd at panicking.rs:447:19 [opt]
frame #28: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] std::panic::catch_unwind::hdcaa7fa896e0496a at panic.rs:137:14 [opt]
frame #29: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::h142ec071d3766871 at rt.rs:148:48 [opt]
frame #30: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] std::panicking::try::do_call::h95f5e55d6f048978 at panicking.rs:483:40 [opt]
frame #31: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] std::panicking::try::h0fa00e2f7b4a5c64 at panicking.rs:447:19 [opt]
frame #32: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c [inlined] std::panic::catch_unwind::h1765f149814d4d3e at panic.rs:137:14 [opt]
frame #33: 0x000000010212675c cargo-shuttle`std::rt::lang_start_internal::h00a235e820a7f01c at rt.rs:148:20 [opt]
frame #34: 0x00000001000438dc cargo-shuttle`std::rt::lang_start::hca7ae022b7a49b1a(main=(cargo-shuttle`cargo_shuttle::main::h947536bd3b729aca at main.rs:6), argc=2, argv=0x000000016fdff1e0, sigpipe='\0') at rt.rs:165:17
frame #35: 0x00000001000c25ec cargo-shuttle`main + 36
frame #36: 0x00000001b025be50 dyld`start + 2544
The culprit is this block of code:
I am not sure what the root cause is, but I'll follow up once I better understand its role.
Thanks for digging deep on this @iulianbarbu! :pray:
I am not sure why the binary size is different, no. As far as I know, `cargo install ...` is equivalent to `cargo build --release`.
Yes, the culprit is indeed there: to load dynamic libraries (user projects) we need to use an unsafe FFI. With Rust's safety guarantees, it should not be possible for this kind of error to happen in safe code. I linked this issue earlier, where the problem was misalignment due to different versions of tokio across the FFI. The question is, why is it happening now, and why is it only (or at least in the vast majority of cases) on macOS?
Did you try my suggestion from above, by the way? With it, I was able to run actix locally on an x86_64 Mac (local runs with actix have consistently segfaulted on Mac). I download the Mac binary release (I used 0.9.0, but 0.10.0 should be the same), which is compiled with rustc 1.65, then I build the project I want to run with 1.65 (`cargo +1.65 build`), and then I `cargo shuttle run`. This worked the last time I tested it, but I arrived at it by experimentation; I have limited experience at this level. My theory is that the misalignment stems from using a binary compiled with a different rustc than the one the project to run is compiled with. I don't know why it only happens on macOS. I may also be completely misguided here, so please feel free to debunk my theory! :smile:
Again, thank you for looking into this, we would love to get this fixed!
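To make the theory above concrete, here is a hedged sketch of the kind of guard a loader could apply before calling into a dynamically loaded library: compare build metadata from both sides and refuse to proceed on a mismatch. Everything here (`BuildInfo`, `check_compatible`, the version strings) is hypothetical and is not how cargo-shuttle works. Matching versions would only reduce the risk, not remove it.

```rust
/// Hypothetical build metadata that a loader and a loaded library could
/// each expose. The field choice is an assumption made for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BuildInfo {
    pub rustc_version: String,
    pub tokio_version: String,
}

impl BuildInfo {
    pub fn new(rustc_version: &str, tokio_version: &str) -> Self {
        Self {
            rustc_version: rustc_version.to_string(),
            tokio_version: tokio_version.to_string(),
        }
    }
}

/// Return an error describing the first mismatch between the host binary
/// and the library it is about to load; `Ok(())` if both fields agree.
pub fn check_compatible(host: &BuildInfo, library: &BuildInfo) -> Result<(), String> {
    if host.rustc_version != library.rustc_version {
        return Err(format!(
            "rustc mismatch: host built with {}, library built with {}",
            host.rustc_version, library.rustc_version
        ));
    }
    if host.tokio_version != library.tokio_version {
        return Err(format!(
            "tokio mismatch: host uses {}, library uses {}",
            host.tokio_version, library.tokio_version
        ));
    }
    Ok(())
}
```

In the scenario described in this thread, a host built with rustc 1.65 and a project built with rustc 1.67.1 would be rejected up front with a readable message instead of crashing with `EXC_BAD_ACCESS` at startup.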
I did some more digging, and tokio makes no guarantees about its layout staying the same across FFI even if it's the same version. It seems to work consistently in a controlled environment like our deployers, but we may need to refactor the local runs. :shrug:
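As background for the layout point: Rust's default representation leaves field order and padding unspecified, so two separately compiled artifacts are not guaranteed to agree on how an ordinary struct is laid out. Only an explicit representation such as `#[repr(C)]` pins the layout. The sketch below uses two made-up structs (`Unspecified` and `FfiHeader`) to show the difference; it illustrates the general language rule and says nothing about tokio's actual types.

```rust
use std::mem::{align_of, size_of};

/// Default representation: the compiler may reorder fields and choose the
/// padding, so size and offsets are not a stable contract across builds.
#[allow(dead_code)]
struct Unspecified {
    tag: u8,
    value: u32,
    flag: u8,
}

/// `repr(C)`: fields are laid out in declaration order with C padding rules,
/// which is what makes a type safe to share across an FFI boundary.
#[allow(dead_code)]
#[repr(C)]
struct FfiHeader {
    tag: u8,
    value: u32,
    flag: u8,
}

/// Report (size, alignment, byte offset of `value`) for `FfiHeader`.
fn ffi_header_layout() -> (usize, usize, usize) {
    let header = FfiHeader { tag: 1, value: 2, flag: 3 };
    let base = &header as *const FfiHeader as usize;
    let value_offset = &header.value as *const u32 as usize - base;
    (size_of::<FfiHeader>(), align_of::<FfiHeader>(), value_offset)
}

/// Size of the default-representation struct; the exact number is whatever
/// the current compiler picks and should not be relied upon.
fn unspecified_size() -> usize {
    size_of::<Unspecified>()
}
```

On x86_64 and arm64, `ffi_header_layout()` reports a 12-byte struct with 4-byte alignment and `value` at offset 4, whereas `unspecified_size()` is free to differ between compiler versions. Passing default-representation types across a dynamic-library boundary relies on both sides happening to make the same choices.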
This should be resolved with the release of 0.12 :partying_face: Feel free to re-open if you still experience this.
Whenever I try to run my project locally, it ends up with a zbus error; I was, however, able to deploy it successfully. I do believe this issue is `cargo shuttle` related, as the compilation seems to work OK and it only crashes when the service tries to start. If I add a 'main.rs' module where I manually spawn axum, it works fine. Some more info about my config:
OS: Mac OS Ventura (13.0.1) -- arm64
Cargo: cargo 1.67.0 (8ecd4f20a 2023-01-10)
And here is a log of how it goes: