Closed expeditiousRubyist closed 8 years ago
There’s a bunch of reasons to generate the unreachable
(in LLVM, which compiles down to ud2
on x86… most of the time) instruction. Generating this instruction provides hints to the optimiser, that the instruction in question is not reachable. Not generating this instruction at all would result in slower compilation/worse optimisations. Namely:
unreachable
there enables optimisations and ability to discover bugs in user code faster (there has been cases where they’ve been discovered much faster). This is also the kind of unreachable
you see generated with panic!
. Panic macro calls a diverging function which begins unwinding and must not return;unreachable
, because the match is provably exhaustive in Rust. You can’t always tell that on the LLVM side. A valid optimisation;If you’re hitting (executing) a ud2
/unreachable
(other than invoking ::std::intrinsics::unreachable
yourself), you’ve invoked some sort of undefined behaviour in your code. It is not guaranteed unreachable
instruction will generate ud2
; Actually reaching that code is UB and might result in anything ranging from nothing happening to a null pointer dereference, launch of all the nuclears USA possesses and ejection of your CPU out of its socket all at once.
If you can prove there’s no code invoking UB in your code and the libraries you use, its a bug in rustc.
@nagisa we use unreachable in other contexts, like an OOM, that valid code can hit.
EDIT: we actually specifically use the abort intrinsic instead, which is lowered to a ud2 as well.
@nagisa You're confusing abort and unreachable and OP is complaining not because of the existence of abort but because unwinding is not implemented on windows and thus panicking causes an abort.
@sfackler alloc::oom
is an intrinsics::abort
. I find it interesting llvm.trap()
compiles to ud2
on x86, but otherwise is it the only other intrinsic that results in ud2
? Seems fixable by generating llvm.debugtrap()
instead?
Seems fixable by generating llvm.debugtrap() instead?
There is no need for a fix because nothing is broken.
@mahkoh
unwinding is not implemented on windows
32bit MSVC only, IIRC. The OP has explicitly mentioned x86-64. Both of x86-64-msvc and x86-64-gnu have implementations of unwinding.
Then there is another bug somewhere but this has nothing to do with ud2.
@mahkoh I'm not sure how I follow. An OOM calls intrinsics::abort
which lowers to ud2. A SIGILL may not provide the best user experience, which is exactly what @nagisa said.
@sfackler Then OOM should do something else. Changing abort into a breakpoint is nonsensical.
AFAIK, panic!()
does not/should not execute an ud2
except on 32-bit windows (https://github.com/rust-lang/rust/issues/25869). Could you post more details about your set-up (e.g. the output of rustc -vV
) & the code you're running, @expeditiousRubyist?
@huonw sure. The code I'd been fiddling around with was more or less this: https://gist.github.com/expeditiousRubyist/2997ba449d6ee5cb70dc
And when compiled to object code and disassembled using the following commands
rustc -C opt-level=3 --emit obj cell.rs
objdump -d -M intel cell.o
my code produced the following disassembly: https://gist.github.com/expeditiousRubyist/7f5922cf732bc36c34c0
My result of running rustc -vV
is as follows:
rustc 1.4.0 (8ab8581f6 2015-10-27)
binary: rustc
commit-hash: 8ab8581f6921bc7a8e3fa4defffd2814372dcb15
commit-date: 2015-10-27
host: x86_64-pc-windows-gnu
release: 1.4.0
Oh, were you calling cell_get_fixnum
from C (or some other non-Rust language)? If so, panicking isn't supported across language boundaries. panic!
isn't designed as a robust error handling strategy, but rather the last resort when a program cannot continue, and this is actually one of the big motivations for std::panic::recover
(nightly-only atm).
A good way to resolve this is to use some other way to handle errors, e.g. return a worked/failed boolean from the function:
pub unsafe extern fn cell_get_fixnum(cell: *mut Cell, num: *mut i32) -> i32 {
match *cell {
Cell::Fixnum(i) => { *num = i; 1 }
Cell::Double(f) => { *num = f as f64; 1 }
_ => { 0 }
}
}
(Incidentally, if you're calling things from other languages you'll need the extern
as well as pub
etc.)
Well the intention was to "print error message and stop application" upon a bad state, which is why I was using panic!, although it would seem that if its behavior is intended to be undefined when run from another program, I may just have to write my own crash() function to handle this all gracefully (even if it's not necessarily idiomatic Rust). Still, I am confused as to why Rust emits illegal instructions in the first place. Although I am not the most knowledgeable about compilers in general, I did find myself asking a professor today his opinions on when it would be appropriate for a compiler to emit such an instruction. His response was a simple "never," leaving me to question the logic behind these choices.
But anyways, thanks for the advice with regards to an alternative way to handle those errors. Although I was under the impression that extern is for importing functions written in C, and not necessarily needed for exporting functions, as I have been able to get functions to be visible and callable from C without the keyword, so long as I use #[no_mangle]
.
Still, I am confused as to why Rust emits illegal instructions in the first place. … leaving me to question the logic behind these choices.
What you see in your disassembly, namely the call 4d <_ZN10sys_common6unwind12begin_unwind21h11851751900855523014E+0x1d>
is a call to diverging function (i.e. one in form of fn begin_unwind(…) -> !
). Ergo, if implemented properly (and there’s no UB), the function should never return. Putting ud2
there is a safety switch for that invariant.
Consider what would happen, if ud2
wasn’t here. The function, which begins unwinding, would’ve returned and executed completely unrelated code! The resulting issue report would then be not about the ud2
instruction, but rather about bad code generation and would require much more investigation to discover there was some UB at work.
For now I filled https://github.com/rust-lang/rust/issues/30791 resolving which would make panic!()
properly diverging in the particular case you encountered.
Yeah, if you want certain exit behaviour writing it yourself will be better than relying on panic!
to do something appropriate (which it probably isn't e.g. the error formatting of panic!
is not a good diagnostic for people who are using the library/application rather than writing it).
Although I am not the most knowledgeable about compilers in general, I did find myself asking a professor today his opinions on when it would be appropriate for a compiler to emit such an instruction. His response was a simple "never," leaving me to question the logic behind these choices.
It's a little more subtle than never: it's a very effective way to stop execution when there's a problem, e.g. Clang/LLVM sometimes emits a ud2
when it detects undefined behaviour, and the Linux kernel uses it for its BUG
macro. Also, apparently it is something CPUs understand, and can use to avoid speculatively reading/decoding code that it doesn't need to:
[
ud
] is used as a signal to the instruction fetcher that the processor should not waste efforts decoding the following memory since the execution will continue at a different location.
Also, it might be worth noting that rustc isn't directly choosing to emit ud2
: depending on the situation is either inserted by the LLVM compiler backend automatically or comes because rustc calls the llvm.trap
intrinsic of the LLVM compiler backend, and the LLVM developers have decided ud2
is the appropriate instruction for that. (I think the main alternative for the latter would be calling the C abort
function, but this would be less appropriate in a freestanding context like writing an operating system, where libc and hence abort
doesn't automatically exist.)
Although I was under the impression that extern is for importing functions written in C, and not necessarily needed for exporting functions, as I have been able to get functions to be visible and callable from C without the keyword, so long as I use #[no_mangle].
There's two uses of the extern
keyword:
extern { ... }
blocks for importing symbols (the one you mention)extern fn
for controlling the ABI/calling convention (low-level details about how arguments are passed in registers/on the stack etc.) of a functionThe latter is what I'm talking about. Both ends of a function call foo()
need to know the calling convention: the callsite of foo()
needs to know where to put the arguments and the internals of foo
itself need to know where to look to find those arguments. By default a fn
uses the Rust calling convention which may not/is not the same as the C calling convention. The latter is what the C side of the call will assume and so one needs to ensure the Rust function matches by writing extern fn
to switch the ABI (that's a short hand for extern "C" fn
). This doesn't affect the visibility of the symbol/whether it can be called, but it can lead to some very confusing bugs, where a call in C like foo(0, 1, 2, 3)
doesn't end up with those values on the Rust side.
Thanks for filing, but I don't think there's any action to be taken here, especially now that https://github.com/rust-lang/rust/issues/30791 has been opened: usually, it's a bug in the program (or some library it is using) if a ud2
is executed, and hence Windows' response isn't entirely wrong (in some cases like https://github.com/rust-lang/rust/issues/25869, it is a compiler bug but I don't think this is one of them). As such, I'm closing, but feel free to continue the discussion, although it may be more appropriate to move to https://users.rust-lang.org/ or stack overflow. :smile:
In my experiments in programming Rust, I've noticed that the panic! macro seems to generate an instruction called ud2 on x86-64 -- an instruction that is intentionally invalid. Doing this, I presume, is to intentionally raise a SIGILL and crash the program, a practice I have not seen in any other systems programming language. This should likely be for the same reason that no one terminates a process by intentionally dereferencing a null pointer -- the OS has more graceful ways of stopping it, and nobody wants to add "segmentation fault" or "illegal instruction" (or similar output messages depending on the OS) to their program's output. On Windows, this becomes particularly annoying, as running into either of these errors will cause a window to appear for a few seconds informing the user that the program has stopped working.
Would it be unreasonable to simply print the appropriate error message and call std::process::exit()?