Open benanders opened 8 years ago
I also just tried this on the latest nightly rustc 1.7.0-nightly (110df043b 2015-12-13)
and I have the same problem in both release and debug mode (with and without the --release
flag for cargo).
Interesting. I cannot reproduce the error on Linux and do not possess an OS X machine to test this on, so I can’t really help debug this other than with general tips for debugging these kinds of problems.
Illegal instruction in Rust usually comes from the ud2 instruction, which is emitted by the compiler in certain cases and where intrinsics::unreachable was used. Illegal instruction may also be caused by a lack of panic handling setup: unwinding through an FFI boundary is illegal in Rust (but since caller and callee are both Rust, I can’t imagine this being a problem).
If you’re interested in tracking down and fixing the issue, please do so! Otherwise I’ll just keep the issue open for a while so other people could find this if they hit it as well (on stable or not).
Interesting places to start tracking down the issue would be a stack trace at the time of hardware fault and disassembly around the invalid insn, I guess.
Honestly I'd have no idea where to start in debugging something like this, I'm relatively inexperienced with particularly low level stuff, but I'd like to try getting the issue resolved. Can you make any sense of this backtrace from GDB?
Program received signal SIGSEGV, Segmentation fault.
0x0000000101416630 in ?? ()
(gdb) bt
#0 0x0000000101416630 in ?? ()
#1 0x00007fff82e18155 in tlv_finalize () from /usr/lib/system/libdyld.dylib
#2 0x00007fff818fe768 in exit () from /usr/lib/system/libsystem_c.dylib
#3 0x00007fff82e185b4 in start () from /usr/lib/system/libdyld.dylib
#4 0x00007fff82e185ad in start () from /usr/lib/system/libdyld.dylib
#5 0x0000000000000000 in ?? ()
The disassembly from the function above the ?? in the stack trace (not sure if this is useful):
(gdb) up
#1 0x00007fff82e18155 in tlv_finalize () from /usr/lib/system/libdyld.dylib
(gdb) disas
Dump of assembler code for function tlv_finalize:
0x00007fff82e18124 <+0>: push %rbp
0x00007fff82e18125 <+1>: mov %rsp,%rbp
0x00007fff82e18128 <+4>: push %r15
0x00007fff82e1812a <+6>: push %r14
0x00007fff82e1812c <+8>: push %rbx
0x00007fff82e1812d <+9>: push %rax
0x00007fff82e1812e <+10>: mov %rdi,%r14
0x00007fff82e18131 <+13>: mov 0x4(%r14),%r15d
0x00007fff82e18135 <+17>: test %r15d,%r15d
0x00007fff82e18138 <+20>: je 0x7fff82e1815e <tlv_finalize+58>
0x00007fff82e1813a <+22>: lea -0x1(%r15),%eax
0x00007fff82e1813e <+26>: shl $0x4,%rax
0x00007fff82e18142 <+30>: lea 0x10(%rax,%r14,1),%rbx
0x00007fff82e18147 <+35>: mov -0x8(%rbx),%rax
0x00007fff82e1814b <+39>: test %rax,%rax
0x00007fff82e1814e <+42>: je 0x7fff82e18155 <tlv_finalize+49>
0x00007fff82e18150 <+44>: mov (%rbx),%rdi
0x00007fff82e18153 <+47>: callq *%rax
=> 0x00007fff82e18155 <+49>: add $0xfffffffffffffff0,%rbx
0x00007fff82e18159 <+53>: dec %r15d
0x00007fff82e1815c <+56>: jne 0x7fff82e18147 <tlv_finalize+35>
0x00007fff82e1815e <+58>: mov %r14,%rdi
0x00007fff82e18161 <+61>: add $0x8,%rsp
0x00007fff82e18165 <+65>: pop %rbx
0x00007fff82e18166 <+66>: pop %r14
0x00007fff82e18168 <+68>: pop %r15
0x00007fff82e1816a <+70>: pop %rbp
0x00007fff82e1816b <+71>: jmpq 0x7fff82e185bc
I take it since there's no ud2 instruction that that's not the problem? GDB won't let me get the disassembly for the function that's actually triggering the fault.
Hmm, at first sight it probably has nothing to do with the implementation of this library. Rather, Rust programs (like programs in any other language) have some thread-local storage set up. For Rust, things like printing have some TLS set-up, and it might be a case of TLS getting corrupted for the whole program (e.g. a case similar to a double-free, where the Rust runtime gets unloaded twice?). I’m not sure.
If you don’t mind leaking the loaded library (i.e. the library you load is used more than once, perhaps for the duration of the whole program), I can suggest forgetting the library so it doesn’t execute these cleanups. That should at least avoid the issue.
Yeah that seems like the best option so far. I'm not rapidly opening and closing libraries where resource management is important, so leaking is the easiest way out. I hadn't seen mem::forget, thanks for that!
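For anyone landing here later, a minimal sketch of what forgetting does. It uses only std; FakeLibrary is a hypothetical stand-in for a real library handle (it is not this library's type), and its Drop only bumps a counter where a real handle would unload the library.

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for a loaded library handle. A real handle would
// unload the library (dlclose) in its Drop implementation.
struct FakeLibrary<'a> {
    closed: &'a AtomicUsize,
}

impl<'a> Drop for FakeLibrary<'a> {
    fn drop(&mut self) {
        // Stands in for the unload step.
        self.closed.fetch_add(1, Ordering::SeqCst);
    }
}

/// Number of "unloads" observed when the handle goes out of scope normally.
fn closes_when_dropped() -> usize {
    let counter = AtomicUsize::new(0);
    {
        let _lib = FakeLibrary { closed: &counter };
    } // Drop runs here.
    counter.load(Ordering::SeqCst)
}

/// Number of "unloads" observed when the handle is passed to mem::forget.
fn closes_when_forgotten() -> usize {
    let counter = AtomicUsize::new(0);
    {
        let lib = FakeLibrary { closed: &counter };
        mem::forget(lib); // Drop is skipped, so the "library" stays loaded.
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("dropped:   {} unload(s)", closes_when_dropped());
    println!("forgotten: {} unload(s)", closes_when_forgotten());
}
```

The trade-off is the one already mentioned: the handle is never released, which is fine when the library is meant to live for the whole program anyway.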
According to @alexcrichton, it is very likely to be a case of the library registering some TLS destructors with pthreads, but they’re executed only when the thread itself finishes, rather than when the library is unloaded, thus resulting in us executing code that does not exist anymore. Apparently, there have been cases in the past where this has been encountered as well.
In this case, I’d say this is a bug in OS X itself (or its libdyld/pthreads), with the suggested fix being to “forget” the loaded library. Note that not using any TLS-related features (this includes anything related to stdio in Rust) would also avoid this bug.
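To illustrate the timing described above, here is a small std-only sketch showing that a thread-local's destructor runs when its thread finishes, not at some earlier point. The names (Guard, SLOT, destructor_state_inside_and_after_thread) are made up for the example, and it does not load or unload any library; it only assumes a platform where thread-local destructors run for spawned threads, which is the usual case on Linux and OS X.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Sets a shared flag when dropped, so the drop time can be observed.
struct Guard(Arc<AtomicBool>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

thread_local! {
    // Each thread gets its own slot; its destructor is registered with the
    // platform's TLS machinery the first time the slot is touched.
    static SLOT: RefCell<Option<Guard>> = RefCell::new(None);
}

/// Returns (flag as seen inside the thread, flag as seen after joining it).
fn destructor_state_inside_and_after_thread() -> (bool, bool) {
    let flag = Arc::new(AtomicBool::new(false));
    let thread_flag = Arc::clone(&flag);

    let inside = thread::spawn(move || {
        SLOT.with(|slot| *slot.borrow_mut() = Some(Guard(Arc::clone(&thread_flag))));
        // The thread is still running, so the destructor has not run yet.
        thread_flag.load(Ordering::SeqCst)
    })
    .join()
    .unwrap();

    // The thread has finished, so its TLS destructors have been executed.
    let after = flag.load(Ordering::SeqCst);
    (inside, after)
}

fn main() {
    let (inside, after) = destructor_state_inside_and_after_thread();
    println!("destructor ran inside thread: {inside}, after join: {after}");
}
```

If the code for such a destructor lived in a library that had been unloaded before the thread finished, the call at thread exit would jump into memory that is no longer mapped, which matches the crash in tlv_finalize shown in the backtrace.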
What's the status on this? We would look to use this library and OS X support is required. This bug is a major blocker. A couple specific questions:
Running into this issue also, so wondering the same thing: is this being tracked elsewhere?
As far as I know, no other issues have been filed. Last time I checked, there were no other libraries for Rust which serve the same purpose as this one. As for a workaround, I don't believe one is being worked on, and I unfortunately don't have the time, knowledge, or experience to try and fix this myself. I think we're out of luck at the moment :(
As far as who should be responsible for the bug, I'm not entirely sure. It might be a bug in Rust itself, because it doesn't seem to be specific to this library. But I'm not sure how willing the Rust maintainers would be to attempt to go about fixing it, since it involves the use of unsafe code and a native C library.
I’m not aware of any issues reported in other projects, nor am I aware of a public OS X issue tracker of any sort where such an issue could be reported/searched for. That being said, I do admit I didn’t look very hard for either one.
@calebmer sorry for the late response! Your comment completely fell through the cracks! Yours are all very good questions, so I’ll try to answer them extensively:
If this is a bug in someone else's code, have the appropriate issues been filed? If so, are there any links to those issues?
No. No upstream bugs have been filed, primarily because I’m not very familiar with the OS X community or the issue reporting process. Last time I checked, one needed to pay 100 USD upfront even to report an issue in Apple’s own OS.
If there are workarounds (as you mention) are there specific examples of code that doesn't work vs code that does?
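Two workarounds are:
leaking the library (mem::forget(library) after the necessary symbols are retrieved), as mentioned previously;
not using any TLS-related features in the library being loaded; not using the standard library (#[no_std]) would also make this easier.
Is there any progress on code being added to the library to work around this bug?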
I’m not sure it is possible to properly resolve this issue from within this library. An option would be to leak all the opened libraries by default on OS X, but I wouldn’t consider that a viable option.
Are there any libraries besides this one that serve the same function and don't have troubles with OS X?
You could certainly use barebones dlopen, dlsym and dlclose, but you would almost certainly hit the same issue as with this library. Avoiding dlopen would involve writing a whole dynamic linker for the platform of your choice by yourself.
@GravityScore you said
since it involves the use of unsafe code and a native C library
What do you mean? Rust’s standard library on OS X is strongly tied to the standard libc and contains a large amount of unsafe code. If using some additional unsafe code in the standard library would avoid the issue, I think the fix would be gladly accepted. I don’t think it would solve the issue in general, though: one could still produce a library which uses TLS in a way that exposes this issue regardless of what’s done in the Rust compiler or the standard library.
I have a question here (this is somewhat generic to Rust, but bear with me). I keep track of the Library within a struct here https://github.com/emoon/dynamic_reload/blob/master/src/lib.rs#L44 that is later stuffed into a Vec<Rc<Lib>>.
So I wonder how I should do the forget in this case? Should I implement Drop for the struct that holds this data and then do mem::forget on lib?
@emoon I guess the least intrusive stable way currently would be to do something like this and then wrap your Library in the Leak.
struct Leak<T>(Option<T>);

impl<T> Drop for Leak<T> {
    fn drop(&mut self) {
        // Take the value out and forget it so its own Drop never runs.
        ::std::mem::forget(self.0.take());
    }
}
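A rough sketch of how the wrapper might be used with a Vec<Rc<...>> like the one in the question. FakeLibrary is a hypothetical stand-in for Library that counts how often it would have been unloaded, and the Deref impl is an addition for convenience that is not part of the snippet above.

```rust
use std::cell::Cell;
use std::mem;
use std::ops::Deref;
use std::rc::Rc;

struct Leak<T>(Option<T>);

impl<T> Drop for Leak<T> {
    fn drop(&mut self) {
        // Take the value out and forget it so its own Drop never runs.
        mem::forget(self.0.take());
    }
}

// Assumption: a Deref impl so the wrapped value can be used transparently.
impl<T> Deref for Leak<T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.0.as_ref().expect("value is only taken in drop")
    }
}

// Hypothetical stand-in for a library handle; counts its "unloads".
struct FakeLibrary {
    unloads: Rc<Cell<u32>>,
}

impl Drop for FakeLibrary {
    fn drop(&mut self) {
        self.unloads.set(self.unloads.get() + 1);
    }
}

/// Builds a Vec<Rc<Leak<_>>>, drops it, and reports how many unloads ran.
fn unloads_after_dropping_vec() -> u32 {
    let unloads = Rc::new(Cell::new(0));
    let libs: Vec<Rc<Leak<FakeLibrary>>> = (0..2)
        .map(|_| Rc::new(Leak(Some(FakeLibrary { unloads: Rc::clone(&unloads) }))))
        .collect();
    // The wrapped handles remain usable through Deref.
    assert_eq!(libs[0].unloads.get(), 0);
    drop(libs); // Leak::drop runs for each entry; FakeLibrary::drop never does.
    unloads.get()
}

fn main() {
    println!("unloads after dropping the Vec: {}", unloads_after_dropping_vec());
}
```

With this shape there is no need to implement Drop on the struct that owns the Vec; the leaking happens per entry when the last Rc goes away.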
@nagisa Alright. Thanks!
@nagisa totally understand the delay, thanks for the great response! 😊
I believe this is caused by #28794; if I understand correctly, it's an issue with the way the Rust compiler generates dylibs.
(I think you'd get the same crash in C if you called dlclose, but that you wouldn't get the crash from either language if the library being loaded wasn't written in Rust.)
if I understand correctly it's an issue with the way the Rust compiler generates dylibs.
There’s nothing specific with the dylib generation, but rather with how Rust standard library implements the TLS on OS X.
but that you wouldn't get the crash from either language if the library being loaded wasn't written in Rust.)
You could use/implement TLS destructors using that function in any other language and hit exactly the same issues too.
Either way, thanks for finding and cross-referencing the issue.
No problem, thanks for the explanation!
Since 0.3 you can specify arbitrary flags when opening a library. The RTLD_NODELETE flag (thanks for the reminder, @Np2x) essentially acts as an implicit mem::forget, so you can now do something along the lines of:
let os_lib = libloading::os::unix::Library::open("fname", RTLD_NODELETE | RTLD_NOW)?;
let lib = libloading::Library::from(os_lib);
/* do your stuff */
This should still work while liberating you from having to mem::forget your libraries :)
As per this comment, Apple has fixed this issue by implementing dynamic library unloading, if said dynamic libraries use TLS, as a no-op.
Hi, thanks for making this library, it's really useful to me.
Unfortunately, when trying out a really simple use case, I get an Illegal Hardware Instruction error. The Rust code I'm using to load the dylib is:
The dylib I'm loading contains a single function:
The dylib's Cargo.toml file contains the needed crate-type = ["dylib"] qualifier.
I wrote some equivalent C code (loading the exact same Rust library from above), which works perfectly fine (no errors):
Any ideas why this might be happening? The illegal hardware instruction occurs after the main Rust function exits (I can place a print at the end of the main function and it'll be run, then the error will occur). I'm on OS X 10.11.2, using the most recent stable Rust (rustc 1.5.0 (3d7cd77e4 2015-12-04)).
I narrowed it down to the Drop function on the Library struct. If I comment its contents out, then the error doesn't happen. I also replaced the Drop function with just a single call to dlclose like so:
Which prints 0 (meaning the close function didn't return an error), which is weird.