rust-lang / wg-allocators

Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
http://bit.ly/hello-wg-allocators

Define what `#[global_allocator]` *really* means #25

Open ckaran opened 5 years ago

ckaran commented 5 years ago

Summary

I want to create a new crate that can be used as a global allocator, but I don't want anything within my crate to use my own allocator, nor do I want any of my crate's dependencies to rely on it.

Background

I'm sketching out the design for a new crate that will help analyze other crates' allocation behavior, kind of like valgrind, but for places where valgrind doesn't work. My plan is to use backtrace, counter, and dipstick within my allocator crate. So far, so good: since my allocator is only for instrumentation, I can just use the system allocator as my global allocator, and everything works as expected.

The problem comes in when someone else wants to use my crate to instrument their own code. Per the documentation for the #[global_allocator] attribute:

This attribute allows configuring the choice of global allocator. You can use this to implement a completely custom global allocator to route all default allocation requests to a custom object.

and

The #[global_allocator] can only be used once in a crate or its recursive dependencies.

At this point, I'm kind of stuck; I can't define #[global_allocator] in my own crate, as that would prevent end users from defining their own. But when end users define it so that they can use my crate, they force my crate to route its own allocations through itself, which could lead to recursive headaches.

Potential solutions

New attributes

Create a new attribute (#[local_allocator]?) that overrides the setting for the global allocator that gets applied to a crate and its recursive dependencies.

The main issue with this is the set of crates that both my instrumentation crate and the user's crate depend on; for those, we would either need two independent copies of the same code, or some way of distinguishing which allocator to use when.

Binary-only distribution

Compile my instrumentation crate separately (and statically) from the user crate, forcing them to link in the binary.

This may be the best solution right now, but it may cause unforeseen toolchain issues (e.g., did the user remember to compile in debug symbols for both the user crate and my separately compiled crate?). In short, it doesn't feel very ergonomic.

Other, better solutions?

I feel like there should be a better way than these solutions, one that doesn't break any of the guarantees that Rust currently provides. However, I can't think of one. Thoughts/ideas, anyone?

SimonSapin commented 5 years ago

There is no such thing as a local allocator. Box::new is always the same function, whatever crate it is called from. I don’t think we really could add that concept: if you allocate a Box<T> in a crate and transfer ownership of it to a function in another crate which then drops it, what happens?
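
As a minimal illustration of that ownership question (the crate names `a` and `b` below are purely hypothetical):

// Hypothetical crate `a`, imagined as built against a "local" allocator A:
pub fn make() -> Box<[u8; 1024]> {
    Box::new([0u8; 1024]) // which allocator performed this allocation?
}

// Hypothetical crate `b`, imagined as built against a different allocator B:
pub fn consume(buf: Box<[u8; 1024]>) {
    drop(buf) // the memory must be freed by whichever allocator allocated it
}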

As to the dependencies of your crate, what if user code also depends on one of them?

I don't think separate compilation would really help either; you'd need to make sure to have two separate copies of Rust's standard library, which might have other undesirable side effects.

Such hacks aside, there's only one global allocator. (That's why it's called global.) If you want users of your allocator Foo to be able to configure it as the global allocator, and you want some specific code not to use allocator Foo, then you need to make that code not use the global allocator. In particular, that means not using Box<T>, Vec<T>, etc. Two approaches are possible:


Now, perhaps you can redefine the problem. You mention “recursive headaches”, so it sounds like the real issue is re-entrancy. Maybe what matters is not which crate is making an allocation, but rather whether you're already in the middle of handling another allocation (on the same call stack / in the same thread) when a new allocation method is called. You could detect that situation and divert such calls to the system allocator:


use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Requires Rust 1.73+ for the LocalKey<Cell<_>>::get/set helpers.
    pub static ALLOCATING: Cell<bool> = Cell::new(false);
}

struct Foo; // the instrumenting allocator

unsafe impl GlobalAlloc for Foo {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if ALLOCATING.get() {
            // re-entrant call: bypass the instrumentation entirely
            return System.alloc(layout);
        }
        ALLOCATING.set(true);
        struct UnsetOnExit;
        impl Drop for UnsetOnExit {
            fn drop(&mut self) { ALLOCATING.set(false) } // called even if real_alloc panics
        }
        let _unset_on_exit = UnsetOnExit;

        self.real_alloc(layout) // the actual instrumented allocation
    }

    // Similarly other methods…
}
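
For completeness, a minimal sketch of how the end user would then wire this up, assuming Foo is the instrumenting allocator from the code above; there can only be one global allocator, and this is where it gets chosen:

// In the end user's binary crate: register the instrumenting allocator
// as the single global allocator.
#[global_allocator]
static GLOBAL: Foo = Foo;
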
gnzlbg commented 5 years ago

Instead of handling re-entrancy in general like @SimonSapin suggested, it is a bit simpler to just disable your "tracing" facilities before calling the libraries that you use, e.g., like this:

use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Stable stand-in for the unstable `#[thread_local]` attribute.
    static TRACE: Cell<bool> = Cell::new(true);
}

unsafe impl GlobalAlloc for Foo {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if TRACE.get() {
            // you can use a panic guard like @SimonSapin suggested if
            // `call_libunwind` can panic, but IIRC it's a C library that
            // never panics, so YOLO:
            TRACE.set(false);
            call_libunwind();
            TRACE.set(true);
        }
        System.alloc(layout) // i.e. plain malloc
    }

    // Similarly other methods…
}

That way, call_libunwind is only invoked when GlobalAlloc::alloc is not itself being called from within a call_libunwind call, which avoids infinite recursion. How you synchronize the state required to enable or disable tracing is up to you; if the libraries you use are single-threaded, a thread local will probably do.

ckaran commented 5 years ago

@SimonSapin, those issues are precisely the ones I was talking about that make writing an allocator really, really difficult.

That said, your solution for the re-entrancy issue seems like a good one; I'd have to test it out once I've written my crate, but I think it would handle all of the cases in my original comment.

[begin wild speculation] I think that it could also be adapted to solve the problems that this working group is trying to solve. What you have suggested is a two-level stack of allocators; with a little more work, it could be extended to a true stack of allocators. With such a system in place, we wouldn't need to modify any current library code; Box<>, Vec<>, etc. would just use whatever is the current allocator. [end wild speculation]

I know that there would be problems with the above approach; first off, you can't pop an allocator off the stack until all of the memory it has allocated has been deallocated, which means that reference cycles could potentially prevent popping off allocators. In addition, if you want to interleave the use of two or more allocators, your stack could grow indefinitely. But I suspect that this group could think of solutions to these problems.
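
As a very rough sketch of what such a stack might look like (everything here is illustrative: the names, the fixed-size stack, and the caveat in dealloc is exactly the problem mentioned above):

use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

const MAX_DEPTH: usize = 8;

thread_local! {
    // A fixed-size stack so that push/pop never allocate (which would
    // recurse into this very allocator).
    static STACK: Cell<[Option<&'static dyn GlobalAlloc>; MAX_DEPTH]> =
        Cell::new([None; MAX_DEPTH]);
    static TOP: Cell<usize> = Cell::new(0);
}

pub struct StackedAlloc;

impl StackedAlloc {
    /// Make `a` the current allocator for this thread.
    pub fn push(a: &'static dyn GlobalAlloc) {
        let mut stack = STACK.get();
        let top = TOP.get();
        stack[top] = Some(a); // panics if MAX_DEPTH is exceeded
        STACK.set(stack);
        TOP.set(top + 1);
    }

    /// Pop the most recently pushed allocator.
    pub fn pop() {
        TOP.set(TOP.get() - 1);
    }

    fn current() -> Option<&'static dyn GlobalAlloc> {
        let top = TOP.get();
        if top == 0 { None } else { STACK.get()[top - 1] }
    }
}

unsafe impl GlobalAlloc for StackedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        match Self::current() {
            Some(a) => unsafe { a.alloc(layout) },
            None => unsafe { System.alloc(layout) },
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Caveat: this frees through whichever allocator is currently on
        // top, which is only correct if nothing allocated under a pushed
        // entry outlives the matching pop -- the exact constraint above.
        match Self::current() {
            Some(a) => unsafe { a.dealloc(ptr, layout) },
            None => unsafe { System.dealloc(ptr, layout) },
        }
    }
}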

@gnzlbg I see your point, and you're right that it would work in my specific case, but I think I prefer the stack-based approach. I know it would be harder to manage than your approach, but it may actually be easier to fit within Rust's current stability guarantees.

glandium commented 5 years ago

This raises an interesting issue: if you have two staticlib crates, and they both have a different global_allocator, the rust compiler won't tell you anything about it, neither will your linker, but that won't work as you would like it to. It will use one of the global_allocators, whichever the linker chooses.

gnzlbg commented 5 years ago

This raises an interesting issue: if you have two staticlib crates, and they both have a different global_allocator, the rust compiler won't tell you anything about it, neither will your linker, but that won't work as you would like it to. It will use one of the global_allocators, whichever the linker chooses.

Do you have a minimal working example showing this issue? All linkers I've worked with do error on duplicate symbols. If this isn't happening right now, we have a bug somewhere.

glandium commented 5 years ago

@gnzlbg: https://github.com/glandium/global_allocator_linkage-rs

ErichDonGubler commented 5 years ago

@glandium: Ah poop. Props for being so fast! :)

gnzlbg commented 5 years ago

@glandium awesome, thanks! I can't reproduce on macOS (I do get the expected linker error about duplicate symbols), but could you please open a rust-lang/rust issue about this? I think we need to try to fix that, independently of how this issue is resolved.

nnethercote commented 2 years ago

FWIW, my dhat crate does exactly what @SimonSapin suggested. Based on that, I think this issue can be closed.
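
For reference, the setup in the profiled program looks roughly like this (based on the dhat 0.3 docs; check the crate documentation for the exact API):

// dhat supplies the global allocator, in the style sketched earlier.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Heap profiling runs for as long as this value is alive.
    let _profiler = dhat::Profiler::new_heap();

    // ... the program being profiled ...
    let v: Vec<u8> = Vec::with_capacity(1024);
    drop(v);
}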

ckaran commented 2 years ago

@nnethercote For my own uses, dhat is more than sufficient.[1] However, I think having a stack of allocators may still be useful, though only in very niche applications where the allocation patterns are known to undergo phase changes within the program, e.g., an application that spends 90% of its time doing general-purpose work and 10% of its time doing soft real-time work. You could use something like bumpalo or some other special-purpose allocator, but that requires the user to manually 'carry' the allocator around, which may not be that convenient. Note that I haven't done any analysis to determine whether this pattern is common enough to warrant the work needed to develop a stack of allocators; it was just something to consider (and probably belongs in a different issue, since a stack-of-allocators design doesn't fit this issue's topic well).

[1] Actually, it's WAY more than sufficient; it's what I had originally been hoping to write but never got around to! Thank you for writing it; I now need to spend some time learning its ins and outs.
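
For concreteness, the "manually carrying the allocator around" ergonomics look something like this with bumpalo (a rough sketch; realtime_phase is just an illustrative name):

use bumpalo::Bump;

// Every function that should allocate from the arena has to receive it
// explicitly, instead of just using Box/Vec as usual.
fn realtime_phase(arena: &Bump) {
    let samples: &mut [f32; 1024] = arena.alloc([0.0f32; 1024]);
    samples[0] = 1.0;
    // ... process `samples` without touching the global allocator ...
}

fn main() {
    let arena = Bump::new();
    realtime_phase(&arena);
    // everything in the arena is freed at once when `arena` is dropped
}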

nnethercote commented 2 years ago

I'm glad to hear you like dhat :) The RFC for version 0.3 may be of interest.