Support for intrusive data structures and unmoveable types

glaebhoerl commented 9 years ago

Rust currently has very poor support for intrusive data structures: it is extremely difficult or impossible to define a safe interface for them. This is because all types in Rust are moveable, and moving invalidates any pointers into the given value. Rust's current assumption is that values only have pointers pointing out from them, and any pointers into them are known statically and have their validity enforced at compile time by borrow checker restrictions. But this fails to accomodate the case where the validity of pointers into values is maintained by runtime code, as is the case for intrusive data structures.

We would like to be able to have a type such as:

pub struct IntListNode {
    value: int,
    prev: *IntListNode,
    next: *IntListNode
}

and allow client code to deal in instances of IntListNode directly and store them wherever, such as on the stack, with e.g. the Drop impl written to patch up prev.next and next.prev. The implementation would almost certainly require unsafe code, but we would like to at least expose a safe interface. This requires that IntListNode not be implicitly movable.

We could also have a trait such as:

trait Relocate {
    fn relocate(from: &move Self, to: &out Self);
}

for explicit moves of normally nonmoveable types, using &move and &out references to avoid the paradox of passing nonmoveable types by value. But we need nonmoveable types first. (relocate() should be equivalent to a clone() followed by a drop() on the original, except likely more efficient, and clone() itself may not be available.)

An "interesting" question arising from nonmoveable types is how to acquaint them with the generics system. (Do we have a Move bound, a la Copy? Do we require it to be explicitly specified? Or explicitly waived?)

Kimundi commented 9 years ago

Its already kinda possible to create unmovable types by borrowing self, see this as an example: https://gist.github.com/Kimundi/d846f1da490ef4e9b0d9

Or here a bit more integrated solution: https://gist.github.com/Kimundi/81b9944a896052b5407f

reem commented 9 years ago

I think what is desired here is not just the ability to mark types as not moveable, but to actually control the process of moving. I was thinking that this could be accomplished with @Kimundi's scheme if we require &'a mut self for the move, since that would lock the data forever, but then you can't move if you have updated, since the update borrow also lasts forever, &c.

arielb1 commented 9 years ago

@Kimundi

Sure you can have non-movable types by not giving the user &mut references, but then you lose the benefits of ownership. What would help is user types that behave like &mut T - i.e. can be reborrowed, but don't allow moving the inner type.

I don't think a compiler-supported but user-defined Relocate trait would help – non-trivial move constructors are terrible. A Sized?-style explicit-waive Move could work, however.

reem commented 9 years ago

@arielb1 You could just have types which do non-trivial moves such as a Vec resize do a needs_move::<T>() check, which the compiler could optimize out after monomorphization.

I imagine we would introduce a new rule: types are either Copy or Move, and there is a blanket impl of Move for all non-Copy types which is just memcpy.

glaebhoerl commented 9 years ago

@arielb1, @reem Relocate wouldn't be compiler-supported (at least not in the sense of being implicitly invoked, which I'm assuming you meant), just a normal trait like Clone. The critical part is having nonmoveable types, and the Relocate trait is just something we could add once we do. (This is what I personally meant by what I wrote - of course other people may have different ideas.)

What I meant by Move was implicit movability, a la Copy. In other words Relocate is to Clone as Move is to Copy. But requiring explicit Move bounds might be onerous.

@Kimundi That's interesting and clever... I wonder if it's adequate for intrusive structures like IntListNode? But even if it currently is, I suspect that it won't be once we have new destructor semantics. I'm pretty sure doing this trick on types with destructors (as IntListNode would be) is precisely what would be forbidden - things with lifetimes you store in it must have lifetime strictly greater than the thing itself. (But maybe you could still do it with unsafe code?)

arielb1 commented 9 years ago

@reem

Non-trivial move is a nightmare to work with, because it inserts user-defined operations to rather arbitrary places. A Move bound that works like Sized would work pretty well.

glaebhoerl commented 9 years ago

It strikes me that for an intrusive doubly-linked list type like IntListNode above to work well with the type system (i.e. not lead to undefined behavior), &mut borrows of it likely also have to be ruled out. The presentation above was simplified for exposition, but suppose we had value: Cell<int> rather than value: int, and allowed traversing the list and getting &Cell<int> references to the contained values. But this is not enough for safety, because we cannot rule out the possibility that another node has been &mut borrowed, and having &mut and & references to the same thing at the same time is undefined.

This is eerily similar to the restrictions on global statics: no moving, no &mut borrowing. It's probably not a coincidence: in both cases we lack the ability to statically keep track of potential references.

These restrictions also happen to correspond to the restrictions on things which are &-borrowed... which again makes sense: we essentially have the presumption that they are always shared.

glaebhoerl commented 9 years ago

Some more thoughts:

First, the fact that we seem to want essentially the same restrictions as which an & borrow has means that Kimundi's technique is a better fit than I thought at first. But is there any way to "lock" the variable from the get-go, or to work around the fact that it can be moved beforehand, without some kind of ugly runtime hacks?

This also reveals a deeper conundrum. Shared borrow restrictions prevent moving, even via relocate(). But this is for a reason! When is it safe to relocate()? If there can exist shared references to a given IntListNode at any time without you knowing about it, by traversing from other nodes, then it is never (provably) safe.

But it seems like being able to explicitly move is a capability you would want sometimes?

I don't have a great deal of experience with intrusive data structures. Can someone who does perhaps shed some light on how this sort of thing is usually reasoned about in other languages?

pythonesque commented 9 years ago

What would help is user types that behave like &mut T - i.e. can be reborrowed, but don't allow moving the inner type.

Like this? https://gist.github.com/pythonesque/d002384c29f83f9efd09

Pretty sure NoMoveGuard is unmovable in safe code.

_Edit:_ https://gist.github.com/pythonesque/d002384c29f83f9efd09#file-intlist-rs proof of concept.

reem commented 9 years ago

@pythonesque Just a note, but line 15 is undefined behavior due to undefined struct layout. You should construct it explicitly then transmute the lifetime only.

pythonesque commented 9 years ago

@reem: The NoMoveGuard is zero sized and markers have no alignment / size requirements. Is the behavior not defined in this context?

(Also, the lifetime doesn't actually change).

reem commented 9 years ago

I'm almost 100% sure that struct layout is always undefined, i.e. even going from T -> NewType(T) through transmute is UB.

pythonesque commented 9 years ago

Okay, updated to remove UB: https://gist.github.com/pythonesque/d002384c29f83f9efd09.

_Edit_ I'm not quite sure what about this requires completely immovable types: https://gist.github.com/pythonesque/d002384c29f83f9efd09#file-intlist2-rs is basically identical and doesn't use any unsafe code. So maybe I'm approaching this incorrectly.

aidancully commented 9 years ago

For what it's worth, I've done a lot of work with movable intrusive types. Generally in these cases, I use the intrusive structure to manage ownership of its container. That is, moving the intrusive structure moves the container at the same time. For example, if I use a linked-list to pass ownership of a structure from one thread to another, then I can use an intrusive link on the list to represent ownership of the containing structure. I'm not sure if I expressed that idea clearly, but it's a hope of mine that Rust will support code like that safely.

glaebhoerl commented 9 years ago

I went with the IntListNode example here because I thought it was simpler, but it's possible that this wasn't actually such a hot idea, and that this example can actually be expressed with lifetimes as they exist, as @pythonesque shows.

The use case which actually made me start thinking about intrusive data structures and unmoveable types was "a weak reference for any type" (not just Rc): you'd have a WeakEnabled<T>, from which you can form WeakRef<T>s, which has a method something like fn get(self: &WeakRef<T>) -> Option<&T>. And it's implemented (potentially) by threading a doubly-linked list through the WeakEnabled and all of its WeakRefs; when a WeakRef is moved or dropped (or cloned), the list is patched up around it in the obvious way; each WeakRef additionally contains a nullable pointer back to the WeakEnabled; and when the WeakEnabled is dropped, its destructor traverses the list and sets all of the back-pointers in the WeakRefs to null (after which get() will return None). Which sounds like a splendid plan right up until you realize that both the WeakEnabled and the WeakRefs can be moved at any time, invalidating the links. Perhaps this example can serve as better motivation than the previous one did?

@aidancully Unfortunately I don't understand what you're trying to say, even though I'd like to. What do you mean by "the intrusive structure" and "its container"? What does it mean to use a linked list to "pass ownership"?

aidancully commented 9 years ago

@glaebhoerl I'm going to try to express the idea in C, because C has well-understood semantics in this domain... Consider the following C code:

typedef struct _Link Link;
struct _Link {
    Link *Next;
};
typedef struct _List List;
struct _List {
    Link *Head;
    Link *Tail;
};

In traditional usage, the Link type is completely useless. You might be able to use it to represent Church numerals, but no other applications immediately occur to me. On the other hand, if you consider Link to be an intrusive structure inside of a container structure, like so:

typedef struct _Container Container;
struct _Container {
    /* some fields belonging to the container */
    int Field1;
    int Field2;
    /* allow this structure to be placed on a linked-list */
    Link MLink;
};

Then it becomes possible to create linked-lists of Container using the single implementation defined for Link. (Going back to your questions, MLink is the intrusive structure in this example, and Container is its container.)

What I mean by "passing ownership" is, consider the case that one thread reads from List while another thread writes to List. If both threads have agreed that elements of List are MLink fields from Container, then it's possible for the sending thread to give ownership of a Container object to a receiving thread using code like the following:

/* on "sender" side: */
void send_to_thread2(List *ContainerList, Container *C) {
    append_to_list(ContainerList, &C->Link);
}
/* on "receiver" side: */
Container *receive_from_thread1(List *ContainerList) {
    Link *Head = pop_head_from_list(ContainerList);
    if(Head == 0)
        return 0;
    return containerof(Container, MLink, Head);
}

In Rust terms, ownership of the Container structure moved from thread 1 by appending the Link structure (directly embedded within the Container) onto a List. On the other hand, ownership of the Container is moved into thread 2 by popping this embedded Link structure off of the List. Ownership of a Container is communicated between threads through passing ownership of Mlink.

Obviously I've left out a lot of details for ensuring safety when using this sort of pattern, but I think this should express the idea more clearly than the single paragraph you referenced earlier. I also think this might be a more traditional use of intrusive structures than what you're describing. AFAICT, the Rust ownership system doesn't handle this sort of usage (representing ownership of a container via ownership of a contained field) at all, right now... I'm working on an RFC around the idea, would be happy to talk about it elsewhere. (To give a preview: build intrusive structure support off a "singleton type" construct, allowing a safe "containerof"-type operation. Still working on many details.)

Thanks

aidancully commented 9 years ago

@glaebhoerl Would boxing the WeakRef allow what you want?

glaebhoerl commented 9 years ago

@aidancully Possibly, but that's a workaround with a performance cost, and (AFAICT) will also stop working if/when we get something like DerefMove (which, independently, I believe we do want to eventually gain).

(Still digesting your previous comment.)

glaebhoerl commented 9 years ago

I think the simplest meaningful example of a type where you really want something like non-moveability and &out references might be a type of relatively-addressed (position-independent) references: neither Copy nor even bitwise moves are valid, because the value necessarily has to depend on its position in memory, and should only be cloned using code which explicitly preserves the invariant. This example is also edifyingly free of any entanglement with the idea of references-into-self.

You'd want an API something like this:

pub struct Relative<'a, T> { ptrdiff: isize }

impl<'a, T> !Move for Relative<'a, T> { }

impl<'a, T> Relative<'a, T> {
    pub fn make(from: &'a T, into: &out Relative<'a, T>) { ... }

    pub fn deref(&'a self) -> &'a T { ... } // we can't actually impl Deref :\
}

impl<'a, T> Clone for Relative<'a, T> {
    fn clone_into(&self, into: &out Self) { ... }
}

With the Clone trait being modified to something like this:

pub trait Clone {
    fn clone(&self) -> Self where Self: Move;
    fn clone_from(&mut self, source: &Self) { ... }
    fn clone_into(&self, into: &out Self) { ... }
}

(For RelativeMut, you'd also want &move and trait Relocate.)

Is there any other way to support exposing a safe interface for a type like this?

pnkfelix commented 8 years ago

I agree with @arielb1 that a "bound" analogous to ?Sized (which I try to refer to as "generalization markers", though maybe I should shorten it to "generalizers") seems like it could be a good fit here.

E.g. something like a lang-item marker trait Immobile { } and trait Mobile that all types that don't implement Immobile get by default. (Or we could allow the latter to be expressed via negative bounds -- that detail isn't important to me.)

But I have not yet attempted to encode the examples given the comment thread via such an approach; the idea hasn't gone beyond the level of "thought experiment" for me yet.

aidanhs commented 7 years ago

Related to this issue: rust-tenacious has been retired so there's now no available lint for flagging that you've misused an unmovable type.

eddyb commented 7 years ago

@aidanhs It was pretty bad at its job though, there's no way to know what happens without looking at the MIR, preferably after MIR optimizations (although they shouldn't introduce more moves).

However, doing it on the MIR may be as simple as writing a MIR visitor that checks the type of every consumed lvalue and errors if it's unmovable.

Since rust-tenacious was discontinued, access to MIR has been streamlined and nowadays it should be possible to write such a lint.

RalfJung commented 5 years ago

We now have pinning. Does that mean we can close this issue?

It seems at least it should be updated to say what is still missing.

Ekleog commented 5 years ago

I think we're still missing &out references for safely moving !Unpin things around while respecting their invariants?

RalfJung commented 5 years ago

I don't think we need &out for a safe interface to intrusive data structures.

I think what is confusing me is that the issue conflates two very different concerns: Intrusive data structures (possible with Pin and without &out), and unmoveable types (may need more than that, but we don't even have a compelling motivation justifying their complexity). The issue makes it sound like these two are the same problem, but they are not.

safely moving !Unpin things around while respecting their invariants?

There is no such thing. Their invariant explicitly says "though shall not move me around".

glaebhoerl commented 5 years ago

@RalfJung The reason it conflates them is that back in 2014 the issue was not well understood (at least by me).

There is no such thing. Their invariant explicitly says "though shall not move me around".

Possibly what @Ekleog meant to express is "transfer ownership without physically moving (and without incurring heap allocation)"?

(Side note &out is unfortunately widely used to mean both one thing and its direct opposite, making it worse than useless for purposes of discussion (in particular I can only guess which of them was meant now) -- less ambiguous names might be &own resp. &uninit/&needsinit.)

RalfJung commented 5 years ago

The reason it conflates them is that back in 2014 the issue was not well understood (at least by me).

Agreed. That's why I was suggesting in the light of what we know now, the issue title + description needs updating, or the issue should be closed (and potentially a new one opened for what is still deemed missing).

glaebhoerl commented 5 years ago

(For the record the reason I didn't express any opinion or preference on that is because I don't have one.)

Ekleog commented 5 years ago

safely moving !Unpin things around while respecting their invariants?

There is no such thing. Their invariant explicitly says "though shall not move me around".

Well, if the !Unpin thing does implement a ReallyMove trait that has a function like (pseudo-syntax and just for the idea, I haven't thought about the exact type esp. of out):

fn really_move(self: Pin<&mut Self>, out: Pin<&needsinit Self>)

Then the thing could be safely moved.

I can imagine this be useful for eg. stack-linked-lists, where you would want to move a node, but then you need to re-attach the now-dangling links. The really_move function would allow to do this moving-and-reattaching behaviour.

rpjohnst commented 5 years ago

Moving and reattaching might be better done via a method that unlinks the node and un-Pins it, and then another method that inserts it.

I'm not sure, though, how widely applicable that pattern would be among "pinned objects that want to move," or even if it's allowed by Pin's guarantees (anything that sees the Pin can assume it will be dropped before being freed/moved, but I suspect privacy is enough to make this sound).

rust-lang / rfcs

Support for intrusive data structures and unmoveable types #417