Open HeroicKatora opened 4 years ago
We discussed this during the libs meeting today and feel that there is a lack of a specific use case for these methods. @HeroicKatora (or anyone else) could you describe how you are using these methods in your own code?
In particular we feel that undo_leaks
is unlikely to be useful in practice: if you've got a RefCell
then you are unlikely to ever be able to get a mutable reference to it (except maybe in drop
, but then you can just use get_mut
and ignore the ref count).
To expand on my original use case, consider an embedded network stack. More
specifically, a routing table in an almost static network, that is changing
infrequently (i.e. statically configured). This table will contain addresses of
some relevant remote machines, and is initialized by state machines that
discover those remotes in the network and insert their addresses and paths to
the table. For reasons of avoiding allocation during operation, the table
itself is represented by an unsized slice type: OrdSlice<Addr>
.
Since multiple independent actors are supposed to access the table by methods
taking &mut OrdSlice<_>
, to insert their endpoints, we must employ a form of
runtime borrow checking. I had initially considered an unsafe
implementation
until noting that I would just be replicating the exact fields of RefCell
and
figuring out it could fit into its types.
Note that each insert
operation might modify the whole slice to keep it
sorted so we can't use a simple entry-based API where we split the borrow
upfront in this case. (Technically, some GhostCell variant could maybe be
used as well but at the time it wasn't nearly as established and I did not want
to have too much unsafe
proof obligations floating around). The code that
runs after the operation expects a reference to the whole table, mostly
immutable but noalias
is good for codegen and there may be modifications
during runtime, and we really did not want the details of the setup or the
RefCell
wrapping to leak through to that module. More on evaluation of
alternatives later. Simplified code sketch of the structure involved:
/// A slice of ordered values, with bisect lookup.
/// Note: builtin unsizing by choosing `U = [T; N]`.
pub struct OrdSlice<T, U: ?Sized = [T]>(PhantomData<T>, U);
mod netstack {
pub struct Runtime<'net> {
_foo: &'net mut OrdSlice<Addr>,
}
}
mod embedded_app {
pub struct FindFoo<'net> {
target: &'net RefCell<OrdSlice<Addr>>,
}
enum Network<'net> {
Setup {
target: &'net RefCell<OrdSlice<Addr>>,
with_foo_service: Option<FindFoo<'net>>,
… },
Running {
net: Runtime<'net>,
… },
}
}
There are several key reasons why RefCell
(and likewise constructs) are
avoided in the netstack
module. Firstly it is used by other applications,
some of which choose to allow dynamic allocations and thus pass around owned,
copied Vec<_>
instances instead of the more complex scheme of embedded_app
.
By not using Ref
in the stack itself we allow both usage patterns without
complex generic parameters. Also note the stability implication of using
&RefCell<T>
. What is required to freely utilize the object in this logically
distinct code module would be a cell that has no outstanding borrows. Which is
just &mut T
with extra steps, in particular depending on absence of
programmer misuse.
Actual usage (with initialization for completeness) is then as follows:
fn ingress_step<'net>(
network: &mut Network<'net>,
queues: &mut …,
…
) {
match network {
Setup { target, with_foo_service, … } => {
// some setup io..
if let Some(foo) = &with_foo_service {
foo.ingress(queues);
}
// Done with setup?
if !with_foo_service.as_ref().map_or(true, FindFoo::is_done) {
return;
}
// Switch to active service mode.
network = Network::Running {
net: RefMut::leak(target.borrow_mut()), …
};
},
Running { net } => route_packets(net, queues),
}
}
Which we can run with error recovery by utilizing undo_leak
:
// Modified and shortened obviously..
fn run(addr: &RefCell<OrdSlice<Addr>>) -> Result<()> {
let mut state = Network::new(addr);
let mut queues = …;
loop {
ingress(&mut state, &mut queues);
egress(&mut state, &mut queues);
queues.errors()?;
}
Ok(())
}
fn main() {
let addr: OrdSlice<_, [_; 16]> = Default::default();
let mut addr = RefCell::new(addr);
loop {
// Otherwise, second `run` will panic.
RefCell::undo_leak(&mut addr);
if let Err(_) = run(&addr) { … }
}
}
… and why they were not chosen.
Runtime<'net>
transiently from a &'net OrdSlice<Addr>
in each
iteration of the loop body. This sidesteps the lifetime issue by storing a
reference to the original RefCell
instead. However, this is not very clean
code as Runtime
can no longer protect any inner invariants—possibly involving
correctness checks in each iteration as part of construction.Network
, just complicates the
implementation which is duplicated a lot and is quite unreadable. Note that
this style can introduce unexpected implicit and explicit lifetime bound
requirements, adding to the effect.Network
state machine with code. Break run
into separate
loops for each of the states, introducing a Ref
in the second half, which
is borrowed for the Runtime
. This duplicates the ingress/egress portion,
error handling and forces all similar state machines to be exposed at that
level.undo_leak
we could also just exit and have the outer system deal with
error recovery and restart. This gives less control over the precise behavior
during recovery, which might be fine as I'm merely using it for logging.
We're losing some program state, however, which might also be worked around
with shared memory files. It's simply more convenient to handle it in the
program, though.Note that each insert operation might modify the whole slice to keep it sorted so we can't use a simple entry-based API where we split the borrow upfront in this case. (Technically, some GhostCell variant could maybe be used as well but at the time it wasn't nearly as established and I did not want to have too much unsafe proof obligations floating around). The code that runs after the operation expects a reference to the whole table, mostly immutable but noalias is good for codegen and there may be modifications during runtime, and we really did not want the details of the setup or the RefCell wrapping to leak through to that module.
I have a very similar situation that warrants this functionality, including the need for undo_leaks
.
I think that this comes up in any situation with the following constraints:
Ref
is too short -- e.g. in a loop where each Ref
must be dropped before the next loop iteration.My simplified example looks something like the following:
struct State {
wrapper: Wrapper,
counter: u64,
}
// Wrapper to make sure this isn't Clone / Copy.
struct DataType(u64);
struct FrameData<'sim> {
data: &'sim DataType
}
mod inner {
use std::cell::RefCell;
use super::*;
// This wrapper is needed so that nothing else can call `borrow` or `borrow_mut`.
pub struct Wrapper {
data: RefCell<DataType>
}
impl Wrapper {
pub fn new() -> Self {
Self {
data: RefCell::new(DataType(0))
}
}
pub fn simulate(&mut self, new_value: DataType) {
*self.data.borrow_mut() = new_value;
}
pub fn wrapper_frame_data<'frame, 'sim: 'frame>(&'sim self, frame_data: &mut Vec<FrameData<'frame>>) {
let guard = self.data.borrow();
let data: &DataType = &*guard;
// Transmute the data to the `sim lifetime, because we know that the state can't be modified during that lifetime.
let data: &'sim DataType = unsafe { std::mem::transmute(data) };
frame_data.push(FrameData {
data
});
}
}
}
use inner::Wrapper;
fn simulate(state: &mut State) {
state.counter += 1;
// This is the only place where the `wrapper` could be modified -- we need a &mut reference.
state.wrapper.simulate(DataType(state.counter));
}
fn state_frame_data<'frame, 'sim: 'frame>(state: &'sim State, frame_data: &mut Vec<FrameData<'frame>>) {
// `wrapper` cannot be mutated or dropped while `&'sim State` lives.
state.wrapper.wrapper_frame_data(frame_data);
}
fn render(state: &State) {
let mut frame_data = Vec::new(); // <-- This definitely lives less time than `State`.
state_frame_data(state, &mut frame_data);
for data in frame_data {
println!("{}", data.data.0);
}
}
fn main() {
let mut state = State {
wrapper: Wrapper::new(),
counter: 0,
};
loop {
simulate(&mut state);
render(&state);
}
}
Using the API, all I'd need to do is make 2 changes:
pub fn simulate(&mut self, new_value: DataType) {
RefCell::undo_leak(&mut self.data);
*self.data.get_mut() = new_value;
}
pub fn wrapper_frame_data<'frame, 'sim: 'frame>(&'sim self, frame_data: &mut Vec<FrameData<'frame>>) {
let guard = self.data.borrow();
let data: &DataType = &*guard;
// No more unsafe!
let data: &'sim DataType = Ref::leak(guard);
frame_data.push(FrameData {
data
});
}
What is the current status of this feature?
68712 adds methods to convert
RefCell
guardsRef
/RefMut
into references with the lifetime of the underlying cell.The feature gate for the issue is
#![feature(cell_leak)]
.Unresolved Questions
MutexGuard
andRwLockReadGuard
/RwLockWriteGuard
?unsafe
methods be added to forcefully revert a leak?undo_leak
, would it make more sense not to return a reference (i.e., separate this fromget_mut
)?