Open Chriscbr opened 1 month ago
An option would be to avoid pointers and instead use an ID to reference the data. Normally this isn't a super safe idea but luckily compilers have very static(-ish) lifetimes.
The simplest approach would be some arena built around storing items in a list. See id-arena for a nice example of that. It's both very simple and fast, although it still has the downside that individual elements can't be freed. That's where things get complicated.
Instead, worth looking into https://github.com/orlp/slotmap. Not quite as simple but you get this incredibly useful result:
use slotmap::{new_key_type, SlotMap};
new_key_type! {
pub struct TypeId;
}
#[derive(Debug, Eq, PartialEq)]
pub enum Type {
String,
Number,
Option(TypeId),
}
fn main() {
// Create a new arena and keep it alive throughout compilation
let mut types = SlotMap::<TypeId, Type>::with_key();
// Allocate types
let number = types.insert(Type::Number);
let optional_number = types.insert(Type::Option(number));
// Sometime later, retrieve an `Type` from the arena
if let Type::Option(mut outer_type) = types[optional_number] {
// The types are mutable here
match &mut types[outer_type] {
Type::Number => println!("Number"),
_ => println!("Not a number"),
}
}
}
Sharing some thoughts / notes on a possible code smell in our compiler architecture.
Our compiler stores type information internally using the
Type
enum data structure. Throughout the compiler, we reference and pass around theseType
structures (as well as some other data structures, likeSymbolEnv
andNamespace
) fairly freely using raw pointers throughout the compiler. To facilitate this, we use a custom type namedUnsafeRef
:https://github.com/winglang/wing/blob/9176564a6c0dbb9197dc39d7fb392332e855d0bd/packages/%40winglang/wingc/src/type_check.rs#L57-L83
TypeRef
is defined as an alias toUnsafeRef<Type>
.The unsafe implementation of the
Deref
trait forUnsafeRef
allows anyone to dereference aTypeRef
value (e.g.*some_typeref
) to access the underlying type. This pattern is generally "unsafe" in Rust since it's possible to write code like this:In the code above, we create an UnsafeRef using a pointer to a local variable.
t
goes out of scope at the end of the function, so the pointer is no longer valid, and accessing it would be a dangling pointer dereference, causing undefined behavior.The way we avoid this class of issue today is by convention always creating
TypeRef
objects through a helper method on theTypes
singleton struct. The helper method usesBox
to store each Type object on the heap, and then stores all of theseBox
objects within aVec
field on theTypes
struct. AnUnsafeRef
object with a pointer to the value on the heap is returned. We also ensure that theTypes
singleton is always kept in scope through the entire life of the compilation or language server - we don't let it go out of scope. By pointing to objects on the heap that we never deallocate, we've guaranteed that all of these raw pointers will always continue to be valid.The downsides of this approach, besides using
unsafe
, include:Array<str>
modeled in a user program. Interning would allow these to be de-duplicated.)Pragmatically, this isn't a pressing issue for the compiler today, but it's worth keeping an eye on.
One alternative approach could be to use
Rc<Type>
instead ofUnsafeRef<Type>
.Rc
reference counts usages of the pointer so that the underlying Type would be automatically deallocated when it reaches 0. But an obstacle implementing this today is that we have many places throughout the compiler today where it's expected to be able to mutate the internal structures referred to byType
(seeas_class_mut
,as_struct_mut
, andupdate_known_inferences
as examples).Rc
only makes its contents readable, so we'd have to find ways to avoid these mutations. (For example, perhaps a unique StructId is stored inside Type, and the actual contents with a list of mutable fields is stored in a separate side table).