snirk-lang / protosnirk

The beginnings of a programming language
MIT License
1 stars 1 forks source link

Ownership patterns in the compiler #25

Open SnirkImmington opened 7 years ago

SnirkImmington commented 7 years ago

One of the technical debts of the Rust compiler is that its code does not take full advantage of Rust's ownership rules. This is partially through hacking over time but mostly due to being bootstrapped so early in Rust's development (it used @mut pointers way back when). There are a couple of ways they do ownership (Rc<T>, P<T>) with varying degrees of unsafety. The truth is simple: the Rust compiler is not written in "safe Rust". Obviously, this is very difficult (why create protosnirk if everything can be expressed in safe Rust) but I think it's possible to do better, most of the time. Obviously, hackiness eventually needs to be introduced as the language requires (not everything can be a perfect set of passes) but here are the basics:

String interning

Tokens

Tokenizing

AST

In the next stages of compiling, we need to run checks in passes. Each check should add some extra set of data to the program. Many compilers do this through lowering passes, and produce new semantic trees or graphs of their represented program. However, I think we can take a relational approach: do things in more passes but have easier lifetime analysis. Produce multiple mappings instead of lowering data down.

Item collection

We need a first pass to correlate function names together so they can be called before being declared.

Name check

The first pass (currently parse::verify::smybol_checker) creates a mapping ScopeIndex -> Symbol that simply verifies that each item and variable declared actually exists. This can happen before an "imports" pass.

~~Mutation checks are built into this right now (by changing Cell<bool> fields in Symbol). Moving it into its own check might simplify things but would result in more code and another pass.~~ This has been changed in the typeck branch.

Type check

Type inference in the typeck branch uses the same ScopedIds as variable identification. Types are resolved in a graph to unique nodes which are mapped to concrete types.

Compile

Compiling is the fun part where each of these tables is looked up from each other

Errors

SnirkImmington commented 6 years ago

I've updated this old issue given my progress in the typeck branch, but I think I may have to close this and reopen smaller/more organized items. String interning may be an important feature to add later when performance improvements are needed.

Timidger commented 6 years ago

Another solution is to bootstrap as late as possible.

Is there some solution to bootstrapping (other than being a good means of dogfooding the language) that I'm unaware of?

Go didn't bootstrap until 1.4 IIRC.

SnirkImmington commented 6 years ago

I had assumed the compiler would be more efficient with string interning (ASTs can have a lot of repeated identifiers), but looking around at other compilers I don't see a lot of string interning. It'll certainly never compare to time and space requirements of running LLVM.

Most of the things in this issue that are actually useful comments are things I'm doing in the typeck branch, and everything else is too long-term and vague to bother about. I may feel the need to rewrite the AST implementation in the future (possibly when user-defined structures and generics are taking shape), and I think a lowering pass will be required for lifetime handling and optimizations, but those aren't worth creating GitHub issues over right now.

I'm not concerned about bootstrapping - it can wait. It certainly couldn't happen anytime soon.