mypyc / mypyc

Compile type annotated Python to fast C extensions
1.76k stars 46 forks source link

Refactor IR and IR building #781

Open JukkaL opened 4 years ago

JukkaL commented 4 years ago

I've noticed that constructing and modifying IR is needlessly complicated. The structure of mypyc.irbuild is also still not very clean. Here are some things that I think would be nice to clean up.

We don't need to generate names in Environment. This would make it easier to perform IR transformations. We can generate names for temporaries during code generation, after the IR is final. We can still include a name hint for registers that represent local variables.

Don't maintain a symbol table here, since it depends on mypy AST concepts. I hope that we can move the symbol table to ASTConverter.

Maybe we can also drop indexes somehow. We can reconstruct most of the state from the basic blocks, which we should do when possible. This will make it way easier to modify IR, as we don't have to update things in multiple places.

Now that we are using only the new, lower-level primitives, we can drop the various prefixes and suffixes that we used to disambiguate them from the old-style primitives.

Currently there's a lot of boilerplate when generating the IR for a function. Ideas:

Remove dependencies on mypy.nodes or mypy.types in mypyc.ir and mypyc.irbuild.ll_builder, since they are supposed be independent of these.

Rename mypyc.irbuild.builder to mypyc.irbuild.converter and mypyc.irbuild.ll_builder to mypyc.irbuild.builder.

I've never liked the name LowLevelIRBuilder. Instead, we could rename it to just IRBuilder, and rename the current IRBuilder to ASTConverter, for example. It's confusing to have two classes with similar names.

After the renaming it will also become clear that parts of ASTConverter should probably live in IRBuilder instead.

LowLevelIRBuilder is quite big, and parts of IRBuilder should probably be moved there, making it even bigger. I think that we should split it into multiple modules, using module-level functions that take a LowLevelIRBuilder as the first argument instead of methods. Here are some ideas:

This assumes that there aren't too many circular dependencies between the various operations.

msullivan commented 4 years ago

"Simplify creating IR for a function" and "Simplify Environment" seem like the high leverage things here

JukkaL commented 4 years ago

Agreed with the importance of those two.

I'm no longer sure if "builder" -> "converter" is the best way do to it. An alternative would be to only maintain a core interface in the replacement of the current IRBuilder class, and move most of the AST conversion helper methods into module-level functions in a module such as ast_helpers. The remaining class could then be called IRBuild, IRConversion, IRBuildState, ASTConversion, or similar. We might remove the pass-through methods from the class, for consistency. This would increase verbosity slightly, but clarity would also arguably improve, so maybe it's a good trade-off.

Finally, I'd like us to rework AssignmentTarget. At least we should be able to move it away from mypyc.ir and only use it in irbuild.

JukkaL commented 3 years ago

I'm working on simplifying Environment.

JukkaL commented 3 years ago

We could also make LoadInt a direct subclass of Value, similar to Register. This would simplify things a bit, since we'd no longer need the find_constant_integer_registers business, and generating IR would be slightly easier. Maybe we'd also rename LoadInt to Integer, for consistency with Register.

JukkaL commented 3 years ago

The current state seems pretty good to me already, so I'm marking the remaining work as low priority.