tekknolagi opened 7 years ago
Kartik's idea: subset of x86. Ideally just from the 1-byte opcodes at https://net.cs.uni-bonn.de/fileadmin/user_upload/plohmann/x86_opcode_structure_and_instruction_overview.pdf
Emulate the subset of x86 in the VM you mean?
Yes. That way we retain the option to generate native binaries at some later date. The subset of x86 we support can be pretty clean.
What about the memory allocation primitives, et al?
Yes, some number of things will be "virtual microcode" :) But if we minimize them we can replace them with functions. Particularly if the syntax for calling primitive instructions is identical to that for functions, like in Mu.
Since we're going to have a lot of layers, the notion of "primitive" isn't really well-posed. Instead of saying "x will be a primitive" I should be saying "x will be defined/implemented closer to layer 0 than layer 3000."
When do you want to start mapping out the layers?
Oh boy. It seems a hard problem to attack head on, though I'd be interested to see you take a stab at mapping it.
Alternative approach: design the VM, compile to the VM, then progressively lower the compiler. There's still an element of a maze in the third stage, but at least we know what to do until then.
That sounds like a good approach. So time to spec out the VM, then?
Yup! If I could just find some time.
@KCreate and I can start and you can contribute whenever you get a moment
Let me share some resources that I've been trying to read.
1. The x86 cheatsheet I already linked to above. I'd suggest focusing on just the first square (byte) first.
2. A concise x86 reference, though it may be too cryptic at first.
3. Intel's full programming manual. I'm actually starting to think it's worth starting there. Not end-to-end, but read the start of chapter 3 and then pick specific instructions to read about based on the 'map' of the cheatsheet.
As a first task, I started out creating a list of opcodes I'd need to build factorial:
# accepts its arg in EAX.
def factorial [
c8 00 04 00     # enter, saving ___ bytes of space for local variables (imm16), nesting level 0 (imm8)
2d 01 00 00 00  # subtract 1 from EAX (2d takes a 4-byte little-endian immediate)
e8 ___          # recursive call (e8 takes a rel32 displacement)
c9              # leave
]
This suggests a shortlist of opcodes to read about first: arithmetic and bitwise ops, conditionals, push/pop, enter/leave (I'm not super concerned that enter/leave are less efficient and so existing compilers rarely use them. If we're climbing up the ladder of abstractions we need every little leg up we can get.)
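(For reference, a minimal C sketch of what enter imm16, 0 and leave do to the frame and stack pointers, assuming a flat-memory VM. The struct and helper names are made up for illustration, not part of any spec here.)

#include <stdint.h>

/* Hypothetical VM state; field names are illustrative only. */
struct vm {
    uint32_t esp, ebp;
    uint8_t *mem;                     /* flat address space */
};

static void push32(struct vm *v, uint32_t x) {
    v->esp -= 4;
    *(uint32_t *)(v->mem + v->esp) = x;
}

static uint32_t pop32(struct vm *v) {
    uint32_t x = *(uint32_t *)(v->mem + v->esp);
    v->esp += 4;
    return x;
}

/* enter imm16, 0  ==  push ebp; mov ebp, esp; sub esp, imm16 */
static void do_enter(struct vm *v, uint16_t locals) {
    push32(v, v->ebp);
    v->ebp = v->esp;
    v->esp -= locals;
}

/* leave  ==  mov esp, ebp; pop ebp */
static void do_leave(struct vm *v) {
    v->esp = v->ebp;
    v->ebp = pop32(v);
}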
As you can see, I'm not super concerned with following an existing ABI or ELF format, partly because I hella hate reading docs about arbitrary things like alignment and whatnot. I think that may be the only reason to build a VM first: to separate out learning x86 from learning binutils. But feel free to use the existing 32- or 64-bit ABI. It's also your project (perhaps first and foremost).
Not sure I understand how to read the cheat sheet (resource 1).
I think it would be kind of neat to allow for arbitrary precision arithmetic at the VM level, that way we don't have to build that back up from scratch in user-land. Thoughts?
Is there a VM that does that?
Not sure on that one. That may be a sign that it's a bad idea :P
Maybe a built-in APFloat datatype like the one LLVM has?
That sounds interesting. If it can do BigInts too I'm down.
@tekknolagi
Not sure I understand how to read the cheat sheet (resource 1).
I think it's a table of all opcodes. 0x00 to 0x05 would all be ADD instructions but with different semantics. Just a guess though.
I have to admit bigints and arbitrary-precision arithmetic have been on my guilty wishlist. Guilty because it feels like scope creep. But if y'all want to outvote me I won't complain too hard :D
For arbitrary precision I love unums.
@KCreate is right on the cheatsheet. The vertical axis is the first nibble of the first byte, the horizontal axis is the second nibble. So you can see why call is e8. To distinguish between the different add cells you need to consult the other resources.
It took me several days too to get used to the cheatsheet. For example, I only noticed the map of the instruction format after a day or two.
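(A quick sketch of how the grid composes an opcode byte: the row gives the high nibble, the column the low nibble. The helper below is illustrative only.)

#include <stdint.h>
#include <stdio.h>

/* Cheatsheet grid: high nibble picks the row, low nibble the column. */
static uint8_t opcode_at(uint8_t row, uint8_t col) {
    return (uint8_t)((row << 4) | col);
}

int main(void) {
    printf("%02x\n", opcode_at(0xE, 0x8));  /* e8 = CALL rel32 */
    printf("%02x\n", opcode_at(0x2, 0xD));  /* 2d = SUB EAX, imm32 */
    return 0;
}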
I'm neither pro nor contra on an AP number type. If you guys think it's worth the effort, I'm on board.
I'd say let's treat numeric overflow and underflow as bugs, but deprioritize fixing these bugs until we have a working compiler and VM.
I'm updating the main comment with a proposed set of instructions.
Looks good. I would like to add that I'm not a fan of register-based VMs. Take a look at how the JVM does things; it's really interesting.
What do you like about the way the JVM does things?
AFAIK, all operations in the JVM happen on the stack. This makes it a lot easier to compile code to.
Easier? I would imagine it's hard to juggle arbitrary numbers of variables on the stack. I'm not personally opposed to a stack machine, though.
The JVM still has an IP, SP and FP. But instead of doing things in registers you do them on a stack, which seems a lot easier in my eyes.
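(To make the stack-vs-register contrast concrete, a rough C sketch; the opcode shapes are invented for illustration, not a proposal for this VM.)

#include <stdint.h>
#include <stdio.h>

/* Stack machine: operands are implicit.  "x = a + b" becomes
   push a; push b; add; pop x (four instructions, all touching the stack). */
static void stack_add(uint32_t *stack, int *sp) {
    uint32_t b = stack[--*sp];
    uint32_t a = stack[--*sp];
    stack[(*sp)++] = a + b;
}

/* Register machine: operands are explicit, three-address style.
   "x = a + b" is a single  add rX, rA, rB. */
static void reg_add(uint32_t *regs, uint8_t dst, uint8_t a, uint8_t b) {
    regs[dst] = regs[a] + regs[b];
}

int main(void) {
    uint32_t stack[8] = {3, 4};   /* a and b already pushed */
    int sp = 2;
    stack_add(stack, &sp);
    printf("stack result: %u\n", stack[sp - 1]);

    uint32_t regs[4] = {3, 4, 0, 0};
    reg_add(regs, 2, 0, 1);
    printf("register result: %u\n", regs[2]);
    return 0;
}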
Alright! Fill in the section in the original post?
Also, thoughts on https://rsms.me/sol-a-sunny-little-virtual-machine ?
I've glanced over the article a couple days ago. Looks really interesting!
Does it make sense to sketch out both VMs in a single issue? It'd be much cleaner to have a single issue per design.
Sure, if they are both proposals! I'll move mine into a separate one then this issue can reference them.
@KCreate, how do you see a stack VM coexisting with your other proposals like using LLVM, etc.? Here's my thoughts, do they fit how you're thinking about it?
The big benefit of a stack VM is that you don't have to perform register allocation. The JVM is a stack VM because they wanted to write once and run anywhere, so they couldn't make assumptions about how many registers were available to them.
Another reason the JVM is a stack machine: programs became smaller because most operands became implicit. This was important in the mid-'90s because the internet was much slower than it is today. But it doesn't matter so much today.
It's easy to create a stack machine out of a register machine: just stop using most registers! When I put it like that, it becomes clear that not using registers takes a lot of performance off the table. JVMs were quite slow until they got JITs, mostly because they didn't use their registers, and they spilled out to memory a lot more. (Having a GC that strode all over the heap, and objects with lots of little bytes for metadata here and there -- these things exacerbated memory pressure.)
That said, it is far better to just have a stack VM than to simulate a register VM with more or fewer registers than the underlying host. Otherwise the hardware is spilling when the compiler thinks it's not, and it becomes a giant mess.
It's worth distinguishing between a VM and an IR. We're going to have lots of passes, so we're going to have lots of IRs. However these IRs can communicate via in-memory data structures. The only place where they go to disk is where the compiler writes bytecode to disk that the VM now has to read and execute. I think this bytecode should match the registers of the underlying machine.
Many compiler passes don't have to worry about registers or stack, and these often are closer in spirit to a stack-based than a register-based machine. I conceived of Mu as an "array-based" machine, with offsets into the stack frame mapping to variable names. Kinda like a "random-access stack". It's an oxymoron, but certainly there's no thought to registers here. This is an easier compiler target than a stack-based VM: ADD is actually harder for a compiler to reason about than ADD x, y, because nearby instructions added or removed can alter its semantics.
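(A small illustration of that point, with invented opcodes: to know which slots an implicit add consumes, a pass has to replay the stack depth of everything before it, and inserting one push silently changes the answer. With add x, y the operands are written in the instruction itself.)

#include <stdio.h>

enum { PUSH, ADD };   /* hypothetical stack-machine opcodes */

/* Stack depth just before instruction `target`: this is what a pass must
   compute to know which slots that instruction's implicit operands live in. */
static int depth_before(const int *ops, int n, int target) {
    int depth = 0;
    for (int i = 0; i < target && i < n; i++)
        depth += (ops[i] == PUSH) ? 1 : -1;   /* add pops 2, pushes 1 */
    return depth;
}

int main(void) {
    int before[] = {PUSH, PUSH, ADD};         /* add reads slots 0 and 1 */
    int after[]  = {PUSH, PUSH, PUSH, ADD};   /* one inserted push: now 1 and 2 */
    printf("%d %d\n", depth_before(before, 3, 2), depth_before(after, 4, 3));
    return 0;
}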
Just make sure the registers of the VM match hardware. The JVM shows that performance given up here is very hard to reclaim.
Finally: if we ever find ourselves compiling from a register VM to a stack-like or array-like VM we're likely doing something very wrong.
I'm not quite sure I understand what you mean by
I think this bytecode should match the registers of the underlying machine.
It's also not clear to me what you're suggesting now. Did I get that right that you're also in favor of a stack VM?
There's value in thinking of high-level compiler IRs as sort of like a "random-access stack" machine. However by the time we get down to actually emitting a bytecode representation of the program to disk for our VM to execute, not using registers feels like a bad idea. So my order of preference is:
a) Just use the names of registers that the host machine (presumably x86) supports.
b) Use a stack VM.
c) Use some other register VM with more or fewer registers than the host machine supports.
Since we're actually writing a VM and not a JIT, is there really a benefit in having registers rather than solely working on the stack? If our VM has registers, they'd be in memory anyway right?
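(A sketch of how both points can be true: while interpreting, the "registers" are indeed just an array in memory, but if each slot is named after a host register, as in option (a), a later native backend can map them 1:1 instead of doing register allocation. The struct below is illustrative only.)

#include <stdint.h>

/* x86 register names reused as indices into the interpreter's register file. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, NREGS };

struct vm {
    uint32_t regs[NREGS];   /* plain memory while interpreting... */
    /* ...but since each slot corresponds to exactly one hardware register,
       "regs[EAX] += regs[EBX]" can later be emitted as "add eax, ebx"
       without a register-allocation pass. */
};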
@akkartik I don't think your proposal for (a) makes sense. Yes, sure, names carry some weight -- but when it comes down to it, we already want to emulate a much higher-level machine than x86. Like malloc and free instructions? In the ISA? At that point I do not see much use in keeping around x86 naming conventions or register structure.
However, I see an advantage in taking a look at other VMs and stealing from those. Those people have clearly learned lessons over time and adapted their machines accordingly.
I'm trying to aim to eventually run natively. Not as a JIT, just regular AOT compilation.
Wouldn't it at that point be easier to have that as another stage following compilation to our VM?
@tekknolagi I am confused. Not using x86 registers implies we can't use the x86 encoding. Are we rethinking that as well?
Yes, we have polyfills for malloc and free that will eventually be replaced. Still worth minimizing the number of such polyfills. We should keep the job one of implementing certain functions, and avoid an additional translation layer from low level VM to native.
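(For a sense of scale, a polyfill like that can be tiny. A hedged C sketch of a bump allocator; the names, the fixed heap size, and the no-op free are illustrative, not a committed design.)

#include <stddef.h>
#include <stdint.h>

/* Illustrative bump-allocator polyfill: one of the "certain functions"
   the VM could provide until a real allocator replaces it. */
static uint8_t heap[1 << 20];
static size_t heap_top;

void *vm_malloc(size_t n) {
    n = (n + 7) & ~(size_t)7;                 /* keep 8-byte alignment */
    if (heap_top + n > sizeof heap) return NULL;
    void *p = &heap[heap_top];
    heap_top += n;
    return p;
}

void vm_free(void *p) {
    (void)p;                                   /* bump allocators never reuse */
}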
What do you mean by the x86 encoding? I'm also not sure what you mean by:
We should keep the job one of implementing certain functions, and avoid an additional translation layer from low level VM to native.
Guys, compiler passes are useful but they have costs. Dividing up work into lots of thin passes is useful. Piling on more work for the compiler is something to avoid. If the compiler is a taxi driver, rewriting malloc later is asking that he make a u-turn to pick the passenger up on the right side. Having the entire compiler assume one VM and then translating to a whole different VM is like adding a third stop to the taxi route.
Others before us had to play these games because of history. We don't have to.
Pick your destination and plot the most direct route to it.
It's totally legitimate to say we don't care about a native target or x86. But let's not try to please everyone.
What do you guys think about the model V8 uses? They use native code for most things but for the more complicated stuff they call into a runtime.
You should totally check out how V8 works, super interesting!
Okay then while I think having native compilation would be cool, I really do not want to mess with binary formats, cross-platform BS, etc etc. GCC can do that reasonably well as it is. So I would be more than happy to just target a VM of our own design/choosing.
That's fine. I just want to clarify that I wasn't planning to mess with ELF format, or binutils, or cross-platform BS. Precisely for the same reasons you outlined. I wanted to focus on one single subset of x86 processor instructions, and to create our own binary format and polyfills as we feel the need. I don't think this is more work or more BS. It's certainly not a stack VM, though.
I'm trying to understand the difference between your comments at the top of this thread and now. Can you say what changed between then and now?
@KCreate I have trouble understanding how a large project like V8 or gcc works :) Can you elaborate on what sort of "complicated stuff" they call into a runtime for? I'm having trouble understanding what V8 does that's different from gcc with a standard library.
@akkartik I don't understand what you mean, then, by wanting to have the end goal be AOT.
Register VM: https://github.com/tekknolagi/stackx/issues/8
Stack VM: https://github.com/tekknolagi/stackx/issues/9