tekknolagi / stackx

MIT License

Spec out or pick a target VM #6

Open tekknolagi opened 7 years ago

tekknolagi commented 7 years ago

Register VM: https://github.com/tekknolagi/stackx/issues/8

Stack VM: https://github.com/tekknolagi/stackx/issues/9

akkartik commented 7 years ago

Kartik's idea: subset of x86. Ideally just from the 1-byte opcodes at https://net.cs.uni-bonn.de/fileadmin/user_upload/plohmann/x86_opcode_structure_and_instruction_overview.pdf

tekknolagi commented 7 years ago

You mean emulate the subset of x86 in the VM?

akkartik commented 7 years ago

Yes. That way we retain the option to generate native binaries at some later date. The subset of x86 we support can be pretty clean.

tekknolagi commented 7 years ago

What about the memory allocation primitives, et al?

akkartik commented 7 years ago

Yes, some number of things will be "virtual microcode" :) But if we minimize them we can replace them with functions. Particularly if the syntax for calling primitive instructions is identical to that for functions, like in Mu.
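A rough sketch of what that looks like (Python purely for illustration, nothing to do with our eventual syntax): primitives and user-defined functions share one call table, so a call site can't tell the difference.

# Hypothetical sketch: primitives ("virtual microcode") and user-defined
# functions live in the same table, so calling either looks identical.

def call(name, *args):
    return FUNCTIONS[name](*args)

def prim_add(a, b):          # implemented in the host: the "microcode"
    return a + b

def user_double(x):          # defined "in the language", calls a primitive
    return call("add", x, x)

FUNCTIONS = {
    "add": prim_add,
    "double": user_double,
}

print(call("double", 21))    # 42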

akkartik commented 7 years ago

Since we're going to have a lot of layers, the notion of "primitive" isn't really well-posed. Instead of saying "x will be a primitive" I should be saying "x will be defined/implemented closer to layer 0 than layer 3000."

tekknolagi commented 7 years ago

When do you want to start mapping out the layers?

akkartik commented 7 years ago

Oh boy. It seems a hard problem to attack head on, though I'd be interested to see you take a stab at mapping it.

Alternative approach: design the VM, compile to the VM, then progressively lower the compiler. There's still an element of a maze in the third stage, but at least we know what to do until then.

tekknolagi commented 7 years ago

That sounds like a good approach. So time to spec out the VM, then?

akkartik commented 7 years ago

Yup! If I could just find some time.

tekknolagi commented 7 years ago

@KCreate and I can start and you can contribute whenever you get a moment

akkartik commented 7 years ago

Let me share some resources that I've been trying to read.

  1. The x86 cheatsheet I already linked to above. I'd suggest focusing on just the first square (byte) first.

  2. A concise x86 reference, though it may be too cryptic at first.

  3. Intel's full programming manual. I'm actually beginning to think it's worth starting there. Not end-to-end, but read the start of chapter 3 and then pick specific instructions to read about based on the 'map' of the cheatsheet.

As a first task, I started out creating a list of opcodes I'd need to build factorial:

# accepts its arg in EAX.
def factorial [
  c8 00 04 00      # enter: reserve ___ bytes of space for local variables (imm16, little-endian), nesting level 0

  2d 01 00 00 00   # sub eax, imm32: subtract 1 from EAX
  e8 ___           # call rel32: recursive call (displacement left blank)
  c9               # leave
]

This suggests a shortlist of opcodes to read about first: arithmetic and bitwise ops, conditionals, push/pop, enter/leave. (I'm not super concerned that enter/leave are less efficient and that existing compilers therefore rarely use them; if we're climbing up the ladder of abstractions we need every little leg up we can get.)
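For reference, here's the same shortlist as the seed of a decoder table (the mnemonics come straight from the Intel manual; the Python shape is just for illustration):

# Seed of an opcode table for the subset sketched above (one-byte opcodes only).
OPCODES = {
    0x2D: "sub eax, imm32",     # subtract a 32-bit immediate from EAX
    0x50: "push eax",           # 0x50-0x57: push r32, register in the low 3 bits
    0x58: "pop eax",            # 0x58-0x5F: pop r32
    0xC3: "ret",                # near return
    0xC8: "enter imm16, imm8",  # set up a frame, reserving imm16 bytes of locals
    0xC9: "leave",              # tear down the frame
    0xE8: "call rel32",         # call with a 32-bit relative displacement
}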

As you can see, I'm not super concerned with following an existing ABI or ELF format, partly because I hella hate reading docs about arbitrary things like alignment and whatnot. I think that may be the only reason to build a VM first: to separate out learning x86 from learning binutils. But feel free to use the existing 32- or 64-bit ABI. It's also (perhaps foremost) your project.

tekknolagi commented 7 years ago

Not sure I understand how to read the cheat sheet (resource 1).

tekknolagi commented 7 years ago

I think it would be kind of neat to allow for arbitrary precision arithmetic at the VM level, that way we don't have to build that back up from scratch in user-land. Thoughts?
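For context, this is roughly what we'd otherwise be rebuilding in user-land (an illustrative Python sketch of limb-based addition, not a proposal for the VM):

# Arbitrary-precision addition out of fixed 32-bit limbs, little-endian.
BASE = 1 << 32

def bignum_add(a, b):
    """a and b are little-endian lists of 32-bit limbs; returns their sum."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % BASE)
        carry = s // BASE
    if carry:
        out.append(carry)
    return out

print(bignum_add([0xFFFFFFFF], [1]))  # [0, 1], i.e. 2**32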

KCreate commented 7 years ago

Is there a VM that does that?

tekknolagi commented 7 years ago

Not sure on that one. That may be a sign that it's a bad idea :P

KCreate commented 7 years ago

Maybe a built-in APFloat datatype like the one LLVM has?

tekknolagi commented 7 years ago

That sounds interesting. If it can do BigInts too I'm down.

KCreate commented 7 years ago

@tekknolagi

> Not sure I understand how to read the cheat sheet (resource 1).

I think it's a table of all opcodes. 0x00 to 0x05 would all be an ADD instruction, just with different operand encodings. Just a guess though.

akkartik commented 7 years ago

I have to admit bigints and arbitrary-precision arithmetic have been on my guilty wishlist. Guilty because it feels like scope creep. But if y'all want to outvote me I won't complain too hard :D

For arbitrary precision I love unums.

akkartik commented 7 years ago

@KCreate is right about the cheatsheet. The vertical axis is the first nibble of the first byte, and the horizontal axis is the second nibble. So you can see why call is e8. To distinguish between the different add cells you need to consult the other resources.

It took me several days too to get used to the cheatsheet. For example, I only noticed the map of the instruction format after a day or two.
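Put differently, an opcode's position in the one-byte table is just its two nibbles. A tiny illustration (mine, not from the cheatsheet):

# Split an opcode byte into (row, column) = (high nibble, low nibble).
def table_position(opcode):
    return opcode >> 4, opcode & 0x0F

print(table_position(0xE8))  # (14, 8): row 0xE, column 8, where call lives
print(table_position(0x05))  # (0, 5): one of the ADD variants on row 0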

KCreate commented 7 years ago

I'm neither pro nor contra on an AP number type. If you guys think it's worth the effort, I'm on board.

akkartik commented 7 years ago

I'd say let's treat numeric overflow and underflow as bugs, but deprioritize fixing these bugs until we have a working compiler and VM.

tekknolagi commented 7 years ago

I'm updating the main comment with a proposed set of instructions.

KCreate commented 7 years ago

Looks good. I would like to add that I'm not a fan of register-based VMs. Take a look at how the JVM does things; it's really interesting.

tekknolagi commented 7 years ago

What do you like about the way the JVM does things?

KCreate commented 7 years ago

AFAIK, all operations in the JVM happen on the stack, which makes it a lot easier to compile to.

tekknolagi commented 7 years ago

Easier? I would imagine it's hard to juggle arbitrary numbers of variables on the stack. I'm not personally opposed to a stack machine, though.

KCreate commented 7 years ago

The JVM still has an IP, SP and FP. But instead of doing things in registers you do them on a stack, which seems a lot easier in my eyes.
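A toy sketch of that style (my own illustration, nothing JVM-specific): an IP walks the code, and operands live on an explicit stack instead of in named registers.

# Minimal stack-machine loop: no registers, just an instruction pointer
# and an operand stack.
def run(code):
    stack, ip = [], 0
    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == "push":
            stack.append(code[ip])
            ip += 1
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "halt":
            return stack.pop()

# (2 + 3) * 4 -- every operand is implicit on the stack
print(run(["push", 2, "push", 3, "add", "push", 4, "mul", "halt"]))  # 20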

tekknolagi commented 7 years ago

Alright! Fill in the section in the original post?

tekknolagi commented 7 years ago

Also, thoughts on https://rsms.me/sol-a-sunny-little-virtual-machine ?

KCreate commented 7 years ago

I glanced over the article a couple of days ago. Looks really interesting!

KCreate commented 7 years ago

Does it make sense to sketch out both VMs in a single issue? It'd be much cleaner if we had a single issue per design.

tekknolagi commented 7 years ago

Sure, if they are both proposals! I'll move mine into a separate one, and then this issue can reference them.

akkartik commented 7 years ago

@KCreate, how do you see a stack VM coexisting with your other proposals like using LLVM, etc.? Here are my thoughts; do they fit how you're thinking about it?

The big benefit of a stack VM is that you don't have to perform register allocation. The JVM is a stack VM because they wanted to write once and run anywhere, so they couldn't make assumptions about how many registers were available to them.

Another reason the JVM is a stack machine: programs became smaller because most operands became implicit. This mattered in the mid-'90s, when the internet was much slower than it is today, but it doesn't matter so much anymore.

It's easy to create a stack machine out of a register machine: just stop using most registers! When I put it like that, it becomes clear that not using registers takes a lot of performance off the table. JVMs were quite slow until they got JITs, mostly because they didn't use their registers, and they spilled out to memory a lot more. (Having a GC that strode all over the heap, and objects with lots of little bytes for metadata here and there -- these things exacerbated memory pressure.)

That said, it is far better to just have a stack VM than to simulate a register VM with more or fewer registers than the underlying host. Otherwise the hardware is spilling when the compiler thinks it's not, and it becomes a giant mess.

It's worth distinguishing between a VM and an IR. We're going to have lots of passes, so we're going to have lots of IRs. However these IRs can communicate via in-memory data structures. The only place where they go to disk is where the compiler writes bytecode to disk that the VM now has to read and execute. I think this bytecode should match the registers of the underlying machine.

Many compiler passes don't have to worry about registers or stack, and these are often closer in spirit to a stack-based than a register-based machine. I conceived of Mu as an "array-based" machine, with offsets into the stack frame mapping to variable names. Kinda like a "random-access stack". It's an oxymoron, but certainly there's no thought to registers here. This is an easier compiler target than a stack-based VM: a bare ADD is actually harder for a compiler to reason about than ADD x, y, because nearby instructions added or removed can alter its semantics.
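To make that concrete, here's the same statement, x = a + b, in both styles (just an illustration):

# Stack form: ADD has no operands, so its meaning depends entirely on what
# neighbouring instructions happen to have left on the stack.
stack_form = [("push", "a"), ("push", "b"), ("add",), ("store", "x")]

# "Random-access stack" / three-address form: ADD names its operands, so the
# instruction can be understood and moved around in isolation.
register_form = [("add", "x", "a", "b")]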

Just make sure the registers of the VM match hardware. The JVM shows that performance given up here is very hard to reclaim.

Finally: if we ever find ourselves compiling from a register VM to a stack-like or array-like VM we're likely doing something very wrong.

KCreate commented 7 years ago

I'm not quite sure I understand what you mean by:

> I think this bytecode should match the registers of the underlying machine.

It's also not clear to me what you're suggesting now. Did I get that right that you're also in favor of a stack VM?

akkartik commented 7 years ago

There's value in thinking of high-level compiler IRs as sort of like a "random-access stack" machine. However by the time we get down to actually emitting a bytecode representation of the program to disk for our VM to execute, not using registers feels like a bad idea. So my order of preference is:

a) Just use the names of registers that the host machine (presumably x86) supports.
b) Use a stack VM.
c) Use some other register VM with more or fewer registers than the host machine supports.

KCreate commented 7 years ago

Since we're actually writing a VM and not a JIT, is there really a benefit in having registers rather than solely working on the stack? If our VM has registers, they'd be in memory anyway right?

tekknolagi commented 7 years ago

@akkartik I don't think your proposal for (a) makes sense. Yes, sure, names carry some weight -- but when it comes down to it, we already want to emulate a much higher-level machine than x86. Like malloc and free instructions? In the ISA? At that point I do not see much use in keeping around x86 naming conventions or register structure.

However, I see an advantage in taking a look at other VMs and stealing from them. Their designers have clearly learned lessons over time and adapted their machines accordingly.

akkartik commented 7 years ago

I'm aiming to eventually run natively. Not as a JIT, just regular AOT compilation.

tekknolagi commented 7 years ago

Wouldn't it at that point be easier to have that as another stage following compilation to our VM?

akkartik commented 7 years ago

@tekknolagi I am confused. Not using x86 registers implies we can't use the x86 encoding. Are we rethinking that as well?

Yes, we have polyfills for malloc and free that will eventually be replaced. Still worth minimizing the number of such polyfills. We should keep the job one of implementing certain functions, and avoid an additional translation layer from low level VM to native.
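For what it's worth, one possible shape for the malloc polyfill (purely illustrative, nothing decided): a bump allocator over a flat heap, with free as a no-op until a real allocator is layered in.

# Illustrative malloc/free polyfill: bump allocation over a flat byte array.
HEAP_SIZE = 1 << 20
heap = bytearray(HEAP_SIZE)
next_free = 0

def vm_malloc(nbytes):
    global next_free
    addr = next_free
    if addr + nbytes > HEAP_SIZE:
        raise MemoryError("VM heap exhausted")
    next_free += nbytes
    return addr

def vm_free(addr):
    pass  # no-op until a real allocator replaces this polyfill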

tekknolagi commented 7 years ago

What do you mean by the x86 encoding? I'm also not sure what you mean by:

> We should keep the job one of implementing certain functions, and avoid an additional translation layer from low level VM to native.

akkartik commented 7 years ago

Guys, compiler passes are useful, but they have costs. Dividing the work up into lots of thin passes is fine; piling more total work onto the compiler is something to avoid. If the compiler is a taxi driver, rewriting malloc later is asking that he make a u-turn to pick the passenger up on the right side. Having the entire compiler assume one VM and then translating to a whole different VM is like adding a third stop to the taxi route.

Others before us had to play these games because of history. We don't have to.

akkartik commented 7 years ago

Pick your destination and plot the most direct route to it.

It's totally legitimate to say we don't care about a native target or x86. But let's not try to please everyone.

KCreate commented 7 years ago

What do you guys think about the model V8 uses? They use native code for most things but for the more complicated stuff they call into a runtime.

You should totally check out how V8 works, super interesting!

tekknolagi commented 7 years ago

Okay then. While I think having native compilation would be cool, I really do not want to mess with binary formats, cross-platform BS, etc. GCC can do that reasonably well as it is. So I would be more than happy to just target a VM of our own design/choosing.

akkartik commented 7 years ago

That's fine. I just want to clarify that I wasn't planning to mess with the ELF format, or binutils, or cross-platform BS, precisely for the same reasons you outlined. I wanted to focus on a single subset of x86 processor instructions, and to create our own binary format and polyfills as we feel the need. I don't think this is more work or more BS. It's certainly not a stack VM, though.

I'm trying to understand the difference between your comments at the top of this thread and now. Can you say what changed between then and now?

akkartik commented 7 years ago

@KCreate I have trouble understanding how a large project like V8 or gcc works :) Can you elaborate on what sort of "complicated stuff" they call into a runtime for? I'm having trouble understanding what V8 does that's different from gcc with a standard library.

tekknolagi commented 7 years ago

@akkartik I don't understand what you mean, then, by wanting to have the end goal be AOT.