munificent / craftinginterpreters

Repository for the book "Crafting Interpreters"
http://www.craftinginterpreters.com/

Interpreters vs Compilers #198

Closed: ksceriath closed this issue 6 years ago

ksceriath commented 6 years ago

Maybe this is too basic, but I'm kind of unclear about the distinction between an interpreter and a compiler.

According to the explanation provided in one of the initial chapters of jlox, interpreters and compilers are two different types of things.

A compiler takes in source code and translates it into instructions in some other format, be it machine code, bytecode, or code in some other high-level language. An interpreter is something that executes the instructions and produces output.
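Here is a minimal Python sketch of these two definitions. It assumes a made-up AST where an integer is a literal and a tuple such as ("+", left, right) is a binary operation. The AST shape and the function names are hypothetical and do not come from the book. The interpreter walks the tree and produces a value. The compiler only translates the tree into C expression text and stops.

```python
def interpret(node):
    """Interpreter: executes the program and returns its result."""
    if isinstance(node, int):          # literal
        return node
    op, left, right = node             # binary operation
    a, b = interpret(left), interpret(right)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator {op!r}")


def compile_to_c_expr(node):
    """Compiler: translates the program into a C expression and stops.
    The result is only text; nothing is executed here."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    if op not in ("+", "*"):
        raise ValueError(f"unknown operator {op!r}")
    return f"({compile_to_c_expr(left)} {op} {compile_to_c_expr(right)})"


program = ("+", 1, ("*", 2, 3))       # 1 + 2 * 3
print(interpret(program))             # 7
print(compile_to_c_expr(program))     # (1 + (2 * 3))
```

Before anything actually runs, some other program (a C compiler, or a C interpreter) still has to take the compiler's output the rest of the way.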

Does this mean some sort of interpreter is always needed in the end (whether we run the raw source code or code compiled from it)? In that sense, is the JVM an interpreter?

In jlox, we take the Lox source code, turn it into a syntax tree, and then execute it. We represent this syntax tree using Java objects. Isn't this like compiling the Lox code into Java code before executing it? If so, aren't the Scanner, the Parser, and the Resolver basically a compiler? Then shouldn't jlox be in the overlapping region of the compiler-interpreter Venn diagram (in the "A Map of the Territory" chapter), where we have languages that first compile under the hood and then execute?
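For reference, here is a rough Python sketch of the pipeline described above. It handles a tiny language with only integers, `+`, `*`, and parentheses. The class and function names are hypothetical and only loosely modeled on jlox's Scanner, Parser, and Expr classes, and the Resolver is left out. The parser's output is a tree of ordinary objects in the host language's memory, not host-language source code.

```python
import re
from dataclasses import dataclass


# Hypothetical node classes, loosely modeled on jlox's Expr.Literal and Expr.Binary.
@dataclass
class Literal:
    value: int


@dataclass
class Binary:
    left: object
    operator: str
    right: object


def scan(source):
    """Scanner: turn characters into tokens (integers and single-character symbols)."""
    tokens = re.findall(r"\d+|\S", source)
    for tok in tokens:
        if not (tok.isdecimal() or tok in "+*()"):
            raise SyntaxError(f"unexpected character {tok!r}")
    return tokens


class Parser:
    """Recursive descent: term -> factor ("+" factor)*, factor -> primary ("*" primary)*."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        expr = self.term()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return expr

    def term(self):
        expr = self.factor()
        while self.peek() == "+":
            self.advance()
            expr = Binary(expr, "+", self.factor())
        return expr

    def factor(self):
        expr = self.primary()
        while self.peek() == "*":
            self.advance()
            expr = Binary(expr, "*", self.primary())
        return expr

    def primary(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdecimal():
            return Literal(int(tok))
        if tok == "(":
            expr = self.term()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return expr
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(expr):
    """Tree-walking interpreter: execute the tree directly, node by node."""
    if isinstance(expr, Literal):
        return expr.value
    left, right = evaluate(expr.left), evaluate(expr.right)
    return left + right if expr.operator == "+" else left * right


tree = Parser(scan("1 + 2 * (3 + 4)")).parse()
print(tree)            # nested Binary/Literal objects in memory, not Python source
print(evaluate(tree))  # 15
```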

To me it seems that an interpreter goes one step further than the compiler. Whereas a compiler compiles and stops, an interpreter executes that compiled code (i.e. it internally first compiles the raw source into something that it can understand).

fsacer commented 6 years ago

My opinion: an interpreter has to represent code in memory somehow in order to execute it, and whether that representation is a low-level or a higher-level one shouldn't matter. JVM bytecode and .NET IL would be a kind of interpreter/compiler hybrid, because the code is first turned into an intermediate representation before it gets executed by a JIT compiler. I would say jlox really is an interpreter in terms of what it does, and in that way clox would be a compiler. Again, I might be wrong.

alexito4 commented 6 years ago

I feel like these are the kind of words that are open to reinterpretation, and they end up having a "commonly accepted interpretation", especially when the words are used to "sell" the advantages of some commercial products.

Without getting too deep into semantics (and I'm not an expert in the literature), I would characterise them as follows:

An interpreter is a program that reads the source code and runs it immediately by itself. It usually performs a preprocessing pass (for jlox, this is converting the source code into an AST), but this is mostly to facilitate the interpretation or to perform some kind of analysis before execution. This conversion is not the end goal. Along those lines, you can see clox as an interpreter too: its goal is the same as jlox's, with the difference that it does a different preprocessing step, creating bytecode, this time to optimise the later interpretation. But the goal is still just to run the source code. Consider the intermediate representations of an interpreter as just implementation details.
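The clox-style approach from the paragraph above can be sketched in Python as follows. The sketch assumes a made-up stack-machine instruction set (CONST, ADD, MUL), not clox's real opcodes. The program is first lowered to flat bytecode and then executed. From the outside, `interpret` simply runs the program, so the bytecode stays an implementation detail.

```python
# Invented opcodes for a tiny stack machine (not clox's actual instruction set).
CONST, ADD, MUL = "CONST", "ADD", "MUL"


def compile_to_bytecode(node, code=None):
    """Lower a tuple AST such as ("+", 1, ("*", 2, 3)) into flat bytecode."""
    if code is None:
        code = []
    if isinstance(node, int):
        code.append((CONST, node))
        return code
    op, left, right = node
    compile_to_bytecode(left, code)
    compile_to_bytecode(right, code)
    code.append(({"+": ADD, "*": MUL}[op],))
    return code


def run_vm(code):
    """Execute the bytecode on a value stack."""
    stack = []
    for instruction in code:
        opcode = instruction[0]
        if opcode == CONST:
            stack.append(instruction[1])
        elif opcode in (ADD, MUL):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if opcode == ADD else a * b)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()


def interpret(node):
    """All the user asks for is 'run this program'; the bytecode never escapes."""
    return run_vm(compile_to_bytecode(node))


program = ("+", 1, ("*", 2, 3))
print(compile_to_bytecode(program))
# [('CONST', 1), ('CONST', 2), ('CONST', 3), ('MUL',), ('ADD',)]
print(interpret(program))  # 7
```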

On the other hand, a compiler is a program that transforms the source code into a lower-level representation while keeping (or at least trying to keep 🤣) the semantics of the original source code. And that's the main difference: the goal of a compiler is not to run the code but to translate it into a lower-level representation. Usually this lower-level representation is machine code, meaning that you don't need an interpreter to run it (well, unless you consider your CPU an interpreter, which I guess it sort of is).

Now, as I said, the reality is greyer than that, but generally that's the idea. Think about the goal of the program: JavaScript, Ruby, and Python are all interpreted, since you don't pass them through a compiler before distributing them. C, C++, Rust, and Swift are all clearly compiled, since their compilers just transform the source code into machine code.

And again, intermediate representations are just implementation details. For example, Swift uses SIL (Swift Intermediate Language) before lowering to LLVM IR, and the others often go through LLVM IR as well. But things get more complicated when you add JIT (just-in-time) compilation into the mix. And nothing is stopping you from mixing approaches: for example, Jonathan Blow's language compiler can first transform the source code into bytecode, which the compiler itself can run, allowing you to run code at compile time; that bytecode is then lowered to machine code to produce the distributed binary.

Sorry for the long text, but you can see how in the end reality is not that clear, and I wouldn't recommend losing your mind over this. Try to look at what the output of the program is: if it's the side effects of running the code, it's an interpreter; if it's a binary executable, it's a compiler ¯\_(ツ)_/¯

Hopefully others with more knowledge can chime in and clarify it more, but hopefully this gives you a better idea of the real world ^^

ksceriath commented 6 years ago

Thank you for your inputs.

My intention in asking this was to understand the typical parlance. When you say you are going to write a compiler or an interpreter, what are the things you plan to put into it?

As far as I have understood (from the discussion), it's mostly a grey area, and the categorization shouldn't matter as long as the tool accomplishes the task. However, there are resources that imply there is some definite difference between an interpreter and a compiler. For instance, [http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf] discusses an approach to compiler construction specifically. Moreover, the author sort of implies that writing an interpreter is relatively easier than writing a compiler.

alexito4 commented 6 years ago

Moreover, the author sort of implies that writing an interpreter is relatively easier than writing a compiler.

That makes sense. We all just wrote an interpreter following this book 💪 and it wasn't that hard after all. The complication with a compiler (assuming, of course, that we're talking about compiling to machine code) is generating the machine instructions (what's usually called the "backend") FOR EACH specific CPU/architecture you want to support. With an interpreter, on the other hand, you automatically get support for all the platforms that the host language (Java, in the first part of the book) supports.

Nowadays things like LLVM help with that part, so it may be a little more accessible, but I would say it's still harder.
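Here is a rough Python sketch of why the per-architecture backend is the costly part. It assumes two invented target instruction sets, `toy-stack` and `toy-register`; neither is a real CPU. The same expression tree needs separate code generation for each target, while an interpreter like the one in this book needs no per-target code at all. Real backends also deal with register allocation, calling conventions, and object-file formats. This sketch ignores all of that by assuming unlimited registers.

```python
import itertools

MNEMONICS = {"+": "add", "*": "mul"}


def emit_stack(node, out):
    """Backend for the invented 'toy-stack' target."""
    if isinstance(node, int):
        out.append(f"push {node}")
        return out
    op, left, right = node
    emit_stack(left, out)
    emit_stack(right, out)
    out.append(MNEMONICS[op])
    return out


def emit_register(node, out, registers):
    """Backend for the invented 'toy-register' target (unlimited registers).
    Returns the name of the register holding the node's value."""
    if isinstance(node, int):
        reg = f"r{next(registers)}"
        out.append(f"li {reg}, {node}")
        return reg
    op, left, right = node
    a = emit_register(left, out, registers)
    b = emit_register(right, out, registers)
    dst = f"r{next(registers)}"
    out.append(f"{MNEMONICS[op]} {dst}, {a}, {b}")
    return dst


def compile_for(target, node):
    """One front end, one backend per target."""
    if target == "toy-stack":
        return emit_stack(node, [])
    if target == "toy-register":
        out = []
        emit_register(node, out, itertools.count())
        return out
    raise ValueError(f"no backend for target {target!r}")


program = ("+", 1, ("*", 2, 3))
print(compile_for("toy-stack", program))
# ['push 1', 'push 2', 'push 3', 'mul', 'add']
print(compile_for("toy-register", program))
# ['li r0, 1', 'li r1, 2', 'li r2, 3', 'mul r3, r1, r2', 'add r4, r0, r3']
```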

iamsubhranil commented 6 years ago

You can say that when you're writing an interpreter, you're basically writing a compiler for your own virtual machine. An interpreter does everything a compiler does, but for that virtual machine, and additionally runs the result immediately on that machine. Moreover, with an interpreter you're shipping both the "compiler" and the "machine" it compiles for at the same time, since the machine depends very tightly on how the "compiler" digests the source code.

I think one of the reasons the JVM has "compilers" targeting it but not "interpreters" is that it separates the "machine" from the "compiler" part. Together, they are still an interpreter (with some real JIT thing). But nobody reinvents the virtual machine anymore: people come up with their own syntax and digest it down to the binary representation that the JVM understands. That's exactly what compilers do, and what Lox's scanner, parser, and resolver do before handing the program to the virtual machine to execute.

Also, writing an interpreter is easier because you get to decide the architecture of your machine. You decide exactly how it will run, where the instructions and the data are going to be stored, and what is going to represent the state of the system. With a standard, well-known hardware architecture, things are not so easy, because it comes with long, detailed documentation about how things must behave on its terms, thorough enough that even an outright Martian could follow it. That is the result of years of research and decades of development. Nobody's going to scold you if your interpreter is a bit slow, but everybody is going to sue Intel or AMD if things turn out to be 5% slower than they should be.

TL;DR : When you're providing the machine with the compiler for it in one program, it's an interpreter.
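The same idea can be sketched with the "compiler" and the "machine" split into separate pieces, loosely the way javac and the JVM are separated. This is a hypothetical Python sketch. JSON stands in for a real binary format such as a Java .class file, and the opcode names are invented. Glued together in one call, the two halves behave like an interpreter. Shipped separately, the first half is a compiler.

```python
import json

OPCODES = {"+": "ADD", "*": "MUL"}


def compile_to_bytes(node):
    """The 'compiler': lowers a tuple AST to bytecode and serializes it.
    Nothing is executed here; the bytes could be written to disk and shipped."""
    code = []

    def emit(n):
        if isinstance(n, int):
            code.append(["CONST", n])
        else:
            op, left, right = n
            emit(left)
            emit(right)
            code.append([OPCODES[op]])

    emit(node)
    return json.dumps({"version": 1, "code": code}).encode("utf-8")


def vm_run(blob):
    """The 'machine': a standalone VM that only knows the serialized format,
    not the source language that produced it."""
    program = json.loads(blob.decode("utf-8"))
    if program.get("version") != 1:
        raise ValueError("unsupported bytecode version")
    stack = []
    for instr in program["code"]:
        if instr[0] == "CONST":
            stack.append(instr[1])
        elif instr[0] in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
        else:
            raise ValueError(f"unknown opcode {instr[0]!r}")
    return stack.pop()


def interpret(node):
    """Compiler and machine shipped together in one program: an interpreter."""
    return vm_run(compile_to_bytes(node))


blob = compile_to_bytes(("+", 1, ("*", 2, 3)))
print(vm_run(blob))                        # 7
print(interpret(("*", ("+", 1, 2), 4)))    # 12
```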

munificent commented 6 years ago

It's not just you. The terms are confusing. Part of this is because the way we implement programming languages has changed over time but we still use the same words, sometimes in new ways or in ways the people who coined the words didn't consider.

This section is basically my best attempt at defining "compiler" and "interpreter" in ways that I think make the most logical sense to most of the programming language people I know. It's not perfect, but I hope most people who work in languages wouldn't disagree with it.

To your more specific points:

Does this mean a sort of interpreter is always needed in the end (whether we run the raw source code or a code compiled from some source)?

You can think of an interpreter as a program that takes another program in some sort of representation (source, bytecode, whatever) and runs it. With that definition, no, you don't always need one. If your program is compiled to machine code, then the hardware can execute it directly without needing any other "interpreter-like" layer on top of it.

In that way, is JVM an interpreter?

Yes, it is. It's an interpreter for Java bytecode. The way most JVMs execute Java bytecode is by internally compiling it to machine code, but that's an implementation detail. The JVM is a program you can use to run other programs written in bytecode.
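Here is a sketch of why the execution strategy is an implementation detail, assuming a made-up bytecode format. The same bytecode runs either way. One path decodes each instruction in a loop. The other first translates the whole program into a chain of Python closures and then runs that. The closure translation only stands in for a JIT; a real JVM emits machine code. Either way, the caller just sees a program being run.

```python
def run_loop(code):
    """Strategy 1: decode and dispatch every instruction as it executes."""
    stack = []
    for op, *args in code:
        if op == "CONST":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def _const(value):
    def step(stack):
        stack.append(value)
    return step


def _add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def _mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


def translate(code):
    """Strategy 2: translate all the bytecode up front into Python closures
    (a stand-in for JIT-compiling to machine code), then run those."""
    steps = []
    for op, *args in code:
        if op == "CONST":
            steps.append(_const(args[0]))
        elif op in ("ADD", "MUL"):
            steps.append(_add if op == "ADD" else _mul)
        else:
            raise ValueError(f"unknown opcode {op!r}")

    def run():
        stack = []
        for step in steps:
            step(stack)
        return stack.pop()

    return run


bytecode = [("CONST", 1), ("CONST", 2), ("CONST", 3), ("MUL",), ("ADD",)]  # 1 + 2 * 3
print(run_loop(bytecode))      # 7
print(translate(bytecode)())   # 7
```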

In jlox, we take the Lox source code, and turn it into a syntax tree, and then execute it. We are representing this syntax tree using Java objects. Isn't this like compiling the Lox code into Java code, before executing it?

Not really. We aren't parsing Lox source code and turning it into Java source code; we're turning it into objects floating around in memory. jlox doesn't actually know anything about how the JVM represents objects in memory or what they look like. When we talk about a "compiler", we usually mean a tool that converts the program to some very explicitly known representation, often on disk.

Also, the in-memory representation that jlox parses to isn't much lower-level than the original Lox source code, so it's not really "lowering" the representation the way most compilers do.

If you really wanted to, you could kinda sorta say that jlox compiles to an AST. Because the boundary of what is a "compiler" is pretty fuzzy. But most people who work on languages would probably find that a strange statement. Sort of like saying a hot dog is a "sandwich".
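Python's standard-library `ast` module gives a quick way to see how close a syntax tree stays to the source. This uses Python's own parser as an analogy for jlox's Java objects. It needs Python 3.9 or later for `ast.unparse` and the `indent` argument. The tree mirrors the source's nesting so closely that it can be turned straight back into equivalent source text.

```python
import ast

source = "total = price * (1 + tax)"
tree = ast.parse(source)

# The tree keeps the source's structure: an assignment whose value is a
# multiplication whose right operand is the parenthesized addition.
assignment = tree.body[0]
print(ast.dump(assignment, indent=2))

product = assignment.value
print(type(product.op).__name__)        # Mult
print(type(product.right.op).__name__)  # Add

# Nothing has been "lowered": the tree converts straight back to source.
print(ast.unparse(tree))                # total = price * (1 + tax)
```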

To me it seems that an interpreter goes one step further than the compiler. Whereas a compiler compiles and stops, an interpreter executes that compiled code (i.e. it internally first compiles the raw source into something that it can understand).

Yup. If someone says "X is a compiler", they usually mean a tool that doesn't run the code after compiling it. However, if they say "X has a compiler", they may mean that the tool has a compiler internally but also executes the result.

For example, people don't usually say CPython is a compiler, but they do say it has one.
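CPython's standard library shows the "has a compiler" case directly. The built-in `compile()` turns source into a code object, `dis` shows the resulting bytecode, and the same process then executes it. By contrast, `py_compile` compiles a file to a `.pyc` and stops, which is the "is a compiler" shape. The exact opcode names printed by `dis` vary between Python versions, and the file names below are arbitrary.

```python
import dis
import os
import py_compile
import tempfile

# "Has a compiler": compile to bytecode, then run it in the same process.
code = compile("x * 2 + 1", "<example>", "eval")
print(type(code).__name__)     # code
dis.dis(code)                  # bytecode listing (opcode names vary by version)
print(eval(code, {"x": 20}))   # 41

# "Is a compiler": write the bytecode to a file and stop; nothing is executed.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write('print("hello")\n')
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"), doraise=True)
    pyc_written = os.path.exists(pyc)
print(pyc_written)             # True, and "hello" was never printed
```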

My intention in asking this was to understand what is the typical parlance when it comes to this. When you say you are going to be writing a compiler or an interpreter, what are the things you are planning to put into it?

Great question. Because, ultimately, this is a social question: how do we use these words to communicate effectively with other people?

If you tell someone "I'm writing a compiler" without any other qualifiers, they'll probably assume you mean a tool that takes a program and compiles it all the way to machine code. That's kind of the "classic" definition of the term.

But if you say, "I'm writing a compiler that compiles OCaml to JVM bytecode", they will understand that too. Or, "I'm writing a Lua interpreter that compiles to bytecode internally."

If you say, "I'm writing a Lua interpreter" they'll assume you mean a program that can take Lua source code and execute it. They won't know how it executes it, but that's an implementation detail.

I hope that helps! If not, let me know and I'll reopen the bug. This discussion is interesting. :)