munificent / craftinginterpreters

Repository for the book "Crafting Interpreters"
http://www.craftinginterpreters.com/
Other
8.84k stars 1.04k forks source link

Section 23: Potential semantical bugs lurking in conditional expressions and for loops? #773

Closed katjonathan closed 4 years ago

katjonathan commented 4 years ago

I've noticed that for the if and while constructs, when we parse the conditional expression we are simply consuming expression. This might lead to someone writing something convoluted like this:

if (1+2)
{
        print "Doesn't this seem odd?";
}

In the above expression, the 1+2 will be treated as "truthy" but as far as semantics go, this seems a little confusing. Wouldn't it be better for the user to be forced to write a conditional expression instead of giving the user free-reign to write whatever he/she wants? One way I can think of solving this is to have a specific parsing function meant only for conditionals. That way the user is forced to write a conditional. However that does not appear in Nystrom's source code as of this writing.

A similar problem exists for for loops. Take this unclear example:

for (1+2; 3 < 4;)
{
        print "How can this work out?";
}

In this case, the initializer isn't even an assignment of any kind! The problem is that we compiled an expressionStatement instead of an explicit assignment! Of course, we haven't actually created a parsing function called assignment. Instead we have the slightly hacky canAssign being passed around from function to function. (I know the official Lua 5.0 parser has an assignment parsing function but I haven't been able to decipher that section of the project yet.) The same problem we had with while loops and if statements exists in the conditional part of the for loop. Instead of 3 < 4 we could have written 3+4 and we'd wind up with the same semantic problem.

Perhaps I'm misinterpreting all of these or not reading the source code carefully enough. If so, please forgive me for my brashness. The point I was trying to make is perhaps some of the semantics of the language should be a bit more strict.

katjonathan commented 4 years ago

As a pre-emptive note of apology: I apologize for asking so many questions. I am aware that this is my 3rd question, all written in the span of a week. I view asking questions as a necessary evil. I am the type of person who really wants to grasp material very deeply. I am the kind of person who -- when embarking on learning a project like this -- intensely wants to understand every line of code and every decision made during the process of writing that code. In other words, I want to get into the mindset of the author. The only way I know how to do this is to ask copious questions. This is how I am and how I have always been. Unfortunately this desire of mine tends to annoy others. They perceive I am asking too many questions and perhaps they are right. Unfortunately, this is the only way I know on how to grasp a project like this as deeply as I want to. To all of you checking these "issues", I hope you aren't annoyed yet.

munificent commented 4 years ago

Wouldn't it be better for the user to be forced to write a conditional expression instead of giving the user free-reign to write whatever he/she wants? One way I can think of solving this is to have a specific parsing function meant only for conditionals.

This is a great question and goes deeper than you might realize. There are two fundamental questions:

  1. To what degree should the language statically detect and prevent invalid code?
  2. When it does that, how often should it do that in the parser / grammar?

There are no hard and fast answers to either of those. Certainly, the basic dynamically-typed languages is to not detect many invalid operations statically that a language with a type system can detect. And, conversely, a static type system basically is a way to detect a lot of errors at compile time and make progress on answer 1.

But even if you take static types off the table, there are other mistakes that can be detected statically. For example:

print thisVariableIsNeverDeclared;

Here's a program that tries to evaluate and print a variable that doesn't exist. That's definitely not going work. We can definitely detect that before running any code, should we? Sometimes the answer is "yes", but it's not always clear. Consider:

var thisIsNeverTrue = false;
if (thisIsNeverTrue) {
  print thisVariableIsNeverDeclared;
}

This program is harmless even though it contains a reference to a non-existent variable, because the reference is never reached. Different languages make different choices about how flexible they are with the code they accept.

Assuming you do want to rule out some kind of wrong code, should you do so by restricting it in the grammar? For example, Lox doesn't have operator overloading, so a + expression will always return a number. A number can never be used as an if condition, so we could restrict the grammar for if conditions to prohibit that expression form.

But doing so tends to add quite a lot of complexity to the grammar (and thus parser). These are all valid and should be allowed:

if (a)
if (a and b)
if (a or b)

Now consider if Lox had a ?: like most languages have. We'd want to allow:

if (a ? b : c)

But presumably not allow:

if (a ? b + c : d)

So we need a separate rule for "conditional expression inside an if condition". And we'd want to disallow:

if (a ? (b + c) : d)

OK, so we also need a separate rule for "parenthesized expression inside an if condition".

It can get out of hand pretty quickly. So, in practice, what most languages do is keep the grammar and parser fairly simple and restrictive. They declare that if (a + b) is syntactically valid, but then add later static semantic rules that still declare it to be a static error. Grammars and parsers for full-sized languages tend to get pretty big, so this is a nice way to cut down the complexity and move it elsewhere.

We could do that for this case if we wanted to. But doing so is basically just a very limited form of static type checking (you have two types, number and maybe-not-number), and most dynamically-typed languages don't take any steps in that direction even though it means allowing some invalid programs that then fail at runtime.

Even if we didn't, it wouldn't get us very far. Unless you really did something like a static type checker, it still wouldn't catch errors like this:

var a = 1 + 2;
if (a) { ... }

I am the type of person who really wants to grasp material very deeply. I am the kind of person who -- when embarking on learning a project like this -- intensely wants to understand every line of code and every decision made during the process of writing that code. In other words, I want to get into the mindset of the author. The only way I know how to do this is to ask copious questions.

That's OK! Wanted to understand deeply is great and asking questions is an important way to do that. But the askee may have limited time to answer them all. :)