nitnelave / hopper

Programming language
Other
2 stars 0 forks source link

Grammar #19

Closed nitnelave closed 7 years ago

nitnelave commented 7 years ago

Add grammar, checked by bison, so it is guaranteed not to be ambiguous. Implement lexer to go with it.


This change is Reviewable

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.8%) to 93.927% when pulling b985d44ed5338e97d510975dc5e1f6e9a2cbe9f9 on grammar into ceed9d89d3bbb5e8b5f56df5192414df4722852f on parser.

Berdes commented 7 years ago

A few comments. I didn't reviewed everything thoroughly given the size of the change and that some parts are particularly boring.


Reviewed 21 of 21 files at r1. Review status: all files reviewed at latest revision, 15 unresolved discussions.


doc/grammar.y, line 1 at r1 (raw file):

%start dummy

BTW Are we supposed to review this file? It seems really long for this :/


examples/99_bottles.gh, line 7 at r1 (raw file):

/// $n : Number of bottles.
/// $return : Pretty string.
fun toBottles(val n: Int) : String {

Isn't this function pure?


examples/99_bottles.gh, line 21 at r1 (raw file):


/// Returns the full verse for the number of bottles of beer.
fun toVerse(val n: Int) =

Isn't this function also pure?


examples/Lambdas.gh, line 15 at r1 (raw file):

                    // This is equivalent to writing
                    // map(myLambda)
                    .map({e -> apply(myLambda, e)})

I guess it's also equivalent as {e -> myLambda(e)}, right?


examples/Macros.gh, line 53 at r1 (raw file):

///   this.otherField = otherField;
/// }
data_class(class ExampleDataClass {

Shouldn't this be #data_class?


examples/Option.gh, line 11 at r1 (raw file):

// greater than 3, it returns the value of the input (wrapped in an Option), and
// otherwise, it returns None (the Option value for nothing).
fun getInt(val in : Int32) : Int32? = if (in > 3) in else None;

This function is pure.


src/error/error.h, line 70 at r1 (raw file):

                "Error type must be a subclass of Error");
  template <typename E>
  using enable_if_error = internals::enable_if_base_of<Err, E>;

A little bit unrelated to the particular PR, but why are we using template metaprogramming to enforce inheritance when inheritance is not a very recommended pattern in C++?


src/error/error.h, line 165 at r1 (raw file):

  template <typename E>
  MaybeError(  // NOLINT: explicit
      E&& value,

The NOLINT should be on this line.


src/error/error.h, line 172 at r1 (raw file):

  template <typename E>
  MaybeError(  // NOLINT: explicit
      MaybeError<E>&& other,

The NOLINT should be on this line.


src/lexer/file_reader.cc, line 6 at r1 (raw file):


namespace lexer {
auto FileReader::read_one_char() -> ErrorOr<State, LexError> {

Why using trailing return type? It's not necessary nor shorter.


src/lexer/lexer.cc, line 52 at r1 (raw file):

Lexer::Lexer(const std::string& source, SourceTag tag,
             const std::string& filename)
    : reader_{get_stream_from_source(source, tag), filename} {}

Usually, curly braces are used to construct structs (or simple types) with all the fields as arguments or to construct a container which is supposed to directly contain the arguments. For normal constructors, parenthesis are better.


src/lexer/lexer.cc, line 88 at r1 (raw file):


  // Construct a token with the last character.
  auto make_single_token = [&, this](TokenType tt) {

Why do you need to capture "this"? Please also capture the variables explicitly. Same for the following lambdas.


src/lexer/lexer.cc, line 102 at r1 (raw file):

  // corresponding token, if matching. Otherwise, return a token of type
  // default_type.
  auto with_second_char = [&, this](

Given the size of this beast, please consider extracting this as a struct with an operator().


src/lexer/lexer.cc, line 117 at r1 (raw file):

  // LEXER STARTS.

  switch (current_char()) {

This whole switch is pretty ugly and looks very error-prone. Do you think there could be a way of simplifying it a little bit using some generalization?

For example, you could have a structure with a list of prefix associated with the action to take (return token and continue, error, call function). Then you would have some piece of code that construct the "state machine" associated and runs it. This code could also verify that you define something for all the cases.


src/lexer/lexer.h, line 48 at r1 (raw file):

  // against keywords. Returns either a LOWER_CASE_IDENT or the corresponding
  // keyword token.
  ErrorOr<Token, LexError> read_lowercase_identifier();

Please add #include "lexer/lex_error.h"


Comments from Reviewable

nitnelave commented 7 years ago

Addressed the comments in the new PR, since this one is already merged.


Review status: all files reviewed at latest revision, 15 unresolved discussions.


doc/grammar.y, line 1 at r1 (raw file):

Previously, Berdes wrote…
BTW Are we supposed to review this file? It seems really long for this :/

It's documentation, it's the grammar, you can or not, it's up to you.


examples/99_bottles.gh, line 7 at r1 (raw file):

Previously, Berdes wrote…
Isn't this function pure?

It's not required to mark pure function as such, unless they're needed at compile-time


examples/99_bottles.gh, line 21 at r1 (raw file):

Previously, Berdes wrote…
Isn't this function also pure?

same


examples/Lambdas.gh, line 15 at r1 (raw file):

Previously, Berdes wrote…
I guess it's also equivalent as {e -> myLambda(e)}, right?

Added that.


examples/Macros.gh, line 53 at r1 (raw file):

Previously, Berdes wrote…
Shouldn't this be #data_class?

You're right.


examples/Option.gh, line 11 at r1 (raw file):

Previously, Berdes wrote…
This function is pure.

no need to specify


src/error/error.h, line 70 at r1 (raw file):

Previously, Berdes wrote…
A little bit unrelated to the particular PR, but why are we using template metaprogramming to enforce inheritance when inheritance is not a very recommended pattern in C++?

There's a difference between "it's overused" and "it's not very recommended". Inheritance in C++ is very recommended when you actually have a is-a relationship, and virtual functions (which is the case for errors here).


src/error/error.h, line 165 at r1 (raw file):

Previously, Berdes wrote…
The NOLINT should be on this line.

Sadly no, it's at the constructor declaration line that clang-tidy warns.


src/error/error.h, line 172 at r1 (raw file):

Previously, Berdes wrote…
The NOLINT should be on this line.

same


src/lexer/file_reader.cc, line 6 at r1 (raw file):

Previously, Berdes wrote…
Why using trailing return type? It's not necessary nor shorter.

State is resolve as FileReader::State in the trailing return type style, not in the usual one. I toyed with the idea of making it the style of the repository to have trailing return style to match grasshopper, but ended up against as it was too much of a departure from usual C++ style


src/lexer/lexer.cc, line 52 at r1 (raw file):

Previously, Berdes wrote…
Usually, curly braces are used to construct structs (or simple types) with all the fields as arguments or to construct a container which is supposed to directly contain the arguments. For normal constructors, parenthesis are better.

Although I could argue for braces almost everywhere (like I could argue for almost always auto), I'll give in for consistency. But I disagree on principle with the "parenthesis are better" statement


src/lexer/lexer.cc, line 88 at r1 (raw file):

Previously, Berdes wrote…
Why do you need to capture "this"? Please also capture the variables explicitly. Same for the following lambdas.

current should be this->current_token(), let me replace that. I don't see the point in capturing variables explicitly for lambdas, especially ones used only locally as here.


src/lexer/lexer.cc, line 102 at r1 (raw file):

Previously, Berdes wrote…
Given the size of this beast, please consider extracting this as a struct with an operator().

Done.


src/lexer/lexer.cc, line 117 at r1 (raw file):

Previously, Berdes wrote…
This whole switch is pretty ugly and looks very error-prone. Do you think there could be a way of simplifying it a little bit using some generalization? For example, you could have a structure with a list of prefix associated with the action to take (return token and continue, error, call function). Then you would have some piece of code that construct the "state machine" associated and runs it. This code could also verify that you define something for all the cases.

I'm open to refactors :) But for now, this is a well-tested piece of code, easily extensible, and I find it easily readable. Once you find the char you're interested in, you never have more than 10 lines of linear flow, compared to a state machine that would have you jump around.

Maybe you have a more elegant idea in mind.

Ideally, I would (statically or at startup) construct a trie with all fixed keywords, and just have rules for the rest, but for now it works, it's well contained, well tested, maintainable, let's move forward :)


src/lexer/lexer.h, line 48 at r1 (raw file):

Previously, Berdes wrote…
Please add #include "lexer/lex_error.h"

Done.


Comments from Reviewable