rchain / bounties

RChain Bounty Program

Rholang Lexer/Parser with Diagnostic API and informative errors #1015

Open golovach-ivan opened 6 years ago

golovach-ivan commented 6 years ago

RhoLP - RhoLang Lexer/Parser

Current state: the Interpreter/Web-Compiler uses a front-end (lexer, parser) generated automatically from BNFC; it has no diagnostic API and often generates uninformative errors.

Idea: do NOT replace the cup/jflex interpreter front-end with a hand-written one. Instead, when the cup/jflex front-end reports an error, additionally run a handmade lexer/parser (front-end only, not a full interpreter) to produce informative errors.
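The fallback idea can be sketched roughly as follows. This is only an illustration: FallbackFrontEnd, Result, and both stub functions are hypothetical names, not part of RhoLP or the RChain code base, and the generated front-end is modelled as a plain function that throws on bad input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical wiring; none of these names come from RhoLP or the RChain code base.
final class FallbackFrontEnd {

    /** Outcome of one parse attempt: an AST on success, diagnostics on failure. */
    static final class Result {
        final String ast;                // null when parsing failed
        final List<String> diagnostics;  // empty when parsing succeeded

        Result(String ast, List<String> diagnostics) {
            this.ast = ast;
            this.diagnostics = diagnostics;
        }
    }

    /** Runs the generated front-end first; the handmade one runs only after a failure. */
    static Result parse(String source,
                        Function<String, String> generatedFrontEnd,
                        Function<String, List<String>> diagnosticFrontEnd) {
        try {
            return new Result(generatedFrontEnd.apply(source), new ArrayList<String>());
        } catch (RuntimeException e) {
            List<String> diagnostics = new ArrayList<String>(diagnosticFrontEnd.apply(source));
            if (diagnostics.isEmpty()) {
                // The handmade pass found nothing: keep the original message.
                diagnostics.add(String.valueOf(e.getMessage()));
            }
            return new Result(null, diagnostics);
        }
    }

    /** Stand-in for the generated (cup/jflex) front-end: rejects any '#'. */
    static String generatedStub(String source) {
        if (source.indexOf('#') >= 0) {
            throw new IllegalStateException("syntax error");
        }
        return "AST(" + source + ")";
    }

    /** Stand-in for the handmade diagnostic lexer: explains the first '#'. */
    static List<String> diagnosticStub(String source) {
        List<String> out = new ArrayList<String>();
        int at = source.indexOf('#');
        if (at >= 0) {
            out.add("there is no operator '#' at column " + (at + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        Result bad = parse("a # b", FallbackFrontEnd::generatedStub, FallbackFrontEnd::diagnosticStub);
        System.out.println(bad.diagnostics);
    }
}
```

The point of this shape is that the generated front-end stays the only source of truth for valid programs; the handmade pass costs nothing unless an error has already occurred.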

This bounty issue was created for the development epic (RHOL-1027) = RHOL-1029 + RHOL-1030 + RHOL-1031.

Project RhoLP sources.

Part I: Lexer (36 codepoints)

Part II: Parser

TBD

Benefit to RChain

1. The Interpreter and Web-Compiler will be more user-friendly in error situations
2. This handmade lexer/parser can resolve the following issues

Example/Demo

import net.golovach.rholp.*;
import net.golovach.rholp.log.*;
import java.util.List;

public class Demo {
    public static void main(String[] args) {
        String content =
                "type T = Functor[({ type λ[α] = Map[Int, α] })#λ]";
        DiagnosticListener listener = new DiagnosticCollapsedPrinter();
        RhoLexer lexer = new RhoLexer(content, listener);
        List<RhoTokenType> tokens = lexer.scanAll();
    }
}

NOTE
  Error code: lexer.note.identifier-like-absent-keyword
  Message: identifier 'type' like absent keyword, may cause confusion
  Line/Column: [1, 1]
  ----------
  type T = Functor[({ type λ[α] = Map[Int, α] })#λ]
  ^^^^

ERROR
  Error code: lexer.err.non-existent.unicode.identifiers
  Messages:
    there is no Unicode support: 'λ', codepoint = 955, char[] = '\u03BB'
    there is no Unicode support: 'α', codepoint = 945, char[] = '\u03B1'
  Line/Column: [1, 26], [1, 28], [1, 42], [1, 48]
  ----------
  type T = Functor[({ type λ[α] = Map[Int, α] })#λ]
                           ^ ^             ^     ^

ERROR
  Error code: lexer.err.non-existent.operator
  Message:    there is no operator '#'
  Line/Column: [1, 47]
  ----------
  type T = Functor[({ type λ[α] = Map[Int, α] })#λ]
                                                ^ 
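For readers wondering how the marker lines in the output relate to the reported columns, here is a minimal sketch. CaretLine is a hypothetical helper, not RhoLP's actual printer; it assumes 1-based columns and one char per column (true for the BMP characters in the demo).

```java
import java.util.Arrays;

// Hypothetical helper, not taken from RhoLP.
final class CaretLine {

    /** Builds a marker line with '^' under each 1-based column; other positions are spaces. */
    static String render(int lineLength, int... columns) {
        char[] marks = new char[lineLength];
        Arrays.fill(marks, ' ');
        for (int column : columns) {
            if (column >= 1 && column <= lineLength) {
                marks[column - 1] = '^';
            }
        }
        // Drop trailing spaces so the marker line ends at the last caret.
        int end = lineLength;
        while (end > 0 && marks[end - 1] == ' ') {
            end--;
        }
        return new String(marks, 0, end);
    }

    public static void main(String[] args) {
        // Same source line as the demo, with the Greek letters written as escapes.
        String line = "type T = Functor[({ type \u03BB[\u03B1] = Map[Int, \u03B1] })#\u03BB]";
        System.out.println(line);
        System.out.println(render(line.length(), 26, 28, 42, 48));
    }
}
```

Passing columns 26, 28, 42 and 48 reproduces the caret positions of the Unicode error above, without the two-space indent the printer adds.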

Budget and Objective

Estimated Budget of Task: $[5400] for Part I (Lexer)
Estimated Timeline Required to Complete the Task: [3 weeks]
How will we measure completion? [example: committed library ready to integrate with Interpreter + Web-Compiler]

Barkov-F commented 6 years ago

It would be great if error handling were included in cryptofex-IDE.

On Thu, 1 Nov 2018 at 02:07 golovach-ivan notifications@github.com wrote:

This bounty issue was created for the development epic (RHOL-1027) https://rchain.atlassian.net/browse/RHOL-1027 = RHOL-1029 https://rchain.atlassian.net/browse/RHOL-1029 + RHOL-1030 https://rchain.atlassian.net/browse/RHOL-1030 + RHOL-1031 https://rchain.atlassian.net/browse/RHOL-1031.

Project RhoLP sources: https://github.com/golovach-ivan/RhoLP/

Part I: Lexer (36 codepoints)

  • Lexer skeleton: Diagnostics API (12 codepoints)
    • Standard error format, error codes
    • Error/warn messages database
    • One scan - multiple diagnostic messages
  • Handling of nonexistent literals (12 codepoints)
    • Int problems: integer literals that are too big, absent hex/binary formats ('0xFF', '0b1010')
    • Floating-point literals: '42.42e-42f'
    • Char literals: 'A', '\uFFFF'
  • Nonexistent token types (12 codepoints)
    • Absent operators: '->', '%', '&', '&&', '^', etc
    • Absent keywords: 'do', 'int', 'this', etc
    • Absent UTF support
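As a rough illustration of "one scan - multiple diagnostic messages", the sketch below walks a line once and records every problem it meets instead of stopping at the first. AbsentTokenScan and Diagnostic are hypothetical names, and the code lexer.err.literal.absent.int-format is invented for this example; only lexer.err.non-existent.operator appears in the demo output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not RhoLP's lexer.
final class AbsentTokenScan {

    static final class Diagnostic {
        final String code;
        final String message;
        final int column; // 1-based

        Diagnostic(String code, String message, int column) {
            this.code = code;
            this.message = message;
            this.column = column;
        }

        @Override
        public String toString() {
            return code + " [1, " + column + "]: " + message;
        }
    }

    /** Single pass over one line; collects every diagnostic rather than failing fast. */
    static List<Diagnostic> scan(String line) {
        List<Diagnostic> out = new ArrayList<Diagnostic>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            boolean startsNumber = i == 0 || !Character.isLetterOrDigit(line.charAt(i - 1));
            boolean hasPrefix = c == '0' && i + 1 < line.length()
                    && (line.charAt(i + 1) == 'x' || line.charAt(i + 1) == 'b');
            if (startsNumber && hasPrefix) {
                // Consume the whole literal so it is reported once, e.g. '0xFF'.
                int start = i;
                i += 2;
                while (i < line.length() && Character.isLetterOrDigit(line.charAt(i))) {
                    i++;
                }
                out.add(new Diagnostic("lexer.err.literal.absent.int-format", // invented code
                        "there is no hex/binary integer format: '" + line.substring(start, i) + "'",
                        start + 1));
            } else if (c == '#' || c == '%' || c == '^') {
                out.add(new Diagnostic("lexer.err.non-existent.operator",
                        "there is no operator '" + c + "'", i + 1));
                i++;
            } else {
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Diagnostic d : scan("x = 0xFF % 0b1010")) {
            System.out.println(d);
        }
    }
}
```

Collecting into a list (or forwarding each item to a listener) is what lets one scan report the hex literal, the operator, and the binary literal together.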


golovach-ivan commented 6 years ago

@Barkov-F Can you be code reviewer for this issue?

dckc commented 6 years ago

This looks like it could be useful stuff, but not without detailed peer review.

I have asked many times that you find collaborators:

  • at least as far back as August 3: https://github.com/rchain/bounties/issues/836#issuecomment-410322722
  • again Sep 19 and Sep 28: https://github.com/rchain/bounties/issues/945#issuecomment-422982979
  • @JoshOrndorff reached out Oct 3: https://github.com/rchain/bounties/issues/991#issuecomment-426726301

Integrating it with rchain.cloud looks interesting, but that wouldn't be core-dev. I think @tschoffelen would be the main point of contact there.

Also, the core-dev label is reserved for Bounties for Development work selected by Medha for the core dev team. The measure of completion is that a PR is accepted in https://github.com/rchain/rchain . (see #273 and Bounty Task Guides)

As for a budget, I don't see how to do that in the current climate; see #1012.

dckc commented 6 years ago

Oh... @allancto tells me that @KellyAtPyrofex is trying to get a relevant PR reviewed. That would qualify it for the core-dev label. Normally the PR has to get merged during the pay period, and October is over. But maybe it could work out.

JoshOrndorff commented 6 years ago

I'd really love to learn how this works. Please LMK when you can give a tour. I've tried to build on my own and posted the problems I encountered on discord.

glenbraun commented 6 years ago

I think it is interesting to write a parser. It is certainly useful to have a way to get the RhoTypes protobufs for any given Rholang code (assuming that's the data model this parser would use). I would like to point out a way that you can use RChain itself to get the protobufs for any valid Rholang.

    @"parser"!( { new c, stdout(`rho:io:stdout`) in { contract c(x) = { stdout!(*x) } } })

Using a client we can listenForDataAtName "parser" and receive the protobufs for the Rholang in the curly brackets. That is, just wrap any valid Rholang in curly brackets, send it on a name and then listen for that using a client, and you'll get the protobufs graph of the Rholang. For example, the code above looks like this:

    {
      "news": [
        {
          "bindCount": 2,
          "p": {
            "receives": [
              {
                "binds": [
                  {
                    "patterns": [
                      {
                        "exprs": [ { "eVarBody": { "v": { "freeVar": 0 } } } ],
                        "connectiveUsed": true
                      }
                    ],
                    "source": {
                      "exprs": [ { "eVarBody": { "v": { "boundVar": 1 } } } ],
                      "locallyFree": "Ag=="
                    },
                    "freeCount": 1
                  }
                ],
                "body": {
                  "sends": [
                    {
                      "chan": {
                        "exprs": [ { "eVarBody": { "v": { "boundVar": 1 } } } ],
                        "locallyFree": "Ag=="
                      },
                      "data": [
                        {
                          "exprs": [ { "eVarBody": { "v": { "boundVar": 0 } } } ],
                          "locallyFree": "AQ=="
                        }
                      ],
                      "locallyFree": "Aw=="
                    }
                  ],
                  "locallyFree": "Aw=="
                },
                "persistent": true,
                "bindCount": 1,
                "locallyFree": "Aw=="
              }
            ],
            "locallyFree": "Aw=="
          },
          "uri": [ "rho:io:stdout" ]
        }
      ]
    }

I know we won't be able to send on public names in the future and will have to use a private name, but the concept is the same.
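The "locallyFree" values in that dump are base64 strings, which is how protobuf's JSON mapping renders bytes fields. Here is a small sketch for reading them, assuming (the thread does not state this) that the bytes are a little-endian bit set of variable indices; LocallyFreeDecoder is a hypothetical name.

```java
import java.util.Base64;
import java.util.BitSet;

// Hypothetical helper; the bit-set interpretation is an assumption, not confirmed by the thread.
final class LocallyFreeDecoder {

    /** Decodes a base64 bytes field into the set of bit positions that are 1. */
    static BitSet decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return BitSet.valueOf(raw); // little-endian: bit 0 is the lowest bit of raw[0]
    }

    public static void main(String[] args) {
        for (String value : new String[] {"AQ==", "Ag==", "Aw=="}) {
            System.out.println(value + " -> " + decode(value));
        }
    }
}
```

Under that assumption "Ag==" decodes to {1} and "AQ==" to {0}, which lines up with the boundVar 1 and boundVar 0 entries they sit next to, and "Aw==" decodes to {0, 1}.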

allancto commented 6 years ago

@dckc @KellyatPyrofex where in rchain would work best? I'm suggesting: github.com/rchain/rchain/rholang-parser. @glenbraun , @JoshOrndorff any opinions?

dckc commented 5 years ago

@golovach-ivan when last we chatted, I got the impression you were going to

Now I see this was submitted as https://github.com/rchain/rchain/pull/1898 . That PR cites a JIRA ticket, but not one that is part of the core dev team's plans. I don't expect the core dev team to expand their scope of work without lots of clear customer demand. Perhaps you could use rchain-community as a mechanism to explore the level of customer demand?

Until I see confirmation from @KellyatPyrofex I'm taking the core-dev label off.

cc @ArturGajowy @KentShikama