ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License

Do we get comments from ohm? #480

Open fwx5618177 opened 5 months ago

fwx5618177 commented 5 months ago

I'm trying to get the comments in my code, but it looks like the CST doesn't include them.

alexwarth commented 5 months ago

Comments should definitely be included in your CST. Unfortunately, without seeing at least a snippet from your grammar, I can't tell you what's going on.

A common thing for people to do is to extend the space rule with whatever syntax they have for comments. As an example:

  space
   += comment

  comment
    = "/*" (~"*/" any)* "*/"  -- multiLine
    | "//" (~"\n" any)*       -- singleLine

Here, you'd be able to get to the comments by writing semantic actions for comment_multiLine and comment_singleLine.

Hope that helps!

fwx5618177 commented 5 months ago

I've now set up the grammar like this:

    ProgramItem = Struct
                | Contract
                | Primitive
                | StaticFunction
                | NativeFunction
                | ProgramImport
                | Trait
                | Constant
                | Comment

    Statement = StatementLet
              | StatementBlock
              | StatementReturn
              | StatementExpression
              | StatementAssign
              | StatementAugmentedAssign
              | StatementCondition
              | StatementWhile
              | StatementRepeat
              | StatementUntil
              | StatementTry
              | StatementForEach
              | Comment
    // Comments
    lineTerminator = "\n" | "\r\n" | "\r" | "\u2028" | "\u2029"
    Comment = MultiLineComment | SingleLineComment | SingleLineDocComment | SingleLineImportantComment
    MultiLineComment = "/*" (~"*/" any)* "*/"
    SingleLineComment = "//" ~(("!" | "/") any) (~lineTerminator any)*
    SingleLineDocComment = "///" (~lineTerminator any)*
    SingleLineImportantComment = "//!" (~lineTerminator any)*

Then I added these semantic actions:

semantics.addOperation<ASTNode>('resolve_program_item', {
    MultiLineComment(_open, commentText, _close) {
        return createNode({
            kind: 'multiLineComment',
            value: commentText.sourceString.trim(),
            ref: createRef(this),
        });
    },
    SingleLineComment(_open, commentText) {
        return createNode({
            kind: 'singleLineComment',
            value: commentText.sourceString,
            ref: createRef(this),
        });
    },
    SingleLineDocComment(_open, commentText) {
        return createNode({
            kind: 'singleLineDocComment',
            value: commentText.sourceString.trim(),
            ref: createRef(this),
        });
    },
    SingleLineImportantComment(_open, commentText) {
        return createNode({
            kind: 'singleLineImportantComment',
            value: commentText.sourceString.trim(),
            ref: createRef(this),
        });
    },

But this is the result I get:

{
  id: 3,
  kind: 'program',
  entries: [
    {
      id: 1,
      kind: 'multiLineComment',
      value: 'This is a multi-line comment',
      ref: ASTRef {}
    },
    {
      id: 2,
      kind: 'singleLineComment',
      value: 'This is a single line comment\n' +
        '                    //! This is a single line important comment\n' +
        '                    /// This is a single line doc comment\n' +
        '            fun testFunc(a: Int): Int {\n' +
        '                let b: Int = a == 123 ? 1 : 2;\n' +
        '                return b;\n' +
        '            }',
      ref: ASTRef {}
    }
  ]
}

fwx5618177 commented 5 months ago

(image)

alexwarth commented 5 months ago

Please take a look at this page, which discusses the difference between syntactic and lexical rules: https://ohmjs.org/docs/syntax-reference#syntactic-lexical

Your comment rules are syntactic (their names begin w/ a capital letter) which means they're implicitly skipping spaces. I don't think that's what you want.

(The idiom that I showed you in my first message, where you extend the space rule is a good one to use for this sort of thing.)
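To see why implicit space skipping lets a syntactic comment rule run past the end of its line, here is a toy model in plain TypeScript. It is not Ohm's implementation and does not use the ohm-js API. The names `matchCommentBody`, `skipSpaces`, and `isSpace` are hypothetical, and the model assumes two things: that space is skipped before the `lineTerminator` and `any` applications inside a syntactic rule, and that the default `space` covers every character up to and including U+0020. Both assumptions are consistent with the output posted earlier in this thread.

```typescript
// Toy model of the rule body `(~lineTerminator any)*`.
// Assumption: a syntactic rule skips spaces before each rule application,
// a lexical rule skips nothing.

const LINE_TERMINATORS = ["\r\n", "\n", "\r", "\u2028", "\u2029"];

// Assumed default `space`: any character up to and including U+0020.
function isSpace(ch: string): boolean {
  return ch.length === 1 && ch <= " ";
}

function skipSpaces(input: string, pos: number): number {
  while (pos < input.length && isSpace(input.charAt(pos))) pos++;
  return pos;
}

function atLineTerminator(input: string, pos: number): boolean {
  return LINE_TERMINATORS.some((t) => input.startsWith(t, pos));
}

/** Returns the offset at which the comment body starting at `start` ends. */
function matchCommentBody(input: string, start: number, syntactic: boolean): number {
  let pos = start;
  while (true) {
    // `~lineTerminator`: a lookahead that consumes nothing. In the syntactic
    // case the newline is skipped as space before the check, so it never fires.
    const lookaheadPos = syntactic ? skipSpaces(input, pos) : pos;
    if (atLineTerminator(input, lookaheadPos)) break;

    // `any`: consume one character (after skipping spaces in the syntactic case).
    const anyPos = syntactic ? skipSpaces(input, pos) : pos;
    if (anyPos >= input.length) break;
    pos = anyPos + 1;
  }
  return pos;
}

const sample = "// first\n// second\nfun f() {}";
const lexicalEnd = matchCommentBody(sample, 2, false);
const syntacticEnd = matchCommentBody(sample, 2, true);
console.log(JSON.stringify(sample.slice(2, lexicalEnd)));
console.log(JSON.stringify(sample.slice(2, syntacticEnd)));
```

In this model the lexical call stops at the first newline, while the syntactic call swallows the rest of the input, which mirrors the `singleLineComment` value in the result above.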

fwx5618177 commented 5 months ago

Thanks, buddy. I'm currently writing a Prettier plugin for Tact.

fwx5618177 commented 5 months ago

I'm now using this:

Comment {
    space += comment
    comment = "//" (~lineTerminator any)* -- singleLine
        | "/*" (~"*/" any)* "*/"  -- multiLine
    lineTerminator = "\n" | "\r\n" | "\r" | "\u2028" | "\u2029"
}

But it still throws an error:

const grammar = rawGrammar.createSemantics();
const semantics = grammar.addOperation('extractComment', {
    comment(arg0) {
        return arg0.sourceString;
    },
});
const matchResult = rawGrammar.match(`// This is a single line comment
1211
            const a = 1;
            `);
if (matchResult.failed()) {
    console.log('Error:', matchResult.message, matchResult.shortMessage);
}
const comment = semantics(matchResult).extractComment();

(image)

rrthomas commented 2 months ago

I'm trying to process comments too, and I'm a bit baffled by the comments from @alexwarth: as far as I can tell, anything under space is skipped, as per the comment in #448; certainly, whatever I do I can't seem to get a rule for a comment that is itself part of space to trigger.

That is, comments are available in a MatchResult that you get back from grammar.match, but not in FormatterOperations which is returned by a semantics. So you can't write semantic actions for them.
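
If comments matched through `space` really cannot be reached from semantic actions, one possible fallback (not something proposed in this thread) is to collect them in a separate pass over the source text and attach them to the AST afterwards by offset. The sketch below is plain TypeScript with no Ohm dependency. `collectComments` and the `CommentToken` shape are hypothetical, the four kinds simply mirror the rule names in the grammar posted above, and every value is trimmed. It deliberately ignores string literals, so a `//` inside a string would be misreported.

```typescript
type CommentKind =
  | "multiLine"
  | "singleLine"
  | "singleLineDoc"
  | "singleLineImportant";

interface CommentToken {
  kind: CommentKind;
  value: string; // comment text without delimiters, trimmed
  start: number; // offset of the first character of the comment
  end: number;   // offset just past the last character of the comment
}

// Same terminators as the `lineTerminator` rule in the thread.
const TERMINATOR_CHARS = "\n\r\u2028\u2029";

function endOfLine(src: string, from: number): number {
  let i = from;
  while (i < src.length && !TERMINATOR_CHARS.includes(src.charAt(i))) i++;
  return i;
}

function collectComments(src: string): CommentToken[] {
  const out: CommentToken[] = [];
  let i = 0;
  while (i < src.length) {
    if (src.startsWith("/*", i)) {
      const close = src.indexOf("*/", i + 2);
      if (close === -1) break; // unterminated comment: stop scanning
      out.push({
        kind: "multiLine",
        value: src.slice(i + 2, close).trim(),
        start: i,
        end: close + 2,
      });
      i = close + 2;
    } else if (src.startsWith("//", i)) {
      const end = endOfLine(src, i + 2);
      let kind: CommentKind = "singleLine";
      let bodyStart = i + 2;
      if (src.startsWith("///", i)) {
        kind = "singleLineDoc";
        bodyStart = i + 3;
      } else if (src.startsWith("//!", i)) {
        kind = "singleLineImportant";
        bodyStart = i + 3;
      }
      out.push({ kind, value: src.slice(bodyStart, end).trim(), start: i, end });
      i = end;
    } else {
      i++;
    }
  }
  return out;
}
```

The `start` and `end` offsets are there so that a formatter could later match each comment to the nearest AST node. Whether that suits a given Prettier plugin is a design decision this sketch does not settle.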