rome / tools

Unified developer tools for JavaScript, TypeScript, and the web
https://docs.rome.tools/
MIT License

☂️ JSX Parsing Support #2153

xunilrj closed 2 years ago

xunilrj commented 2 years ago

See specification at: https://facebook.github.io/jsx/

Lazy Lexer

see: https://github.com/rome/tools/issues/2035

JSX Parser

Testing

ematipico commented 2 years ago

Here's the link to the spec: https://facebook.github.io/jsx/

MichaReiser commented 2 years ago

Thanks for writing this up.

Making the token source lazy is something we probably want to do regardless of whether JSX depends on it, because it blocks fixing some of the TS conformance bugs (A< <TypeArgs>() => string> is valid, and several regular expressions aren't parsed correctly).

There are also a few places where we must change how we parse TypeScript if JSX is enabled (... arrow functions and casts).
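
A minimal sketch of what a lazy, context-sensitive token source could look like. This is not Rome's lexer: `Token`, `LexContext`, and `LazyLexer` are hypothetical names, and the sketch assumes the conformance problem is an eager lexer fusing two `<` characters into a shift token before the parser knows it is inside type arguments. The parser asks for one token at a time and passes the context it is in, so the same input can be lexed differently.

```rust
/// Hypothetical token kinds for illustration; not Rome's actual syntax kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Token {
    LAngle,    // `<`
    ShiftLeft, // `<<`
    Ident,
    Eof,
}

/// Hypothetical lexing context that the parser selects for each token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LexContext {
    /// Ordinary expression position: `<<` is a single shift token.
    Regular,
    /// Inside type arguments: every `<` is its own token.
    TypeArguments,
}

struct LazyLexer<'a> {
    source: &'a [u8],
    position: usize,
}

impl<'a> LazyLexer<'a> {
    fn new(source: &'a str) -> Self {
        Self { source: source.as_bytes(), position: 0 }
    }

    /// Lexes exactly one token on demand, using the context passed by the parser.
    fn next_token(&mut self, context: LexContext) -> Token {
        // Skip whitespace between tokens.
        while self.position < self.source.len()
            && self.source[self.position].is_ascii_whitespace()
        {
            self.position += 1;
        }

        let byte = match self.source.get(self.position) {
            Some(byte) => *byte,
            None => return Token::Eof,
        };

        if byte == b'<' {
            let followed_by_angle = self.source.get(self.position + 1) == Some(&b'<');
            if context == LexContext::Regular && followed_by_angle {
                self.position += 2;
                return Token::ShiftLeft;
            }
            self.position += 1;
            return Token::LAngle;
        }

        // Simplification: any other run of bytes up to whitespace or `<` counts as an identifier.
        while self.position < self.source.len() {
            let current = self.source[self.position];
            if current.is_ascii_whitespace() || current == b'<' {
                break;
            }
            self.position += 1;
        }
        Token::Ident
    }
}
```

With the input `A<<T`, requesting tokens in `LexContext::Regular` yields an identifier, one shift token, and an identifier, while `LexContext::TypeArguments` yields two separate `<` tokens. An eager lexer has to commit to one of these before the parser has any say.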

Boshen commented 2 years ago

If we are creating a new lexer, check out https://github.com/mozilla-spidermonkey/jsparagus/blob/master/crates/parser/src/lexer.rs for a more rusty approach 😄

MichaReiser commented 2 years ago

I started working on a proposal for the JSX AST nodes.

MichaReiser commented 2 years ago

I followed the spec linked by @ematipico

// For embedding into expressions
JsxElementExpression = 
    element: JsxAnyElement

JsxFragmentExpression = 
    fragment: JsxFragment

JsAnyExpression =   
    ...
    | JsxElementExpression
    | JsxFragmentExpression

// ==================================
// Elements
// ==================================

JsxAnyElement = 
    JsxElement 
    | JsxSelfClosingElement

// <a>...</a>
JsxElement = 
    opening_element: JsxOpeningElement // Rome classic inlines the opening and closing element. That may make it hard to distinguish which token belongs where?
    children: JsxChildList 
    closing_element: JsxClosingElement

JsxOpeningElement = 
    '<'
    name: JsxAnyElementName
    type_arguments: TsTypeArguments?
    attributes: JsxAttributeList
    '>'

JsxClosingElement = 
    '</'
    name: JsxAnyElementName
    '>'

// <a />
JsxSelfClosingElement = 
    '<'
    name: JsxAnyElementName
    type_arguments: TsTypeArguments?
    attributes: JsxAttributeList
    '/>'

JsxFragment = 
    '<>'
    children: JsxChildList
    '</>'

JsxAnyElementName = 
    JsxReferenceIdentifier
    | JsxMemberExpression // (JsxMemberName?)
    | JsxNamespaceName

JsxAnyIdentifier = JsxReferenceIdentifier | JsxMemberExpression

// <a.test> 
JsxMemberExpression = 
    object: JsxAnyIdentifier
    '.'
    member: JsName

// ==================================
// Attributes
// ==================================

JsxAnyAttribute = 
    JsxSpreadAttribute
    | JsxAttribute

JsxAttribute = 
    name: JsxAnyAttributeName
    initializer: JsxAttributeInitializerClause?

JsxAttributeInitializerClause =
    '='
    value: JsxAnyAttributeValue

JsxAnyAttributeValue = 
    JsxElement
    | JsxSelfClosingElement
    | JsxFragment
    | JsxStringLiteral 
    | JsxExpressionAttributeValue

// <a b={expr} />
JsxExpressionAttributeValue =
    '{'
    expression: JsAnyExpression
    '}'

// <a {...b} />
//    ^^^^^^
JsxSpreadAttribute = 
    '{'
    '...'
    argument: JsAnyExpression // parse_assignment_expression_or_higher
    '}'

JsxAttributeList = JsxAnyAttribute* 

// `a:b`= or `a`
JsxAnyAttributeName = 
    JsxNamespaceName
    | JsxName

// ==================================
// Children
// ==================================

JsxAnyChild = 
    JsxText
    | JsxElement
    | JsxSelfClosingElement
    | JsxFragment
    | JsxExpressionChild
    | JsxSpreadChild

// <a>{...b}</a>
//    ^^^^^^
JsxSpreadChild = 
    '{'
    '...'
    argument: JsAnyExpression // Assignment expression or higher
    '}'

// <a>{b}</a>
//    ^^^
// <a>{}</a>
//    ^^
JsxExpressionChild = 
    '{'
    expression: JsAnyExpression?
    '}'

JsxText = value: 'jsx_text'

JsxChildList = JsxAnyChild*

// ==================================
// Auxiliary
// ==================================

// Has different semantics than JsReferenceIdentifier: allows for `await`,
// but maybe not worth distinguishing?
JsxReferenceIdentifier = value: 'ident'

// <a:test>
JsxNamespaceName = 
    namespace: JsReferenceIdentifier
    ':'
    name: JsName

// JSX strings don't allow for escape sequences
// Historically, string characters within JSXAttributeValue and JSXText are extended to allow the presence of HTML character references to make copy-pasting between HTML and JSX easier, 
// at the cost of not supporting \ EscapeSequence of ECMAScript's StringLiteral. We may revisit this decision in the future.
JsxStringLiteral = value: 'js_string_literal'

Main differences to rome classic

Main question

ematipico commented 2 years ago

I think this comment should have been a GitHub discussion; it would have been easier to leave feedback.

Here's some feedback:

// <a.test> 
JsxMemberExpression = 
    object: JsxAnyIdentifier
    '.'
    member: JsName

Member expressions can be recursive, meaning that we can have something like <a.b.c></a.b.c>. This means that object: should be able to hold a JsxMemberExpression too.


But these are only semantic differences that aren't validated in a mutation API anyway and may, thus, not be worth it

Also, names with a capital first letter are exceptions, e.g. <Aside></Aside>. Personally, I would prefer a new node because it makes it easier to pinpoint these nodes inside analyzers. But I don't have strong opinions :)


// <a>{b}</a>
//    ^^^
// <a>{}</a>
//    ^^
JsxExpressionChild = 
    '{'
    expression: JsAnyExpression?
    '}'

Maybe JsxChildExpression is better?


JsxMemberExpression // (JsxMemberName?)

Better JsxMemberExpression, which is in line with the other member expressions.


Reuse JsStringLiteralExpression or introduce JsxStringLiteral

I'd prefer JsxStringLiteral; it might have some implications for our formatter.

tomByrer commented 2 years ago

Could you handle non-React JSX like SolidJS? SolidJS is gaining popularity, & examples' JSX is very close to React's, but inline CSS has normal (dash-case) property names, & some funky attribute prefixes.

MichaReiser commented 2 years ago

Member expressions can be recursive, meaning that we can have something like <a.b.c></a.b.c>. This means that object: should be able to hold a JsxMemberExpression too.

Representing nested member expressions should already be possible because object is a JsxAnyIdentifier, and JsxMemberExpression is one of the variants of JsxAnyIdentifier.
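
To make the recursion concrete, here is a small sketch that mirrors `JsxAnyIdentifier` and `JsxMemberExpression` as plain Rust types. It is an illustration only and not the generated Rome AST: the `String` fields stand in for the `ident` and `JsName` tokens, and `parse_dotted_name` and `nesting_depth` are hypothetical helpers that skip any identifier validation.

```rust
/// Hypothetical mirror of `JsxAnyIdentifier = JsxReferenceIdentifier | JsxMemberExpression`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum JsxAnyIdentifier {
    Reference(String),
    Member(Box<JsxMemberExpression>),
}

/// Hypothetical mirror of `JsxMemberExpression`: `object '.' member`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct JsxMemberExpression {
    object: JsxAnyIdentifier,
    member: String,
}

/// Builds the left-nested tree for a dotted name such as `a.b.c`.
/// Returns `None` if any segment is empty (for example `a..b` or an empty string).
fn parse_dotted_name(text: &str) -> Option<JsxAnyIdentifier> {
    let mut parts = text.split('.');
    let first = parts.next()?;
    if first.is_empty() {
        return None;
    }

    let mut current = JsxAnyIdentifier::Reference(first.to_string());
    for part in parts {
        if part.is_empty() {
            return None;
        }
        // The tree built so far becomes the `object` of the next member access.
        current = JsxAnyIdentifier::Member(Box::new(JsxMemberExpression {
            object: current,
            member: part.to_string(),
        }));
    }
    Some(current)
}

/// Counts how many member accesses are nested inside a name.
fn nesting_depth(name: &JsxAnyIdentifier) -> usize {
    match name {
        JsxAnyIdentifier::Reference(_) => 0,
        JsxAnyIdentifier::Member(member) => 1 + nesting_depth(&member.object),
    }
}
```

For `a.b.c` the outer node has `member` equal to `c` and its `object` is the member expression `a.b`, so no extra grammar rule is needed for deeper nesting.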

But these are only semantic differences that aren't validated in a mutation API anyway and may, thus, not be worth it

Also, names with a capital first letter are exceptions, e.g. <Aside></Aside>. Personally, I would prefer a new node because it makes it easier to pinpoint these nodes inside analyzers. But I don't have strong opinions :)

Identifiers with capital letters are also valid in JS? There's nothing preventing you from writing let Aside = 10; But I agree on the sentiment. They seem to be different enough to justify a new node.

Maybe JsxChildExpression is better?

The idea is that all variants of a union share the same postfix (at least those that are specific to that union). That's why it is JsxExpressionChild: the name makes it clear that it's a child and not a variant of JsAnyExpression.

JsxMemberExpression // (JsxMemberName?)

Better JsxMemberExpression, which is in line with the other member expressions.

Agree, my main concern is that it can give the impression that JsxMemberExpression is a variant of the JsAnyExpression. The reason the other member names end with Expression is that they are expressions. This isn't the case here. We could also go with JsxMemberIdentifier

Could you handle non-React JSX like SolidJS? SolidJS is gaining popularity, & examples' JSX is very close to React's, but inline CSS has normal (dash-case) property names, & some funky attribute prefixes.

@tomByrer Are you mainly referring to the namespace:attribute name syntax that SolidJS uses? My understanding is that this is standard JSX and covered by the JsxAnyAttributeName grammar, which allows for a namespace name (namespace:name).
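
As a rough illustration of how the proposed `JsxAnyAttributeName` union covers both forms, the sketch below classifies an attribute name as either a plain name or a `namespace:name` pair. The enum and the `parse_attribute_name` helper are hypothetical, operate on a string rather than on tokens, and do not validate identifier characters.

```rust
/// Hypothetical mirror of `JsxAnyAttributeName = JsxNamespaceName | JsxName`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum JsxAnyAttributeName {
    Name(String),
    Namespace { namespace: String, name: String },
}

/// Splits an attribute name at its single `:` if there is one.
/// Returns `None` for empty input, an empty side, or more than one `:`.
fn parse_attribute_name(text: &str) -> Option<JsxAnyAttributeName> {
    match text.split_once(':') {
        Some((namespace, name)) => {
            if namespace.is_empty() || name.is_empty() || name.contains(':') {
                None
            } else {
                Some(JsxAnyAttributeName::Namespace {
                    namespace: namespace.to_string(),
                    name: name.to_string(),
                })
            }
        }
        None if text.is_empty() => None,
        None => Some(JsxAnyAttributeName::Name(text.to_string())),
    }
}
```

An attribute written as `on:click` would land in the `Namespace` variant and `class` in the `Name` variant, so a prefixed attribute needs no syntax beyond what the grammar above already describes.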