tree-sitter / tree-sitter-typescript

TypeScript grammar for tree-sitter
MIT License

bug: JSX captures whitespaces in nested, multiline tags #306

Open SomeoneToIgnore opened 1 month ago

SomeoneToIgnore commented 1 month ago

Did you check existing issues?

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

No response

Describe the bug

For a given TSX template,

a["b"] = <C d="e">
    <F></F>
    { g() }
</C>;

the nested jsx_opening_element that starts on its own line is captured together with the preceding newline and indentation, as "\n    <F>" instead of just "<F>".

Steps To Reproduce/Bad Parse Tree

The parse tree itself is correct in both cases, but the node ranges are not. I have not found a way to assert ranges in the node-based tests that use *.txt files, so I've drafted a Rust test instead:

#[cfg(test)]
mod tests_f_node {
    use tree_sitter::Node;

    use super::*;

    #[test]
    fn tsx_tag_parse_ranges() {
        let code = r#"
                a["b"] = <C d="e">
                    <F></F>
                    { g() }
                </C>;
            "#;

        let mut parser = tree_sitter::Parser::new();
        parser
            .set_language(&super::language_tsx())
            .expect("Error loading TypeScript TSX grammar");

        let tree = parser.parse(code, None).unwrap();
        let root_node = tree.root_node();

        let f_node = get_f_node(root_node, code).expect("<F> node not found");

        // Assert the ranges. Modify these values according to the actual positions in your code.
        let start_byte = f_node.start_byte();
        let end_byte = f_node.end_byte();

        assert_eq!(start_byte, 36); // Replace with the correct start byte
        assert_eq!(end_byte, 39); // Replace with the correct end byte

        let start_position = f_node.start_position();
        let end_position = f_node.end_position();

        assert_eq!(start_position.row, 2); // Line number containing <F>
        assert_eq!(start_position.column, 16); // Column where <F> starts
        assert_eq!(end_position.row, 2);
        assert_eq!(end_position.column, 19); // Column where <F> ends
    }

    fn get_f_node<'a>(node: Node<'a>, code: &'a str) -> Option<Node<'a>> {
        for child in node.children(&mut node.walk()) {
            if child.kind() == "jsx_opening_element"
                && dbg!(child.utf8_text(code.as_bytes()).unwrap()) == "<F>"
            {
                return Some(child);
            }
            if let Some(found) = get_f_node(child, code) {
                return Some(found);
            }
        }
        None
    }
}

which outputs

---- tests_f_node::tsx_tag_parse_ranges stdout ----
[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "<C d=\"e\">"
[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "\n                    <F>"
thread 'tests_f_node::tsx_tag_parse_ranges' panicked at bindings/rust/lib.rs:97:50:
<F> node not found
stack backtrace:

on current master.

Expected Behavior/Parse Tree

I've bisected that to

37ced086ad8bb4fa67e8c53711e9f30e869bb78f is the first bad commit
commit 37ced086ad8bb4fa67e8c53711e9f30e869bb78f (HEAD)
Author: Amaan Qureshi <amaanq12@gmail.com>
Date:   Fri Jul 5 23:13:15 2024 -0400

    chore: generate

 tsx/src/grammar.json           |    370 +-
 tsx/src/node-types.json        |    843 +-
 tsx/src/parser.c               | 552504 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------------------------------------------------
 typescript/src/grammar.json    |    366 +-
 typescript/src/node-types.json |    847 +-
 typescript/src/parser.c        | 530546 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------------------------------------------
 6 files changed, 440659 insertions(+), 644817 deletions(-)

and before this commit everything works fine:

[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "<C d=\"e\">"
[bindings/rust/lib.rs:118:20] child.utf8_text(code.as_bytes()).unwrap() = "<F>"
thread 'tests_f_node::tsx_tag_parse_ranges' panicked at bindings/rust/lib.rs:103:9:
assertion `left == right` failed
// This failure happens because my test is still a draft (the asserted offsets are placeholders), but the test already exposes the issue, so it is useful in its current state.
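The placeholder offsets in the draft test could be derived from the source string instead of being hard-coded. Below is a minimal sketch using only the standard library; `expected_range` is a hypothetical helper, not part of tree-sitter or this grammar, and it assumes that rows are 0-based and that columns are counted in bytes from the start of the line.

```rust
/// Hypothetical helper (not a tree-sitter API): locates the first occurrence
/// of `needle` in `code` and returns
/// (start_byte, end_byte, row, start_column, end_column).
/// Assumes 0-based rows and byte-based columns.
fn expected_range(code: &str, needle: &str) -> Option<(usize, usize, usize, usize, usize)> {
    let start = code.find(needle)?;
    let end = start + needle.len();
    // Row = number of newlines before the match.
    let row = code[..start].matches('\n').count();
    // Byte offset of the first character on the matched line.
    let line_start = code[..start].rfind('\n').map_or(0, |i| i + 1);
    Some((start, end, row, start - line_start, end - line_start))
}

fn main() {
    // Same snippet and indentation as in the draft test above.
    let code = r#"
                a["b"] = <C d="e">
                    <F></F>
                    { g() }
                </C>;
            "#;

    let (start, end, row, col_start, col_end) =
        expected_range(code, "<F>").expect("<F> not found in source");
    println!("bytes {start}..{end}, row {row}, columns {col_start}..{col_end}");
}
```

For the snippet as indented in the draft test, this should give bytes 56..59 on row 2, columns 20..23, which could replace the placeholder values in the assertions. With the regression, the node instead appears to start at the preceding newline.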

Repro

See the test above

SomeoneToIgnore commented 1 month ago

Hello, I'm interested in fixing this and would appreciate any pointers.

ediezindell commented 3 weeks ago

I was able to resolve the issue by rerunning npm run build on my PC.