ikatyang / tree-sitter-markdown

Markdown grammar for tree-sitter
https://ikatyang.github.io/tree-sitter-markdown
MIT License

Crash when parsing with ts_parser_set_included_ranges #10

Closed: pmacro closed this issue 4 years ago

pmacro commented 4 years ago

Hi,

First of all, thanks for creating this project.

I've been hitting a crash triggered by the assert on line 253 of scanner.cc when running the following code. The crash doesn't happen if the ts_parser_set_included_ranges line is removed, but for larger files removing that line has a major impact on performance.

#include <stdio.h>
#include <stdlib.h>  /* malloc, free */
#include <string.h>
#include <tree_sitter/api.h>
TSLanguage *tree_sitter_markdown();

char* readFile() {
  FILE *f = fopen("/path/to/CHANGELOG.md", "rb");
  fseek(f, 0, SEEK_END);
  long fsize = ftell(f);
  fseek(f, 0, SEEK_SET);  /* same as rewind(f); */

  char *string = malloc(fsize + 1);
  fread(string, 1, fsize, f);
  fclose(f);

  string[fsize] = 0;
  return string;
}

int main() {
  const char *source_code = readFile();
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_markdown());

  TSTree *tree = ts_parser_parse_string(parser, NULL, source_code, strlen(source_code));
  TSNode root_node = ts_tree_root_node(tree);
  char *string = ts_node_string(root_node);

  TSRange range = {
    .start_point = {
      .row = 915,
      .column = 0,
    },
    .end_point = {
      .row = 1033,
      .column = 38,
    },
    .start_byte = 35840,
    .end_byte = 40326
  };

  ts_parser_set_included_ranges(parser, &range, 1);
  tree = ts_parser_parse_string(parser, tree, source_code, strlen(source_code));

  free(string);
  ts_tree_delete(tree);
  ts_parser_delete(parser);

  return 0;
}

This happens with this file, and seemingly with any larger CHANGELOG file.

If you need any additional information, please let me know. Thanks.

pmacro commented 4 years ago

I think I misunderstood ts_parser_set_included_ranges: it's meant only for multi-language files, not for parsing a subset of the same language within the same file, so closing this. Feel free to re-open if you want to investigate anyway.

ikatyang commented 4 years ago

It seems you're trying to reuse the tree parsed from the entire text to parse only part of the text. My guess is that the error is caused by the different input range. If you'd like to reuse the tree, you should pass the "same" input range along with ts_tree_edit; otherwise, simply specify the included range in the first parse to achieve your goal, i.e., parsing a subset of the same language within the same file:

--- a/test.c
+++ b/test.c
@@ -23,10 +23,6 @@ int main() {
   TSParser *parser = ts_parser_new();
   ts_parser_set_language(parser, tree_sitter_markdown());

-  TSTree *tree = ts_parser_parse_string(parser, NULL, source_code, strlen(source_code));
-  TSNode root_node = ts_tree_root_node(tree);
-  char *string = ts_node_string(root_node);
-
   TSRange range = {
     .start_point = {
       .row = 915,
@@ -41,9 +37,8 @@ int main() {
   };

   ts_parser_set_included_ranges(parser, &range, 1);
-  tree = ts_parser_parse_string(parser, tree, source_code, strlen(source_code));
+  TSTree *tree = ts_parser_parse_string(parser, NULL, source_code, strlen(source_code));

-  free(string);
   ts_tree_delete(tree);
   ts_parser_delete(parser);