tree-sitter / tree-sitter-python

Python grammar for tree-sitter
MIT License

bug: escape_sequence not detected as a change when toggling prefix "r" for the string #272

Open 8day opened 2 months ago

8day commented 2 months ago

Did you check existing issues?

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

tree-sitter 0.22.3

Describe the bug

With the Python grammar, old_tree.changed_ranges(new_tree) does not report the removal or insertion of an escape_sequence node when a string is switched between a plain string and an r-prefixed string.

Toggling the r prefix changes the string_start node, and that change is reported. The text covered by string_content, the parent of escape_sequence, stays the same, but its structure changes because escape_sequence is recognized in a plain string and ignored in an r-string. That structural change is not reported.

Note that the equivalent changes for f-prefixed strings appear to be detected as expected.

P.S. Sorry that the example is written in Python; I don't know how to reproduce the bug in C or with the CLI. To switch between the r-string and f-string tests, swap which of the two test blocks is commented out.

Steps To Reproduce/Bad Parse Tree

  1. Create text file with a string containing escape sequence: "for whom the \x07 {'tolls'}".
  2. Parse it to get tree A: (module (expression_statement (string (string_start) (string_content (escape_sequence)) (string_end)))).
  3. Edit string by adding prefix r: r"for whom the \x07 {'tolls'}".
  4. Parse it to get tree B: (module (expression_statement (string (string_start) (string_content) (string_end)))).
  5. Call A.changed_ranges(B), and receive this output: [<Range ... start_byte=0, end_byte=1>].
  6. Edit string by removing prefix r: "for whom the \x07 {'tolls'}".
  7. Parse it to get tree C: (module (expression_statement (string (string_start) (string_content (escape_sequence)) (string_end)))).
  8. Call B.changed_ranges(C), and receive this output: [].
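
As a cross-check on the steps above, the two trees can also be compared by shape rather than through changed_ranges. The following is only a sketch, not part of tree-sitter: `shape` and `structural_diff` are hypothetical helpers that read nothing but the `.type` and `.children` attributes of a node, and `FakeNode` is a stand-in used so the example runs without a parser. It is assumed that py-tree-sitter nodes could be passed in the same way.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing the two attributes the helpers read."""
    type: str
    children: list = field(default_factory=list)


def shape(node):
    """Return the subtree as a nested (type, children) tuple, ignoring byte offsets."""
    return (node.type, tuple(shape(c) for c in node.children))


def structural_diff(old, new, path=()):
    """Yield (path, old_shape, new_shape) for the topmost subtrees whose shapes differ."""
    if shape(old) == shape(new):
        return
    if old.type != new.type or len(old.children) != len(new.children):
        yield path, shape(old), shape(new)
        return
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        yield from structural_diff(o, n, path + (i,))


def make_string_tree(with_escape):
    """Build the tree from the steps above, with or without escape_sequence."""
    content_children = [FakeNode("escape_sequence")] if with_escape else []
    string = FakeNode("string", [
        FakeNode("string_start"),
        FakeNode("string_content", content_children),
        FakeNode("string_end"),
    ])
    return FakeNode("module", [FakeNode("expression_statement", [string])])


tree_a = make_string_tree(with_escape=True)   # plain string
tree_b = make_string_tree(with_escape=False)  # r-prefixed string
diff_ab = list(structural_diff(tree_a, tree_b))
for path, old_shape, new_shape in diff_ab:
    print(path, old_shape, "->", new_shape)
```

The path is a sequence of child indexes from the root, so the single difference here points at string_content, the node whose structure changes when the prefix is toggled.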

Expected Behavior/Parse Tree

A.changed_ranges(B) should have returned: [<Range ... start_byte=0, end_byte=1>, <Range ... start_byte=15, end_byte=19>]. B.changed_ranges(C) should have returned the following (indexes are approximate; the range should span the escape sequence): [<Range ... start_byte=14, end_byte=18>].
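
The byte offsets quoted above can be sanity-checked without a parser. This is a minimal sketch using only bytes.index; `escape_span` is a hypothetical helper name, and the end offset is treated as exclusive, matching how the ranges are written in this report.

```python
ESCAPE = rb"\x07"  # the four literal bytes: backslash, x, 0, 7


def escape_span(src: bytes, needle: bytes = ESCAPE) -> tuple[int, int]:
    """Return (start_byte, end_byte) of the first occurrence of needle in src."""
    start = src.index(needle)
    return start, start + len(needle)


plain = rb'''"for whom the \x07 {'tolls'}"'''
raw = b"r" + plain  # same text with the r prefix inserted at byte 0

print(escape_span(plain))  # (14, 18)
print(escape_span(raw))    # (15, 19)
```

The one-byte shift between the two results comes from the inserted prefix, which is why the expected range differs depending on whether the new tree is the r-string or the plain string.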

Repro

from tree_sitter import Language, Parser
import tree_sitter_python

def make_byte_feeder(src):
    def feeder(pos, point):
        b = src[pos:pos+1]
        print(b.decode('utf-8'), end='')
        return b
    return feeder

# Empty `text` implies removal of selection.
# Non-empty `text` with `selection_start == selection_end` implies insertion.
# Non-empty `text` with `selection_start != selection_end` implies replacement.
def edit_tree(tree, src, selection_start, selection_end, text):
    new_src = src[:selection_start] + text + src[selection_end:]

    print('<'*10)
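    # The source is a single line, so the row is 0 and the column equals the byte offset.
    tree.edit(
        start_byte=selection_start,
        old_end_byte=selection_end,
        new_end_byte=selection_start + len(text),
        start_point=(0, selection_start),
        old_end_point=(0, selection_end),
        new_end_point=(0, selection_start + len(text)),
    )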
    new_tree = parser.parse(make_byte_feeder(new_src), tree)
    print()
    print('>'*10)

    print('org:', src)
    print('alt:', new_src, end='\n\n')
    print('org root node:', tree.root_node)
    print('alt root node:', new_tree.root_node, end='\n\n')

    print('changes:', tree.changed_ranges(new_tree))

    return new_tree, new_src

src = r'''"for whom the \x07 {'tolls'}"'''.encode('utf-8')

parser = Parser(Language(tree_sitter_python.language()))
print('<'*10)
tree = parser.parse(make_byte_feeder(src))
print()
print('>'*10)

# TEST R-STRING.

old_tree = tree
tree, src = edit_tree(tree, src, 0, 0, 'r'.encode('utf-8'))
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)

old_tree = tree
tree, src = edit_tree(tree, src, 17, 19, '10'.encode('utf-8'))
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)

old_tree = tree
tree, src = edit_tree(tree, src, 0, 1, b'')
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)

# TEST F-STRING.

# old_tree = tree
# tree, src = edit_tree(tree, src, 0, 0, 'f'.encode('utf-8'))
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)

# old_tree = tree
# tree, src = edit_tree(tree, src, 22, 27, 'rings'.encode('utf-8'))
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)

# old_tree = tree
# tree, src = edit_tree(tree, src, 0, 1, b'')
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)
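
tree.edit takes (row, column) points alongside the byte offsets. For anyone adapting the repro to a multi-line source, the points could be derived from byte offsets roughly as follows. `point_at` is a hypothetical helper, not part of py-tree-sitter, and it assumes columns are counted in bytes from the start of the line.

```python
def point_at(src: bytes, byte_offset: int) -> tuple[int, int]:
    """Return the (row, column) point for byte_offset, with the column counted in bytes."""
    # Rows are the number of newlines before the offset.
    row = src.count(b"\n", 0, byte_offset)
    # rfind returns -1 when there is no earlier newline, so the line starts at 0.
    line_start = src.rfind(b"\n", 0, byte_offset) + 1
    return row, byte_offset - line_start


print(point_at(b'r"abc"', 1))       # (0, 1)
print(point_at(b"x = 1\ny = 2", 8))  # (1, 2)
```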