tree-sitter / tree-sitter-python

Python grammar for tree-sitter
MIT License

bug: Octal escape sequences with fewer than 3 digits are NOT parsed as `escape_sequence` #270

Closed: ValdezFOmar closed this issue 2 months ago.

ValdezFOmar commented 2 months ago

Did you check existing issues?

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

No response

Describe the bug

Octal escape sequences are defined in the grammar as `/\d{3}/`, but the Python documentation states:

  1. As in Standard C, up to three octal digits are accepted.

So three digits is the maximum, not the required count. A quick test:

>>> '\1' == '\01' == '\001'
True
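The REPL check generalizes to every value an octal escape can express. This is a minimal sketch using only the standard library. `octal_spellings` is a hypothetical helper, not part of tree-sitter or CPython. It lists every zero-padded spelling of a value (one to three digits). `ast.literal_eval` then evaluates each spelling as a Python string literal, confirming that all spellings of each value from 0 to 0o377 decode to the same character.

```python
import ast


def octal_spellings(value: int) -> list[str]:
    """Return every valid octal-escape spelling of `value` as Python source.

    Hypothetical helper for illustration. Python accepts one to three octal
    digits, so 0o7 can be written '\\7', '\\07' or '\\007', while 0o377 only
    fits in three digits.
    """
    digits = format(value, "o")  # minimal octal digits, e.g. '7', '17', '377'
    return ["'\\" + digits.zfill(width) + "'" for width in range(len(digits), 4)]


for value in range(0o400):  # every value a three-digit octal escape can express
    for source in octal_spellings(value):
        # literal_eval parses the string literal the same way the compiler does
        assert ast.literal_eval(source) == chr(value), source

print(octal_spellings(1))  # → ["'\\1'", "'\\01'", "'\\001'"]
```

Any grammar rule for octal escapes therefore has to accept the one- and two-digit forms, not just the three-digit one.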

Steps To Reproduce/Bad Parse Tree

Input:

'\1'
'\01'
'\001'

Current parse tree (only the three-digit escape gets an `escape_sequence` node):
(module [0, 0] - [3, 0]
  (expression_statement [0, 0] - [0, 4]
    (string [0, 0] - [0, 4]
      (string_start [0, 0] - [0, 1])
      (string_content [0, 1] - [0, 3])
      (string_end [0, 3] - [0, 4])))
  (expression_statement [1, 0] - [1, 5]
    (string [1, 0] - [1, 5]
      (string_start [1, 0] - [1, 1])
      (string_content [1, 1] - [1, 4])
      (string_end [1, 4] - [1, 5])))
  (expression_statement [2, 0] - [2, 6]
    (string [2, 0] - [2, 6]
      (string_start [2, 0] - [2, 1])
      (string_content [2, 1] - [2, 5]
        (escape_sequence [2, 1] - [2, 5]))
      (string_end [2, 5] - [2, 6]))))

Expected Behavior/Parse Tree

(module [0, 0] - [3, 0]
  (expression_statement [0, 0] - [0, 4]
    (string [0, 0] - [0, 4]
      (string_start [0, 0] - [0, 1])
      (string_content [0, 1] - [0, 3]
        (escape_sequence [0, 1] - [0, 3]))
      (string_end [0, 3] - [0, 4])))
  (expression_statement [1, 0] - [1, 5]
    (string [1, 0] - [1, 5]
      (string_start [1, 0] - [1, 1])
      (string_content [1, 1] - [1, 4]
        (escape_sequence [1, 1] - [1, 4]))
      (string_end [1, 4] - [1, 5])))
  (expression_statement [2, 0] - [2, 6]
    (string [2, 0] - [2, 6]
      (string_start [2, 0] - [2, 1])
      (string_content [2, 1] - [2, 5]
        (escape_sequence [2, 1] - [2, 5]))
      (string_end [2, 5] - [2, 6]))))
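The difference between the two trees comes down to how many digits the octal alternative accepts. The sketch below approximates it with Python's `re` module. It compares the pattern quoted in the report (`\d{3}`) against `[0-7]{1,3}`, a replacement that is an assumption for illustration and not necessarily the upstream fix. Python's regex engine only approximates tree-sitter's lexer. `escape_at_start` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Octal alternative as quoted in the report: exactly three digits.
CURRENT = re.compile(r"\\\d{3}")
# Hypothetical replacement (assumption, not the upstream fix):
# one to three octal digits, matched greedily.
PROPOSED = re.compile(r"\\[0-7]{1,3}")


def escape_at_start(pattern: re.Pattern[str], content: str) -> str | None:
    """Return the escape `pattern` matches at the start of `content`, or None."""
    match = pattern.match(content)
    return match.group(0) if match else None


# String contents (quotes stripped) from the reproduction.
for content in [r"\1", r"\01", r"\001"]:
    print(content, escape_at_start(CURRENT, content), escape_at_start(PROPOSED, content))
# \1 None \1
# \01 None \01
# \001 \001 \001
```

Because `{1,3}` is greedy, `\0012` would lex as `\001` followed by a plain `2`. Python reads it the same way (`'\0012' == '\x01' + '2'`), so a bounded, greedy repetition keeps the three-digit maximum intact.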
