sourcegraph / tree-sitter-jsonnet

tree-sitter grammar for Jsonnet
MIT License

fix: parse multiline string #19

Closed: Duologic closed this 10 months ago

Duologic commented 1 year ago

Parses multiline strings that optionally contain one or two pipe (|) symbols.

According to the spec, the parser should also allow a triple pipe (|||) inside string_content, but I couldn't get that to work reliably.

Spec:

Text block, beginning with |||, followed by optional whitespace and a new-line. The next non-empty line must be prefixed with some non-zero length whitespace W. The block ends at the first subsequent line that is non-empty and does not begin with W, and it is an error if this line does not contain some optional whitespace followed by |||. The content of the string is the concatenation of all the lines between the two |||, which either begin with W (in which case that prefix is stripped) or they are empty lines (in which case they remain as empty lines). The line ending style in the file is preserved in the string. This form cannot be used in import statements.

The relevant requirement is: "The next non-empty line must be prefixed with some non-zero length whitespace W."

In other words, the text after the opening ||| must be indented. I found some evidence that tree-sitter can handle indentation-sensitive syntax, but I couldn't get that to work either.
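To illustrate the rule quoted above, here is a minimal C sketch written as a standalone function, not as tree-sitter scanner code. It measures W from the first non-empty line and ends the block at the first non-empty line that does not start with W. Because termination depends only on indentation, an indented ||| line (as in the corpus tests that follow) stays part of the content. The name parse_text_block and its signature are hypothetical, and the sketch assumes LF line endings, compares W byte-for-byte (so tabs and spaces are distinct), and treats a line as "empty" only if it has no characters before the newline.

```c
#include <string.h>

/* Hypothetical helper (not part of tree-sitter-jsonnet): parses a Jsonnet
 * text block starting at src, which must begin with "|||". Writes the
 * decoded, NUL-terminated content into out and returns the number of source
 * bytes consumed, or -1 on a syntax error or if out is too small.
 * Assumptions: LF line endings; W compared byte-for-byte. */
static long parse_text_block(const char *src, char *out, size_t out_cap) {
    const char *p = src;
    size_t n = 0;

    if (out_cap == 0) return -1;
    if (strncmp(p, "|||", 3) != 0) return -1;
    p += 3;
    while (*p == ' ' || *p == '\t') p++;      /* optional whitespace */
    if (*p != '\n') return -1;                /* header must end the line */
    p++;

    /* Empty lines before the first non-empty line are kept as-is. */
    while (*p == '\n') {
        if (n + 1 >= out_cap) return -1;
        out[n++] = '\n';
        p++;
    }

    /* W is the whitespace prefix of the first non-empty line. */
    const char *w = p;
    size_t w_len = strspn(p, " \t");
    if (w_len == 0) return -1;                /* W must be non-empty */

    for (;;) {
        if (*p == '\0') return -1;            /* unterminated block */
        if (*p == '\n') {                     /* empty line: kept as-is */
            if (n + 1 >= out_cap) return -1;
            out[n++] = '\n';
            p++;
            continue;
        }
        if (strncmp(p, w, w_len) == 0) {      /* content line: strip W */
            p += w_len;
            const char *eol = strchr(p, '\n');
            if (eol == NULL) return -1;       /* no closing ||| can follow */
            size_t len = (size_t)(eol - p) + 1;   /* keep the '\n' */
            if (n + len >= out_cap) return -1;
            memcpy(out + n, p, len);
            n += len;
            p = eol + 1;
            continue;
        }
        /* First non-empty line not starting with W: must be ws + "|||". */
        p += strspn(p, " \t");
        if (strncmp(p, "|||", 3) != 0) return -1;
        out[n] = '\0';
        return (long)(p + 3 - src);
    }
}
```

For the input of the first corpus test, the inner "  |||" line starts with W (two spaces), so it is copied as content and the decoded string is "abc\n|||\nabc\n"; only the unindented ||| closes the block. An external scanner would have to apply the same decision one character at a time and remember W between lines.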

Some additional corpus tests for the triple pipe:


==============
Multiple lines with ||| as a line
==============

|||
  abc
  |||
  abc
|||

---

(document
  (string
    (string_start)
    (string_content)
    (string_end)))

==============
Multiple lines with ||| in middle of line
==============

|||
  abc
  aa|aa
  aa||aa
  aa|||aa
  abc
|||

---

(document
  (string
    (string_start)
    (string_content)
    (string_end)))

==============
Multiple lines with ||| at line beginning
==============

|||
  abc
  |||aa
  abc
|||

---

(document
  (string
    (string_start)
    (string_content)
    (string_end)))

==============
Multiple lines with ||| at line end
==============

|||
  abc
  aaa|||
  abc
|||

---

(document
  (string
    (string_start)
    (string_content)
    (string_end)))

==============
Multiple lines - assigned
==============

{

  a: |||
    abc
  |||
  ,
}

---

(document
  (object
    (member
      (field
        (fieldname
          (id))
        (string
          (string_start)
          (string_content)
          (string_end))))))

Duologic commented 1 year ago

I just learned that this string scanning is handled by the external scanner in scanner.c.

Duologic commented 10 months ago

I've only seen issues on very rare occasions, so I'm not going to look into this further.