Open jgomezb11 opened 1 year ago
Using https://ikatyang.github.io/tree-sitter-yaml/ to reproduce, your fixture parses into a correct CST:
foo: `double_quote_scalar`
foo2: `single_quote_scalar`
number: `single_quote_scalar`
You are right, it parses correctly... but it still omits information. Another example to make myself clear:
Using the same tool, https://ikatyang.github.io/tree-sitter-yaml/, if you try to parse:
foo: "\n"
It will generate a `double_quote_scalar` node that has a child named `escape_sequence`, which refers to the information inside the quotes (in this case `\n`).
That doesn't happen when trying to force a keyword into a string like my first example (`foo: "*"`). The parsed result has a `double_quote_scalar` without a child that refers to the content inside the quotes.
That's why I'm saying that the parser omits information.
> It will generate a double_quote_scalar node that has a child named escape_sequence which refers to the information inside the quotes (in this case \n)...
Exactly. The parser detected that the `double_quote_scalar` CST node has an `escape_sequence` child.
> That doesn't happen when trying to force a keyword into a string like my first example (foo: "*"). The parsed result has a double_quote_scalar without a child that refers to the content inside the quotes.
Because the `double_quote_scalar` CST node doesn't have an `escape_sequence` child, as the original source string doesn't contain escape sequences.
> That's why I'm saying that the parser omits information.
I don't see your point. It does not omit anything. It parses what it sees. If it encounters an escape sequence in a double-quoted scalar, it will parse it; if it doesn't see any escape sequences in a double-quoted scalar, it does not produce any child CST nodes.
> If it encounters an escape sequence in a double-quoted scalar, it will parse it; if it doesn't see any escape sequences in a double-quoted scalar, it does not produce any child CST nodes.

That's exactly the problem.
You are right when you say that the `escape_sequence` node only matches occurrences of an escape sequence, but then there should be another type of node that matches the contents of the quotes when they contain no escape sequences; otherwise, it is as if the source string does not exist.
Another example
foo: "foo \n"
In this case there is a child node that points to `\n`, but there isn't a child node that refers to the first part of the string (`foo`), resulting in a loss of information.
Graphical representation of the example:
As you can see there is a child node that points to a newline but the rest of the source string is nowhere to be found.
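To make the gap concrete, here is a minimal sketch that recovers the text between the quotes that no `escape_sequence` child covers. It works on plain offsets rather than a real parser: the spans below are written by hand for illustration, and `literal_segments` is a hypothetical helper, not part of tree-sitter-yaml.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) offsets into the source, end exclusive


def literal_segments(
    source: str, scalar: Span, escapes: List[Span]
) -> List[Tuple[Span, str]]:
    """Return the parts of a quoted scalar not covered by escape_sequence children.

    `scalar` is the span of the whole quoted scalar, quotes included.
    `escapes` are the spans of its escape_sequence children.
    Offsets are character offsets here; tree-sitter itself reports byte
    offsets, which only coincide for ASCII input.
    """
    start, end = scalar
    # Skip the opening and closing quote characters.
    cursor, limit = start + 1, end - 1
    segments = []
    for esc_start, esc_end in sorted(escapes):
        if esc_start > cursor:
            segments.append(((cursor, esc_start), source[cursor:esc_start]))
        cursor = max(cursor, esc_end)
    if cursor < limit:
        segments.append(((cursor, limit), source[cursor:limit]))
    return segments


# foo: "foo \n"  -> scalar spans offsets 5..13, the escape spans 10..12
with_escape = literal_segments('foo: "foo \\n"', (5, 13), [(10, 12)])

# foo: "*"  -> scalar spans offsets 5..8, no escape children at all
without_escape = literal_segments('foo: "*"', (5, 8), [])
```

In both cases the quoted text is still reachable through the scalar node's own span; it just has no dedicated child node.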
Right, I understand what you're saying now.
I'm not an author of this library, but I use the grammar to build a syntactic analyzer on top of the CST that it produces. In the case of `foo: "foo \n"`, I take the content of the `double_quote_scalar` node and run an `unraw` operation on it.
I don't care if the `double_quote_scalar` contains `escape_sequence`. Can't you just ignore `escape_sequence` as I'm doing?
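A rough sketch of that approach in Python, for anyone not on JavaScript: take the raw text of the `double_quote_scalar` node, drop the quotes, and resolve the escapes yourself. The function `unescape_double_quoted` is a hand-rolled stand-in written for this example, not the `unraw` package, and the escape table is my reading of the YAML 1.2 double-quoted escapes, so treat it as an assumption to check against the spec.

```python
import re

# Single-character escapes allowed in YAML 1.2 double-quoted scalars
# (assumed from the spec; verify before relying on it).
SIMPLE_ESCAPES = {
    "0": "\0", "a": "\a", "b": "\b", "t": "\t", "\t": "\t", "n": "\n",
    "v": "\v", "f": "\f", "r": "\r", "e": "\x1b", " ": " ", '"': '"',
    "/": "/", "\\": "\\", "N": "\x85", "_": "\xa0",
    "L": "\u2028", "P": "\u2029",
}

# \xNN, \uNNNN, \UNNNNNNNN, or a backslash followed by any single character.
ESCAPE_RE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))",
    re.DOTALL,
)


def unescape_double_quoted(raw: str) -> str:
    """Turn the raw source text of a double-quoted scalar into its value.

    `raw` includes the surrounding quotes, exactly as it appears in the file.
    Line folding of multi-line scalars is not handled here.
    """
    if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        raise ValueError("expected a double-quoted scalar")

    def replace(match: "re.Match[str]") -> str:
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        char = match.group(4)
        if char not in SIMPLE_ESCAPES:
            raise ValueError(f"unknown escape sequence: \\{char}")
        return SIMPLE_ESCAPES[char]

    return ESCAPE_RE.sub(replace, raw[1:-1])


value_with_escape = unescape_double_quoted('"foo \\n"')  # raw text: "foo \n"
value_plain = unescape_double_quoted('"*"')              # raw text: "*"
```

Because the function only looks at the node's raw text, it gives the same kind of result whether or not the scalar has `escape_sequence` children.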
Oh, that's interesting... I'll look to see if I can implement something similar. Thank you for replying to my issue. I hope some maintainer will look into this someday as well.
Np, just try to think of it as if `double_quote_scalar` has no children and `escape_sequence` doesn't exist. I use this implementation of unraw in JavaScript: https://www.npmjs.com/package/unraw
There will be tools for other languages in their standard or vendor libraries I'm sure.
There are actually more things that need to be done to get the value out of a `double_quote_scalar`. Here is an implementation I did some time ago: https://github.com/swagger-api/apidom/blob/main/packages/apidom-ast/src/yaml/schemas/canonical-format.ts#L142
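One of those extra steps, as far as I understand the YAML spec, is line folding for double-quoted scalars that span several lines. The sketch below is my own simplified version and is not taken from the linked implementation: `fold_lines` is a hypothetical helper, it expects the text between the quotes before escapes are resolved, and it ignores escaped line breaks and escaped whitespace at line ends.

```python
def fold_lines(content: str) -> str:
    """Fold the line breaks of a multi-line double-quoted scalar (simplified).

    A single line break becomes one space; a line break followed by N blank
    lines becomes N newlines. Whitespace around the breaks is dropped.
    """
    lines = content.replace("\r\n", "\n").split("\n")
    if len(lines) == 1:
        return content

    last = len(lines) - 1
    trimmed = []
    for index, line in enumerate(lines):
        if index > 0:
            line = line.lstrip(" \t")   # indentation of continuation lines
        if index < last:
            line = line.rstrip(" \t")   # whitespace before a line break
        trimmed.append(line)

    result = trimmed[0]
    blank_lines = 0
    for index, line in enumerate(trimmed[1:], start=1):
        if line == "" and index < last:
            blank_lines += 1            # remember blank lines, emit them later
            continue
        result += "\n" * blank_lines if blank_lines else " "
        result += line
        blank_lines = 0
    return result


folded = fold_lines(" 1st non-empty\n\n 2nd non-empty \n\t3rd non-empty ")
```

Folding would run on the raw content first, with escape resolution applied afterwards, so that a written `\n` escape is never mistaken for a real line break.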
When trying to force a keyword or number into a string as in the following example:
The parsed result omits everything inside the double or single quotes, which means a loss of information compared to the initial file. Everything inside the quotes should be considered as a string and parsed as such, not omitted.
Or am I missing something?