Closed odelalleau closed 3 years ago
The output is
hello\" }
but the expected output ishello\"}
(the difference is no space between the"
and}
) The output in the comment is nothello\" }
buthello\\" }
. Which one is it?
Looking at this:
r""" ${identity: "hello\\" }"} """
I really have no idea what you are going for here. You are trying to go for a hello\ or are you trying to escape the double quote in the string so that it will appear in the resulting output? are you trying to do both?
On master, the output is ' hello\" } ':
In [57]: OmegaConf.clear_resolvers()
...: OmegaConf.register_new_resolver("identity", lambda x: x)
...: cfg = OmegaConf.create({
...: "a": r""" ${identity: "hello\\" }"} """
...: })
...: print(f"|{cfg.a}|")
| hello\" } |
The output is
hello\" }
but the expected output ishello\"}
(the difference is no space between the"
and}
) The output in the comment is nothello\" }
buthello\\" }
. Which one is it?Looking at this:
r""" ${identity: "hello\\" }"} """
I really have no idea what you are going for here. You are trying to go for a hello\ or are you trying to escape the double quote in the string so that it will appear in the resulting output? are you trying to do both?
So, the intention is for ${identity: "hello\\" }
to be an interpolation block where we pass to identity
the string containing the following characters: hello\
However, the current code instead considers that the interpolation block is ${identity: "hello\\" }"}
and provides to identity
the string containing the following characters: hello\" }
On master, the output is ' hello" } ':
No, your code shows there is a backslash too (which isn't the problem -- the problem is that there should be no space after the double quote)
So, the intention is for
${identity: "hello\\" }
to be an interpolation block where we pass toidentity
the string containing the following characters:hello\
Like the following failed attempts?
# not working:
OmegaConf.create({"a": r"${identity: hello\}"}).a
...
GrammarParseError: missing BRACE_CLOSE at '<EOF>'
full_key: a
object_type=dict
# not working:
OmegaConf.create({"a": r"${identity: hello\\}"}).a
'hello\\'
OmegaConf.create({"a": r"${identity: 'hello\'}"}).a
'hello\\'
OmegaConf.create({"a": r"${identity: 'hello\\'}"}).a
'hello\\'
Can you show me how you get Python to print 'hello\
' (hello with a tailing backslash).
Like the following failed attempts?
Your last 3 examples actually worked, but:
OmegaConf.create({"a": r"${identity: hello\\}"}).a
=> only works without a quoted string because hello
only has standard characters allowed in unquoted strings, replace it with h^llo
for instance and you can't do it without quotes anymore
OmegaConf.create({"a": r"${identity: 'hello\'}"}).a
=> works because the regex extracting the quoted string actually identifies 'hello\'
and doesn't go further, since there is no single quote afterwards. This is why this problem hasn't surfaced before, you need a really specific setup to highlight it (like the one in my example)
OmegaConf.create({"a": r"${identity: 'hello\\'}"}).a
=> same thing, the quoted string is 'hello\\'
and the double escape gets converted in a single one during the un-escape phase of quoted strings
Can you show me how you get Python to print '
hello\
' (hello with a tailing backslash).
Easy: print('hello\\')
What happened is you haven't been using print()
explicitly, and the Python interpreter prints the output of __repr__()
by default. And:
print(repr(r"There is a single backslash \ in this string"))
'There is a single backslash \\ in this string'
=> always use print() when there are \ in the string, otherwise it's quite confusing :)
This time I paid special attention to how GitHub is rendering this crap, hopefully it will be clearer than my past attempts:
Going back to the original example:
There exist tricky edge cases where quoted strings are not parsed as expected. Here is an example:
OmegaConf.clear_resolvers() OmegaConf.register_new_resolver("identity", lambda x: x) cfg = OmegaConf.create({ "a": r""" ${identity: "hello\\" }"} """ }) print(cfg.a) # ' hello\\" } '
The output is
hello\" }
but the expected output ishello\"}
(the difference is no space between the"
and}
)
It's not clear to me what you are after with
r""" ${identity: "hello\\" }"} """
This seems ambiguous to me:
Interpretation 1:
\\
followed by a "
=> \"
(backslash and then the string ends), which should create a syntax error because of the characters following the end of the string.
Interpretation 2:
\
followed by a \"
=> \
followed by "
not closing the quoted string.
We and up with a singleton \
after un-escaping which I think should trigger an error.
To get your desired output, this seems to work and makes sense to me:
In [45]: cfg = OmegaConf.create({"a": r'${identity:"hello\\\"} "}'}); print(f"'{cfg.a}'");cfg.a == r"hello\"} "
'hello\"} '
Out[45]: True
It's not clear to me what you are after with
r""" ${identity: "hello\\" }"} """
This seems ambiguous to me:
Interpretation 1:
\\
followed by a"
=>\"
(backslash and then the string ends), which should create a syntax error because of the characters following the end of the string.
That's (almost) the correct interpretation: since there are two \\
before the "
, the string should end.
There should be no syntax error though, because users are free to use any character they want outside of interpolations. The following examples should all be valid:
OmegaConf.create({
"a": "Hello ${world}!", # nothing special
"b": "Hello '${world}'!", # kind of a quoted interpolation, though nothing special is done with the quotes
"c": "Hello ${world}'!", # quote mismatch but we don't care
"d": "Hello ${world}}'!", # quote and brace mismatch but we still don't care
})
To get your desired output, this seems to work and makes sense to me: (...)
Just to be clear, the purpose of this issue isn't that we can't get a specific output, but to show a situation where the current code isn't outputting what it should.
It's not clear to me what you are after with
r""" ${identity: "hello\\" }"} """
This seems ambiguous to me: Interpretation 1:
\\
followed by a"
=>\"
(backslash and then the string ends), which should create a syntax error because of the characters following the end of the string.That's (almost) the correct interpretation: since there are two
\\
before the"
, the string should end. There should be no syntax error though, because users are free to use any character they want outside of interpolations.
But we are not outside of an interpolation. We are inside
${identity: ...}.
the ... should be a sequence of valid elements. we have ended an element (the string). the next thing should be a comma or closing the custom resolver. (ignoring whitespace).
r""" ${identity: "hello\\", foo, bar} """
the ... should be a sequence of valid elements. we have ended an element (the string). the next thing should be a comma or closing the custom resolver. (ignoring whitespace).
I agree, and that's actually the case in r""" ${identity: "hello\\" }"} """
: the next thing after the string ends is closing the custom resolver with the }
, and what comes after belongs to the top-level string and should just be copied "as is"
Okay, I FINALLY understand what you did there.
I thought that this was your string: "hello\\" }"
and this was the outer interpolation: ${identity: "hello\\" }"}
.
I will take another look at this later.
Okay, I FINALLY understand what you did there.
Sorry, I know it's a confusing example, but that's the whole point: it is confusing to the current parser.
I thought that this was your string:
"hello\\" }"
and this was the outer interpolation:${identity: "hello\\" }"}
.
So did the grammar :)
the ... should be a sequence of valid elements. we have ended an element (the string). the next thing should be a comma or closing the custom resolver. (ignoring whitespace).
I agree, and that's actually the case in
r""" ${identity: "hello\\" }"} """
: the next thing after the string ends is closing the custom resolver with the}
, and what comes after belongs to the top-level string and should just be copied "as is"
r""" ${identity: "hello\\" }"} """
So your intention is that the identity is getting "hello\"
, and outside of the custom resolver you have }"
(space, brace, double quote)?
Shouldn't the trailing double quote be an error then? It should be the beginning of a new quotes string that is never terminating.
At a high level, I agree that when we see "hello\\"
, we should interpret it as a double quoted hello\
.
r""" ${identity: "hello\\" }"} """
So your intention is that the identity is getting
"hello\"
, and outside of the custom resolver you have}"
(space, brace, double quote)?
Correct.
Shouldn't the trailing double quote be an error then? It should be the beginning of a new quotes string that is never terminating.
No because what's outside of interpolations is free-form, you can use quotes however you want. But you can add an extra quote if you prefer -- it shouldn't change the underlying issue.
At a high level, I agree that when we see
"hello\\"
, we should interpret it as a double quotedhello\
.
Ok good, we're on the same page then :)
Describe the bug
There exist tricky edge cases where quoted strings are not parsed as expected. Here is an example:
The output is
hello\" }
but the expected output ishello\"}
(the difference is no space between the"
and}
)The reason for this lies in the regex for quoted strings:
Because of the
\\"
in the parsed string, it believes that this quote is an escaped quote, and thus keeps parsing until the next quote is found. However, the intent here is for\\"
to represent a backslash followed by a quote, i.e., the string should end with this backslash.(if you wonder why the test case looks so weird, it is so that it can run error-free with and without a potential fix)
Expected behavior
See expected output.
Additional context
There is a link to #615: if we didn't care about quoted strings ending with a backslash, we wouldn't need to escape backslashes, which would solve both issues at the same time. But I think we do care :)