julia-vscode / DocumentFormat.jl

Auto-formatter for Julia
Other
62 stars 18 forks source link

`format()` is un-escaping `$` inside a docstring #32

Closed NHDaly closed 2 years ago

NHDaly commented 5 years ago

When formatting this docstring, it seems to remove one layer of escaping:


julia> s = """
       \"""
       Interpolate using `\\\$`
       \"""
       foo()
       """
>> "\"\"\"\nInterpolate using `\\\$`\n\"\"\"\nfoo()\n"

julia> sf = DocumentFormat.format(s)
>> "\"\"\"\nInterpolate using `\$`\n\"\"\"\nfoo()\n"

julia> println(s)
"""
Interpolate using `\$`
"""
foo()

julia> println(sf)
"""
Interpolate using `$`
"""
foo()

It seems to work fine for regular strings though:

julia> s = """
       \"""
       Interpolate using `\\\$`
       \"""
       """
>> "\"\"\"\nInterpolate using `\\\$`\n\"\"\"\n"

julia> sf = DocumentFormat.format(s)
>> "\"\"\"\nInterpolate using `\\\$`\n\"\"\"\n"

So there's something unique to the docstring handling that causes this bug.

Thanks!! :)


EDIT: Improved the examples to be a better MWE of the bug -- fix all the other formatting problems that format() would fix so the only change is this bug.

NHDaly commented 5 years ago

Here is, i think, a better MWE:

julia> println(DocumentFormat.format("\" \\\\ \" x"))
"""
 \
"""
x

It's just that it seems to evaluate the strings, thus eliminating a layer of escaping.

NHDaly commented 5 years ago

The trouble actually seems to come from CSTParser.parse:

julia> x = DocumentFormat.CSTParser.parse(""" "a \\\\ b" x""", true)
>> CSTParser.FileH  11 (11)
 LITERAL:   1 (1)
 CSTParser.MacroCall  10 (10)
  CSTParser.GlobalRefDoc  0 (0)
  LITERAL: a \ b  9 (8)
  ID: x  1 (1)
NHDaly commented 5 years ago

Okay narrowed it down further:

this:

julia> CSTParser.parse_string_or_cmd(CSTParser.next(CSTParser.ParseState("\"a\\\\b\"")))
>> LITERAL: a\b  6 (6)

should be returning LITERAL: a\\b instead

NHDaly commented 5 years ago

Okay! So the problem comes here, src/components/strings.jl#L96, when the buffer is passed through tostr, which calls _unescape_string and ultimately drops the leading '\\'.

Is there a reason why we unescape the string? Since this is ultimately going to be written back into a user's code, I think we would want to keep any escaping in the original string in tact!

Thanks! :)


EDIT: Fixed link in strings.jl to src/components/strings.jl#L96.

NHDaly commented 5 years ago

At this point I think that's all the digging I can do on my own. @ZacLN, do you know about the motivation for the above behavior?

Thanks! :)

NHDaly commented 5 years ago

Oops, sorry, I had the wrong link in my comment (https://github.com/ZacLN/DocumentFormat.jl/issues/32#issuecomment-454822374). That should've been src/components/strings.jl#L96. Edited.

NHDaly commented 5 years ago

Oh also, this might be related to this broken test, so maybe you already knew about it?: https://github.com/ZacLN/CSTParser.jl/blob/24c51db7e643751a707efced476129b3e03b0df7/test/parser.jl#L708

That seems to have been added here: https://github.com/ZacLN/CSTParser.jl/commit/08a094d3170ea45e52a4639781101f113c787f73

domluna commented 5 years ago

@NHDaly the original example you gave works on https://github.com/domluna/JLFmt.

The second example doesn't due to triple strings being assigned treated as single quoted strings in CSTParser. I'll have to tweak that.

I got a couple of bugs to sort out (that I know of) but aside from that I'm looking to make an alpha release this week.

julia> println(DocumentFormat.format("\" \\\\ \" x"))

also works. Adding these to the tests :+1:

ZacLN commented 4 years ago

formatting of docstrings is currently disabled, moving from Current