Open DavisVaughan opened 1 day ago
The LSP normalises line endings here: https://github.com/posit-dev/air/blob/003c31bd2f0ec5379065a019c500ebe38fd914a3/crates/lsp/src/documents.rs#L55
I find this kind of interesting because:
For LSP workflows:
\n
endings on the way in (both normal line endings and these weird ones in multiline string ones)\r\n
endingsFor cli workflows:
\r\n
in multiline strings with \n
, but otherwise leave the \r\n
as isSo in cli workflows you can end up with what we want, which is \r\n
line endings everywhere except in the multiline strings, which have \n
line endings. But in LSP workflows you can't ever end up with the "right" thing on the way out, the last step of replacing all \n
with \r\n
means that on the way out even multiline strings get \r\n
endings again.
There's a decent chance this is why biome retains the original line endings in the LSP side of things
See also https://github.com/posit-dev/air/pull/69#issuecomment-2502473548
We have this CI failures on Windows https://github.com/posit-dev/air/actions/runs/12042018483/job/33574939950?pr=69
With this test
Which formats a file containing this string
Note that this is a multiline string. The raw string there isn't critical to the reprex, it's just what we happened to have there.
This fails because:
\r\n
inside the string contents.located_token_text()
for a string because of this no-\r assertionBoth biome and ruff use this assertion, so I imagine it is for a good reason.
The way both of them handle this is by using variants of
normalize_string()
here: https://github.com/biomejs/biome/blob/cd1c8ec4249e8df8d221393586d664537c9fddb2/crates/biome_formatter/src/token/string.rs#L48Note that the docs say one of the things it does is replace
\r\n
with just\n
.They call this at
fmt()
time for a string node, before it actually drops through tolocated_token_text()
, so they get a chance to "fix up" the string here. We could probably create our own version of this and strip out everything except the\r\n
conversion, until we decide to do more normalization.Why do they even do this?
I imagine the idea is that typically the line ending doesn't really matter, it becomes part of the Trivia set, but is not part of an actual node. But here, the line ending gets parsed into the actual string contents. I assume they want to ensure that the actual nodes of the tree look the same across OSes regardless of line endings, and standardizing on
\n
inside multiline strings is the way to achieve that.