whitequark / parser

A Ruby parser.

Offsets with \r\n in source #1020

Open: kddnewton opened this issue 4 months ago

When the source being parsed contains \r\n line endings, they are automatically converted into \n, as per https://github.com/whitequark/parser/blob/8bd5ec3416d58c28ab230b6e34a3bba1b58a8f06/lib/parser/source/buffer.rb#L190. The issue is that this can badly throw off source locations. For example:

```ruby
require "parser/current"

Parser::CurrentRuby.parse("1\r\n2\r\n3").children[2].loc
# => #<Parser::Source::Map::Operator:0x000000010b47c3d0 @expression=#<Parser::Source::Range (string) 4...5>, @node=s(:int, 3), @operator=nil>
```

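This reports the source range as 4...5, which is computed against the normalized string. In the original source, 4...5 is the second \r character; the 3 actually sits at offset 6.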
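
To see where the mismatch comes from, the sketch below reproduces the normalization with a plain String#gsub (an approximation of the linked buffer.rb line, not a call into the parser gem) and indexes both strings with the reported range:

```ruby
# Original source with CRLF line endings, as in the example above.
original = "1\r\n2\r\n3"

# Approximate the buffer's normalization: every "\r\n" becomes "\n".
normalized = original.gsub("\r\n", "\n")

# The reported range 4...5 is correct for the normalized string...
normalized[4...5]   # => "3"

# ...but points at a different character in the original source.
original[4...5]     # => "\r"
original.index("3") # => 6
```

Each \r\n pair that precedes a node shifts its true offset by one, so the error grows with every line ending before it.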

For prism's purposes it's fine if the locations differ; I'll just skip the location comparison for files that contain \r\n. My problem is that I use the same source buffer to parse with both parsers (https://github.com/ruby/prism/blob/90d570aa50bfff43c66e5f6c600370a61c091329/test/prism/ruby/parser_test.rb#L188-L208), but by then the buffer has already modified the source internally, and there is no way to retrieve the original.
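
Since the buffer does not expose the pre-normalization source, one possible workaround is for the caller to keep its own copy of the original string and translate ranges back. A minimal sketch, assuming String character indices (which coincide with byte offsets for ASCII input like the example); build_offset_map and to_original_range are hypothetical helpers, not part of the parser gem:

```ruby
# Hypothetical helper: for each offset in the CRLF-normalized string,
# record the corresponding offset in the original source.
def build_offset_map(original)
  map = []
  i = 0
  while i < original.length
    map << i
    # A "\r\n" pair is a single "\n" after normalization, so consume both.
    i += original[i, 2] == "\r\n" ? 2 : 1
  end
  # Extra entry so that an exclusive range ending at end-of-source maps too.
  map << original.length
  map
end

# Hypothetical helper: translate a normalized begin...end range.
def to_original_range(map, range)
  map[range.begin]...map[range.end]
end

original = "1\r\n2\r\n3"
map = build_offset_map(original)
fixed = to_original_range(map, 4...5)
fixed           # => 6...7
original[fixed] # => "3"
```

Mapping each \r\n pair to the offset of its \r keeps an exclusive range end from swallowing the trailing \r, while a range that covers the newline itself expands to the full \r\n.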

As a solution, I'm wondering whether one of these would work:

(a) The \r\n gsub could be removed, so that the parser rather than the buffer replaces \r\n where necessary; or
(b) The Buffer class could support an auto_clrf parameter (or something similar) that would disable that behavior.
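
Option (b) could look roughly like the sketch below. SketchBuffer is a hypothetical stand-in, not parser's Buffer class; auto_clrf is only the name floated in this issue, and original_source is an invented accessor showing how the unmodified input could stay retrievable:

```ruby
# Hypothetical illustration of option (b); not the parser gem's API.
class SketchBuffer
  attr_reader :original_source, :source

  def initialize(input, auto_clrf: true)
    # Keep an untouched copy so callers can always recover the input.
    @original_source = input.dup.freeze
    # Normalize CRLF only when requested, defaulting to today's behavior.
    @source = auto_clrf ? input.gsub("\r\n", "\n").freeze : @original_source
  end
end

crlf = "1\r\n2\r\n3"
SketchBuffer.new(crlf).source                   # => "1\n2\n3"
SketchBuffer.new(crlf, auto_clrf: false).source # => "1\r\n2\r\n3"
```

Defaulting the flag to true would keep existing callers unaffected, while a test harness comparing two parsers could opt out.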