zeusdeux / re2

Automatically exported from code.google.com/p/re2
BSD 3-Clause "New" or "Revised" License

CR LF Pairs Cannot be Considered as Line Endings #114

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
The current implementation of re2 recognizes only one kind of line ending, the 
Unix "\n" ending. Both Windows and RFC-822 messages use the CR-LF pair ("\r\n") 
to end lines. The pattern "(?m:end$)" will therefore match the string "in the 
end\nThe love you make\n", but it will not match "in the end\r\nThe love you 
make\r\n", simply because re2 does not recognize "\r\n" as a line ending.

Text using CR-LF line endings is common enough that re2 should at least support 
the option of recognizing them, perhaps via an additional flag.
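
A minimal sketch of the behavior described above. It uses Go's standard `regexp` package as a stand-in, on the assumption that it treats `$` in multi-line mode the same way RE2 does (both accept RE2 syntax); the helper name `matchesLineEnd` is made up for illustration and is not part of either library.

```go
package main

import (
	"fmt"
	"regexp"
)

// endAtLineEnd is the pattern from the report: "end" followed by an
// end-of-line assertion in multi-line mode.
var endAtLineEnd = regexp.MustCompile(`(?m:end$)`)

// matchesLineEnd (hypothetical helper) reports whether the pattern
// matches anywhere in text.
func matchesLineEnd(text string) bool {
	return endAtLineEnd.MatchString(text)
}

func main() {
	// LF line endings: "$" matches just before "\n".
	fmt.Println(matchesLineEnd("in the end\nThe love you make\n")) // true

	// CR-LF line endings: "end" is followed by "\r", so "$" does not match.
	fmt.Println(matchesLineEnd("in the end\r\nThe love you make\r\n")) // false
}
```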

Original issue reported on code.google.com by Lhot...@gmail.com on 5 Jun 2014 at 3:56

GoogleCodeExporter commented 9 years ago
This is a known difference compared to Perl. RE2 fundamentally works a byte at 
a time and has only 1 byte of lookahead, so it cannot look ahead for the 
two-byte sequence \r\n. This is the same reason RE2 has \z but not \Z. I am 
sorry. Convert the \r\n to \n in your text before matching.
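
One way the suggested workaround might look, sketched in Go with the standard `regexp` and `strings` packages (Go's `regexp` accepts RE2 syntax, so it serves as a stand-in here). The helper names `normalizeNewlines` and `matchesAfterNormalizing` are hypothetical. Note that byte offsets reported by a match then refer to the normalized copy, not the original text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// endAtLineEnd is the pattern from the original report.
var endAtLineEnd = regexp.MustCompile(`(?m:end$)`)

// normalizeNewlines (hypothetical helper) rewrites every CR-LF pair
// as a single LF, leaving lone "\r" or "\n" bytes untouched.
func normalizeNewlines(text string) string {
	return strings.ReplaceAll(text, "\r\n", "\n")
}

// matchesAfterNormalizing (hypothetical helper) applies the pattern
// to the normalized copy of text.
func matchesAfterNormalizing(text string) bool {
	return endAtLineEnd.MatchString(normalizeNewlines(text))
}

func main() {
	crlf := "in the end\r\nThe love you make\r\n"

	// Without normalization the multi-line "$" does not match before "\r".
	fmt.Println(endAtLineEnd.MatchString(crlf)) // false

	// After converting "\r\n" to "\n" the same pattern matches.
	fmt.Println(matchesAfterNormalizing(crlf)) // true
}
```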

Original comment by rsc@golang.org on 5 Jun 2014 at 4:43