hadronized / glsl

GLSL parser for Rust
191 stars 27 forks source link

Support DOS CRLF as line ending in parser #138

Closed alixinne closed 8 months ago

alixinne commented 3 years ago

I realized some parsers expect \n as the single char for a line ending, which isn't the case on Windows with its notorious \r\n. This PR replaces occurrences of \n in parsers with nom's line_ending which supports both. This also adds a test to prevent regressions.

Example of failure on Windows (on one of my personal projects): https://travis-ci.com/github/vtavernier/tinygl/jobs/393882051#L396

---- test_glsl_from_string stdout ----

Error: GlslParseError(ParseError { info: "0: at line 1:\n#version 460 core\n                 ^\nexpected \'\n\', found \r\n\n1: at line 1, in Alt:\n#version 460 core\n                 ^\n\n" })
hadronized commented 3 years ago

Hello, thank you a lot! I’ll try reviewing that tomorrow and merge ASAP!

AlexTMjugador commented 3 years ago

Any news on the merge status for this PR? I also stumbled on this issue on my project and look forward to see it merged.

hadronized commented 3 years ago

Parts of this got fixed in a recent PR (#145). Can you check whether something is still missing? Sorry for the delay.

alixinne commented 3 years ago

145 is lacking the following fixes in the helpers:

diff --git a/glsl/src/parsers.rs b/glsl/src/parsers.rs
index 1983f39..064ae11 100644
--- a/glsl/src/parsers.rs
+++ b/glsl/src/parsers.rs
@@ -9,7 +9,7 @@ mod nom_helpers;

 use nom::branch::alt;
 use nom::bytes::complete::{tag, take_until, take_while1};
-use nom::character::complete::{anychar, char, digit1, space0, space1};
+use nom::character::complete::{anychar, char, digit1, line_ending, space0, space1};
 use nom::character::{is_hex_digit, is_oct_digit};
 use nom::combinator::{cut, map, not, opt, peek, recognize, value, verify};
 use nom::error::{ErrorKind, ParseError as _, VerboseError, VerboseErrorKind};
@@ -1631,7 +1631,7 @@ pub(crate) fn pp_version_profile(i: &str) -> ParserResult<syntax::PreprocessorVe
 ///
 /// This parser is needed to authorize breaking a line with the multiline annotation (\).
 pub(crate) fn pp_space0(i: &str) -> ParserResult<&str> {
-  recognize(many0_(alt((space1, tag("\\\n")))))(i)
+  recognize(many0_(alt((space1, preceded(tag("\\"), line_ending)))))(i)
 }

 /// Parse a preprocessor define.
diff --git a/glsl/src/parsers/nom_helpers.rs b/glsl/src/parsers/nom_helpers.rs
index 44084f7..7bf02c1 100644
--- a/glsl/src/parsers/nom_helpers.rs
+++ b/glsl/src/parsers/nom_helpers.rs
@@ -6,6 +6,7 @@ use nom::character::complete::{anychar, line_ending, multispace1};
 use nom::combinator::{map, recognize, value};
 use nom::error::{ErrorKind, VerboseError, VerboseErrorKind};
 use nom::multi::fold_many0;
+use nom::sequence::preceded;
 use nom::{Err as NomErr, IResult};

 pub type ParserResult<'a, O> = IResult<&'a str, O, VerboseError<&'a str>>;
@@ -74,7 +75,13 @@ where
 /// Discard any leading newline.
 pub fn str_till_eol(i: &str) -> ParserResult<&str> {
   map(
-    recognize(till(alt((value((), tag("\\\n")), value((), anychar))), eol)),
+    recognize(till(
+      alt((
+        value((), preceded(tag("\\"), line_ending)),
+        value((), anychar),
+      )),
+      eol,
+    )),
     |i| {
       if i.as_bytes().last() == Some(&b'\n') {
         &i[0..i.len() - 1]
@@ -91,5 +98,5 @@ pub fn str_till_eol(i: &str) -> ParserResult<&str> {
 //
 // Taylor Swift loves it.
 pub fn blank_space(i: &str) -> ParserResult<&str> {
-  recognize(many0_(alt((multispace1, tag("\\\n")))))(i)
+  recognize(many0_(alt((multispace1, preceded(tag("\\"), line_ending)))))(i)
 }

Which means that other PR is not enough to fix line continuations on Windows. I have rebased and updated my PR to include a test for this case.

AlexTMjugador commented 3 years ago

FWIW, I've been using @vtavernier's fork with these changes for a while and they work like a charm. I also like the fact that they have some unit tests.

AlexTMjugador commented 3 years ago

After reviewing PR #149 I realized the changes this PR proposes in str_till_eol seem to not strip the carriage return for CRLF line endings, so it could return a string with an unwanted \r character in it. I pushed a commit to my fork, https://github.com/ComunidadAylas/glsl/commit/9381db48d36139d36f2d1e8a07ebbab719715c64, that completely strips line endings in both cases.

alixinne commented 3 years ago

According to the GLSL 4.60 spec, 3.1:

Lines are relevant for compiler diagnostic messages and the preprocessor. They are terminated by carriage-return or line-feed. If both are used together, it will count as only a single line termination. For the remainder of this document, any of these combinations is simply referred to as a new-line. Lines may be of arbitrary length.

This means we should define a glsl_line_ending that matches either of those 4 strings:

And then we could define the other parsers (str_till_eol and pp_space0) in terms of this, stripping what glsl_line_ending matches?

AlexTMjugador commented 3 years ago

That sounds perfect to me :+1: