projectfluent / fluent-rs

Rust implementation of Project Fluent
https://projectfluent.org
Apache License 2.0

Improper handling of Windows line feeds in FTL files #220

Closed: Ygg01 closed 1 year ago

Ygg01 commented 3 years ago

Hi, I'm working on porting Fluent.rs to .NET and I stumbled upon a possible bug in the parser:

If you try to parse the string "# msg\r\n# yes\r\n#\r\nfoo=Foo", you'll get errors. The same example on the Fluent playground gives you:

{
    "type": "Resource",
    "body": [
        {
            "type": "Message",
            "id": {
                "type": "Identifier",
                "name": "foo",
                "span": {
                    "type": "Span",
                    "start": 14,
                    "end": 17
                }
            },
            "value": {
                "type": "Pattern",
                "elements": [
                    {
                        "type": "TextElement",
                        "value": "Foo",
                        "span": {
                            "type": "Span",
                            "start": 20,
                            "end": 23
                        }
                    }
                ],
                "span": {
                    "type": "Span",
                    "start": 20,
                    "end": 23
                }
            },
            "attributes": [],
            "comment": {
                "type": "Comment",
                "content": "msg\nyes\n",
                "span": {
                    "type": "Span",
                    "start": 0,
                    "end": 13
                }
            },
            "span": {
                "type": "Span",
                "start": 0,
                "end": 23
            }
        }
    ],
    "span": {
        "type": "Span",
        "start": 0,
        "end": 24
    }
}

That output makes the most sense to me. I managed to track the offending line down to this branch: https://github.com/projectfluent/fluent-rs/blob/4d13b24140409b377362442cf748a7f789c3f822/fluent-syntax/src/parser/comment.rs#L34

If the line were:

} else if self.is_current_byte(b'\n')
  || (self.is_current_byte(b'\r') && self.is_byte_at(b'\n', self.ptr + 1)) {

I believe the error wouldn't happen, but I haven't tested it.
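
To illustrate the idea outside the parser, here is a minimal standalone sketch, not the fluent-rs code. The helpers `eol_len` and `comment_content` are hypothetical names made up for this example. It treats both "\n" and "\r\n" as a line ending when reading consecutive comment lines, and normalizes the collected content to "\n", which is what the playground output above shows.

```rust
/// Returns the length of the line ending that starts at `ptr`
/// (1 for "\n", 2 for "\r\n"), or `None` if no line ending starts there.
/// Hypothetical helper, not part of fluent-syntax.
fn eol_len(source: &[u8], ptr: usize) -> Option<usize> {
    match source.get(ptr).copied() {
        Some(b'\n') => Some(1),
        // A carriage return only counts when it is followed by a line feed.
        Some(b'\r') if source.get(ptr + 1).copied() == Some(b'\n') => Some(2),
        _ => None,
    }
}

/// Collects the content of consecutive `#` comment lines at the start of
/// `source`, joining them with "\n" regardless of the original line endings.
/// Hypothetical helper, simplified to single-`#` comments only.
fn comment_content(source: &str) -> String {
    let bytes = source.as_bytes();
    let mut ptr = 0;
    let mut lines: Vec<&str> = Vec::new();

    while bytes.get(ptr).copied() == Some(b'#') {
        ptr += 1; // skip '#'
        if bytes.get(ptr).copied() == Some(b' ') {
            ptr += 1; // skip the single space after '#'
        }
        let start = ptr;
        // Advance to the next line ending (LF or CRLF) or the end of input.
        while ptr < bytes.len() && eol_len(bytes, ptr).is_none() {
            ptr += 1;
        }
        lines.push(&source[start..ptr]);
        // Step over the line ending, if there is one.
        ptr += eol_len(bytes, ptr).unwrap_or(0);
    }

    lines.join("\n")
}
```

Under these assumptions, `comment_content("# msg\r\n# yes\r\n#\r\nfoo=Foo")` and the LF-only variant of the same input both yield "msg\nyes\n".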

zbraniecki commented 3 years ago

Happy to take a PR!