rotty / lexpr-rs

Rust Lisp expression parser and serializer
Apache License 2.0
164 stars 24 forks source link

Parse error when whitespace is omitted #70

Closed elidupree closed 3 years ago

elidupree commented 3 years ago

lexpr::from_str("(Feedback())") returns Err(Error("trailing characters", line: 1, column: 12)).

It works fine if you say lexpr::from_str("(Feedback ())").

This is using lexpr 0.2.6.

I ran into this issue while trying to parse output from SerAPI.

elidupree commented 3 years ago

I'm looking in the code now. It seems the problem is that Read<'de>::parse_symbol implementations consider the first ( to be "part of the symbol", and only stop at the ). This is explicitly documented in the trait:

    /// Parses an unescaped string until the next whitespace or list close..
    #[doc(hidden)]
    fn parse_symbol<'s>(&'s mut self, scratch: &'s mut Vec<u8>) -> Result<Reference<'de, 's, str>>;

And implemented in the two parse_symbol_bytes methods:

            match self.peek()? {
                Some(b' ') | Some(b'\n') | Some(b'\t') | Some(b'\r') | Some(b')') | Some(b']')
                | None => {

I'm not too familiar with S-expressions, but Wikipedia suggests:

an unquoted identifier atom can typically contain anything but quotes, whitespace characters, parentheses, brackets, braces, backslashes, and semicolons.

The logical, minimal change to fix this would be to add | Some(b'(') | Some(b'[') in the pattern above (in both places). I wonder if they should also exclude the other characters mentioned ("{};)? Or worse, be configurable?

rotty commented 3 years ago

Thanks for the report! I will go for a rather minimal fix for now, adding b'(', b'[' b';' and b'"', as these are the characters handled by the current codebase. That still leaves some other characters (like {}, i.e., braces) that should get special treatment as well -- I need to have a look at different Lisp dialects (well, Scheme and ELisp foremost) on how to best handle these.

rotty commented 3 years ago

I messed up the commit message, mentioning #40 instead of this issue; this should be fixed in master now.