scymtym / esrap

Common Lisp packrat parser
https://scymtym.github.io/esrap/
78 stars, 12 forks

Add #\Return to whitespace characters #19

Status: Open. blake-watkins opened this issue 7 months ago.

blake-watkins commented 7 months ago

I was looking at aws-sdk, which uses parser.ini to parse its configuration and credential INI files; parser.ini in turn uses this library as its parser. Parsing failed because Windows ends lines with #\Return #\Newline (CRLF) rather than just #\Newline.

Could you add #\Return to the whitespace definition in parser.common-rules/src/rules-whitespace.lisp? Alternatively, the whitespace definition in parser.ini could be changed.
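In esrap terms, the requested change amounts to adding #\Return as one more alternative in a whitespace rule. The sketch below uses esrap's defrule with a hypothetical rule name, crlf-whitespace; it is only an illustration under that assumption, not the actual definition in parser.common-rules, whose exact form is not shown in this thread.

```lisp
;; Assumes Quicklisp is installed and can fetch esrap.
(ql:quickload "esrap")

;; CRLF-WHITESPACE is a hypothetical rule for illustration only; it is
;; not the rule defined in parser.common-rules/src/rules-whitespace.lisp.
(esrap:defrule crlf-whitespace
    ;; One or more whitespace characters. Listing #\Return next to
    ;; #\Newline lets a Windows CR LF line ending match as whitespace.
    (+ (or #\Space #\Tab #\Newline #\Return))
  ;; Whitespace carries no value, so the production is always NIL.
  (:constant nil))
```

With this rule, (esrap:parse 'crlf-whitespace (format nil " ~c~c" #\Return #\Newline)) succeeds; without #\Return in the list, the same call would signal a parse error at the carriage return.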

As a test case, the following call currently fails, but adding #\Return to the list gives the correct result:

CL-USER> (parser.ini:parse (format nil "[default]~c~ca = b" #\Return #\Newline) 'list)

((:SECTION
  (:SECTION-OPTION (((:OPTION NIL :NAME ("a") :VALUE "b" :BOUNDS (11 . 16)))))
  :NAME ("default") :BOUNDS (0 . 9)))
scymtym commented 7 months ago

Thanks for the suggestion.

I remember having some thoughts about a proper solution that relies on the external format mechanism of the Lisp implementation. The idea is that the byte sequences 0d 0a (Windows) and 0a (UNIX) are both turned into the Lisp character #\Newline when they pass through the external format associated with the input stream. If I remember correctly, CCL has worked like that for a long time, and SBCL now also supports different newline encodings in the external format (since release 2.3.11: "enhancement: external formats for unibyte encodings and utf-8 now support newline variants.").
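Where a newline-aware external format is not available, the same CR LF to #\Newline translation can be done by hand on a string before parsing. The following is a minimal, portable sketch of that workaround; it is a swapped-in technique, not the external format mechanism described above, and normalize-crlf is a hypothetical helper name.

```lisp
(defun normalize-crlf (string)
  "Return a fresh string in which each CR LF pair of STRING is replaced
by a single LF. A CR that is not followed by LF is kept unchanged."
  (with-output-to-string (out)
    (loop for i from 0
          for char across string
          ;; Drop a #\Return only when the next character is #\Newline,
          ;; so CR LF collapses to the single Lisp character #\Newline.
          unless (and (char= char #\Return)
                      (< (1+ i) (length string))
                      (char= (char string (1+ i)) #\Newline))
            do (write-char char out))))
```

A caller could then use, for example, (parser.ini:parse (normalize-crlf (alexandria:read-file-into-string #p"config")) 'list). Note that any :BOUNDS in the result then refer to positions in the normalized string, not in the original file.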

Is the actual use case about parsing from a string, or is that just an artificial example to demonstrate the problem?

blake-watkins commented 7 months ago

Thanks for your response. The example is just an artificial one to show the issue; what I was actually trying to do was use aws-sdk to make some AWS calls. aws-sdk reads its configuration from config files. I had copied an example file, and since I was on Windows, its lines ended in CRLF. aws-sdk calls parser.ini:parse with the name of the config file, and that parse was failing.

It's not a problem for me since I just switched the line endings, but I thought it would be good to fix for others (there is an open issue in aws-sdk that I think is probably the same one I ran into).

Thanks for the info on the external format. Unfortunately, the latest SBCL binary for Windows is 2.3.2, and I've been having some permission issues building from source this morning, so I can't test whether updating SBCL would also fix it. I'll try again when I have time.