=Parsexp= contains functionality for parsing s-expressions, which are defined in the [[https://github.com/janestreet/base][Base]] library as follows:
module Sexp : sig type t = Atom of string | List of t list end
=Parsexp= is platform independent and as such doesn't provide IO operations. The companion library [[https://github.com/janestreet/parsexp_io][Parsexp_io]] provides facilities for loading s-expressions from files.
** Syntax of s-expression
*** Lexical conventions of s-expression
Whitespace, which consists of space, newline, horizontal tab, and form feed, is ignored unless within an OCaml-string, where it is treated according to OCaml-conventions. The left parenthesis opens a new list, the right one closes it. Lists can be empty.
The double quote denotes the beginning and end of a string using similar lexing conventions to the ones of OCaml (see the [[http://caml.inria.fr/pub/docs/manual-ocaml/][OCaml-manual]] for details). Differences are:
All characters other than double quotes, left- and right parentheses, whitespace, carriage return, and comment-introducing characters or sequences (see next paragraph) are considered part of a contiguous string.
*** Comments
There are three kinds of comments:
*** Grammar of s-expressions
s-expressions are either strings (= atoms) or lists. The lists can recursively contain further s-expressions or be empty, and must be balanced, /i.e./ parentheses must match.
*** Examples
this_is_an_atom_123'&^%! ; this is a comment "another atom in an OCaml-string \"string in a string\" \123"
; empty list follows below ()
; a more complex example ( ( list in a list ; comment within a list (list in a list in a list) 42 is the answer to all questions
(has been commented out)
)
#| Block comments #| can be "nested" |# |#
)
)
** API
=Parsexp= offers a few different parsers. Each of them comes in 3 variants:
Parsers all implement the same API. For each parsers, =Parsexp= offers a low-level API, where one feeds characters one-by-one to the parser, as well as a convenience functions to parse a whole string.
*** Low-level API
With the low-level API, ones must create a parser state and feed it characters one by one. This allows =Parsexp= to be used with any kind of input as well as to be =Lwt= or =Async= friendly.
Essentially, if you want to parse from a different input source, you have to write the fetch-character-feed-to-parser loop yourself.
*** Parsers
=Parsexp= offers the following parsers:
In addition, it provides convenience functions for parsing strings and converting the result with a =Sexp.t -> 'a= function, reporting errors with accurate locations.
Each parsing/conversion functions comes in two variants: one raising exception in case of error and one returning a =Result.t= value. In general you should use the latter since parsing is an operation that can always fail.
*** Positions sets and error reporting
To deal with error locations when converting S-expressions to OCaml values, =Parsexp= introduces the notion of compact positions sets.
A positions set represents positions in the input source of all s-expressions. It has a small memory footprint and is relatively cheap to construct. Using a positions set and the corresponding s-expression, one can reconstruct the location of any sub s-expressions in the input source.
Depending on the input source, one can either:
The first method has the advantage that it is faster where there are no errors, however it is not suitable for sources that can't guarantee repeatable reads such as files.