ocaml-community / sedlex

An OCaml lexer generator for Unicode
MIT License

Should be possible to pass a sedlex regexp into a function, etc #15

Closed mcclure closed 1 year ago

mcclure commented 9 years ago

Here is what I want to do at a high level: write a match%sedlex in which one of the patterns is "decided" by the surrounding code. For example, I would like to be able to write this:

let rec match_letters_until stop_pattern buf =
    match%sedlex buf with
        | alphabetic -> print_endline "got letter"; match_letters_until stop_pattern buf
        | stop_pattern -> ()  (* desired: the pattern supplied by the caller *)
        | _ -> failwith "Illegal!"

I would then like to call this as match_letters_until "))" buf, or as match_letters_until eof buf. That seems a reasonable thing to want from a regexp library, but it is not possible. A pattern saved with, e.g., let two_left_parens = [%sedlex.regexp? ')',')'] is not actually a variable in the local environment. According to the docs it is a "named regular expression" whose definition must "appear in place of a structure item", and such names live in their own namespace. As a result, a regular expression can't be passed into a function, returned from a function, or returned from a match...with. Every way of matching against a dynamically constructed regexp appears to be ruled out.
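For contrast, here is a minimal sketch of what sedlex does accept, assuming the sedlex ppx is enabled (for example with (preprocess (pps sedlex.ppx)) in a dune stanza). The names two_right_parens and count_letters are illustrative, not from the thread, and the function counts letters instead of printing so that its result can be checked. The named regexp is resolved when the ppx expands the match, not at run time, which is why it cannot be swapped for a function argument.

```ocaml
(* A named regexp: sedlex expects it as a structure item
   (top level or inside a module), not as a runtime value. *)
let two_right_parens = [%sedlex.regexp? ')', ')']

(* Counts letters until "))". The ppx substitutes two_right_parens into
   this match while expanding it; an identifier in pattern position is
   looked up among named regexps, not among OCaml variables. *)
let rec count_letters buf n =
  match%sedlex buf with
  | alphabetic -> count_letters buf (n + 1)
  | two_right_parens -> n
  | _ -> failwith "Illegal!"
```

Usage: count_letters (Sedlexing.Utf8.from_string "hello))") 0 returns 5. The stop pattern is fixed when the code is compiled; only the input varies at run time.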

If the pattern and variable namespaces must stay separate, one way around this would be a mechanism for moving values between the two namespaces. For example, this could be legal:

let two_left_parens = [%sedlex.frozen_regexp? ')',')']

This would store a representation of a regular expression in an ordinary variable. One could then write something like [%sedlex.regexp? from_frozen(two_left_parens) ] to use it inside a regexp, or perhaps [%sedlex.regexp? from_string( some_local_variable ) ] to match the literal contents of a local string variable.
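sedlex has no frozen_regexp, from_frozen, or from_string; those names are the proposal above. A similar effect can be approximated today by passing an ordinary function instead of a regexp, as in the sketch below. It assumes the sedlex ppx is enabled, and close_parens, end_of_input, and the letter-counting match_letters_until are hypothetical names chosen for illustration. Each stop matcher wraps its own match%sedlex, so every pattern is still known at compile time; only the choice of matcher happens at run time. On failure the matcher calls Sedlexing.rollback so the next automaton can re-read the same characters.

```ocaml
(* Hypothetical "frozen" stop patterns: plain functions that try to match
   at the current position. On failure they roll the buffer back to where
   the attempted lexeme started, so no input is consumed. *)
let close_parens buf =
  match%sedlex buf with
  | "))" -> true
  | _ -> Sedlexing.rollback buf; false

let end_of_input buf =
  match%sedlex buf with
  | eof -> true
  | _ -> Sedlexing.rollback buf; false

(* The stop condition is an ordinary OCaml value here, so it can be
   passed in, returned, or chosen by a match...with. Counts letters
   instead of printing so the result can be checked. *)
let match_letters_until stop buf =
  let rec loop n =
    if stop buf then n
    else
      match%sedlex buf with
      | alphabetic -> loop (n + 1)
      | _ -> failwith "Illegal!"
  in
  loop 0
```

The trade-off is that the stop pattern and the letter pattern are no longer compiled into one automaton, so longest-match arbitration between them is replaced by "try the stop matcher first".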

PS sedlex is super cool!

whitequark commented 9 years ago

@mcclure I don't think this is possible. Sedlex works by generating an NFA at compile time; it has to know every pattern in advance, or the transformation can't be done.

mcclure commented 9 years ago

Yikes.

whitequark commented 9 years ago

Sedlex is not actually a regexp library in the common sense, it's a lexer generator like flex.

You can work around it using several automata, but it's probably easier to restructure your code.
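The "several automata" workaround can be sketched as follows, assuming the set of possible stop patterns is known when the program is compiled and the sedlex ppx is enabled. The type stop and the functions until_close_parens and until_eof are hypothetical names. Each match%sedlex is compiled into its own automaton with its stop pattern fixed; the runtime decision is made by ordinary OCaml code that picks which lexer to call.

```ocaml
(* Hypothetical stop conditions; each constructor selects one lexer. *)
type stop = Close_parens | End_of_input

(* One automaton per stop pattern; both count the letters they see. *)
let rec until_close_parens buf n =
  match%sedlex buf with
  | alphabetic -> until_close_parens buf (n + 1)
  | "))" -> n
  | _ -> failwith "Illegal!"

let rec until_eof buf n =
  match%sedlex buf with
  | alphabetic -> until_eof buf (n + 1)
  | eof -> n
  | _ -> failwith "Illegal!"

(* The dynamic choice lives outside sedlex, in a plain match. *)
let match_letters_until stop buf =
  match stop with
  | Close_parens -> until_close_parens buf 0
  | End_of_input -> until_eof buf 0
```

Usage: match_letters_until Close_parens (Sedlexing.Utf8.from_string "abc))") returns 3. This keeps each lexer a single automaton at the cost of duplicating the shared branches, which is one reason restructuring the code may be simpler.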

hhugo commented 1 year ago

I think this should either be closed or labelled as wont-fix.

toots commented 1 year ago

Let's do both, thanks!