microsoft / ts-parsec

Writing a custom parser is a fairly common need. Although parser combinators already exist in other languages, TypeScript provides a powerful and well-structured foundation for building one. Error handling and ambiguity resolution are common weaknesses of parser combinators, but they are important features of ts-parsec. Additionally, ts-parsec provides a very easy-to-use programming interface that could help people build programming-language-scale parsers in just a few hours. This technology has already been used in Microsoft/react-native-tscodegen.
Other
353 stars 18 forks

Monadic bind operator #35

Closed: floyd-may closed this 1 year ago

floyd-may commented 2 years ago

In parsec and fparsec, you can use the >>= operator (monadic bind) to use the parsed result from one parser to produce a new parser. There doesn't appear to be equivalent functionality in typescript-parsec. I propose adding a new combinator, combine (since bind might be confusing to experienced JavaScript programmers):

function combine<TLeft, TRight>(
  pLeft: Parser<unknown, TLeft>,
  pRightApply: (val: TLeft) => Parser<unknown, TRight>
): Parser<unknown, TRight> { /* implementation */ }
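
To make the intended semantics of the proposed signature concrete, here is a minimal sketch over a toy parser type. It does not use ts-parsec's real Parser type or token stream; ToyParser, digit, take, and lengthPrefixed are hypothetical names introduced only for illustration, and the body of combine is one plausible reading of the proposal rather than the library's implementation.

```typescript
// Toy parser type for illustration only; NOT ts-parsec's actual Parser<TKind, TResult>.
// A parser reads `input` starting at `pos` and returns the parsed value plus the
// next position, or undefined on failure.
type ParseResult<T> = { value: T; next: number } | undefined;
type ToyParser<T> = (input: string, pos: number) => ParseResult<T>;

// Monadic bind: run pLeft, hand its value to pRightApply, then run the parser
// that pRightApply returns, starting where pLeft stopped.
function combine<TLeft, TRight>(
  pLeft: ToyParser<TLeft>,
  pRightApply: (val: TLeft) => ToyParser<TRight>
): ToyParser<TRight> {
  return (input, pos) => {
    const left = pLeft(input, pos);
    if (left === undefined) return undefined; // left side failed, so the whole parser fails
    return pRightApply(left.value)(input, left.next);
  };
}

// Hypothetical helper: a single decimal digit.
const digit: ToyParser<number> = (input, pos) => {
  const ch = input[pos];
  return ch !== undefined && ch >= "0" && ch <= "9"
    ? { value: Number(ch), next: pos + 1 }
    : undefined;
};

// Hypothetical helper: take exactly n characters.
const take = (n: number): ToyParser<string> => (input, pos) =>
  pos + n <= input.length
    ? { value: input.slice(pos, pos + n), next: pos + n }
    : undefined;

// Length-prefixed string: the parsed digit decides how many characters follow.
const lengthPrefixed = combine(digit, (n) => take(n));
```

In this sketch, lengthPrefixed("3abcd", 0) consumes the digit and the next three characters, while lengthPrefixed("5abc", 0) fails because fewer than five characters remain. The second parser cannot be written down ahead of time, since it depends on a value that only exists once the first parser has run.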

ZihanChen-MSFT commented 1 year ago

If val is from pLeft, then you already know what val would be. Why do you need to create pRight from val?

floyd-may commented 1 year ago

You can't know val until runtime when the parser is executing. Let's say we want to parse the following:

A leading count followed by exactly count instances of a word (parsed by pWord). Some examples of valid input would be:

  • 3 word thing stuff
  • 5 a short little word pile

and invalid input would be:

  • 3 too few
  • 2 too many words

In parsec or fparsec, you could parse this with pint >>= fun count -> parray count pWord (roughly, where parray is similar to rep_n). The intent here is to dynamically generate the parser at runtime via pRightApply based on the value of val at runtime.
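
The counted-words example can be sketched in TypeScript as follows. This is a self-contained toy over an array of whitespace-separated tokens, not ts-parsec's API: TokenParser, pInt, pWord, repN, and parseAll are hypothetical stand-ins (pInt for pint, repN for parray / rep_n), and combine follows the shape of the proposed signature under the assumption that it simply chains the two parsers.

```typescript
// Toy token-based parser for illustration only; NOT ts-parsec's actual types.
type Result<T> = { value: T; next: number } | undefined;
type TokenParser<T> = (tokens: string[], pos: number) => Result<T>;

// Hypothetical stand-in for pint: a token made only of decimal digits.
const pInt: TokenParser<number> = (tokens, pos) => {
  const tok = tokens[pos];
  return tok !== undefined && /^\d+$/.test(tok)
    ? { value: Number(tok), next: pos + 1 }
    : undefined;
};

// Hypothetical stand-in for pWord: a token made only of letters.
const pWord: TokenParser<string> = (tokens, pos) => {
  const tok = tokens[pos];
  return tok !== undefined && /^[A-Za-z]+$/.test(tok)
    ? { value: tok, next: pos + 1 }
    : undefined;
};

// Stand-in for parray / rep_n: apply p exactly n times, failing if any run fails.
function repN<T>(p: TokenParser<T>, n: number): TokenParser<T[]> {
  return (tokens, pos) => {
    const values: T[] = [];
    let cur = pos;
    for (let i = 0; i < n; i++) {
      const r = p(tokens, cur);
      if (r === undefined) return undefined;
      values.push(r.value);
      cur = r.next;
    }
    return { value: values, next: cur };
  };
}

// Assumed behavior of the proposed combinator: build the right parser from the
// left parser's runtime value, then continue from where the left parser stopped.
function combine<TLeft, TRight>(
  pLeft: TokenParser<TLeft>,
  pRightApply: (val: TLeft) => TokenParser<TRight>
): TokenParser<TRight> {
  return (tokens, pos) => {
    const left = pLeft(tokens, pos);
    return left === undefined ? undefined : pRightApply(left.value)(tokens, left.next);
  };
}

// Roughly: pint >>= fun count -> parray count pWord
const countedWords = combine(pInt, (count) => repN(pWord, count));

// Succeeds only when every token is consumed, so surplus words are rejected.
function parseAll(text: string): string[] | undefined {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const r = countedWords(tokens, 0);
  return r !== undefined && r.next === tokens.length ? r.value : undefined;
}
```

Under these assumptions, parseAll accepts "3 word thing stuff" and "5 a short little word pile", and rejects "3 too few" (repN runs out of tokens) and "2 too many words" (one token is left unconsumed).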

ZihanChen-MSFT commented 1 year ago

I see, so what you want is a context-sensitive syntax. It looks useful. I will use that as a unit test.