parsica-php / parsica

Parsica - PHP Parser Combinators - The easiest way to build robust parsers.
https://parsica-php.github.io/
MIT License
405 stars 18 forks source link

Where to look for potential JSON parser improvements #17

Open turanct opened 3 years ago

turanct commented 3 years ago

1. whitespace parser

zeroOrMore(satisfy(isCharCode([0x20, 0x0A, 0x0D, 0x09])))

it's potentially faster to do takeWhile on the stream, with the same predicate, skipping the zeroOrMore combinator and the satisfy.

https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L154-L158

2. between

we rely heavily on between, which is based on sequence and bind. If we would find the tiniest speed improvement in those, we would make the parser a lot faster.

https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L96-L103

3. sepby

we rely on this function often in the JSON parser too. It is built in terms of sepBy1 which is written in a "readable" way, but not a really efficient way:

function sepBy1(Parser $separator, Parser $parser): Parser
{
    $prepend = fn($x) => fn(array $xs): array => array_merge([$x], $xs);
    $label = $parser->getLabel() . ", separated by " . $separator->getLabel();
    return pure($prepend)->apply($parser)->apply(many($separator->sequence($parser)))->label($label);
}

https://github.com/mathiasverraes/parsica/blob/main/src/JSON/JSON.php#L99-L102 https://github.com/mathiasverraes/parsica/blob/main/src/combinators.php#L513-L519

ToonGagor commented 3 years ago

Another improvement would be the string parser:

Screenshot 2021-01-02 at 16 19 57 Screenshot 2021-01-02 at 16 21 10 Screenshot 2021-01-02 at 16 20 53

It's faster to parse a bunch of nested objects, than it is to parse a list of simple strings.

https://github.com/mathiasverraes/parsica/blob/ddefc569877a1d9527b94ce5d833d2af0aa26d6f/src/JSON/JSON.php#L180-L202

We use zeroOrMore in combination with choice, which is a slow process of trying parsers and moving on to the next one. Maybe we can make this faster with the takeWhile stream method or something like it.