nucleic / enaml

Declarative User Interfaces for Python
http://enaml.readthedocs.io/en/latest/
Other
1.53k stars 130 forks source link

Pegen based new parser #474

Closed MatthieuDartiailh closed 1 year ago

MatthieuDartiailh commented 2 years ago

This is a heavy WIP aiming at replacing the ply based parser by one based on pegen (i.e. the PEG parser generator used by CPython itself).

By doing this I hope to:

Pegen has a 0.1 release to which I participated but will likely need some extra work around error reporting (the Python generated parser does not yet mimic the C one used by CPython).

Also note that this will imply to drop Python 3.7

sccolbert commented 2 years ago

Hows the performance compared to the ply parser? (Im just curious).

On Sat, Jan 22, 2022 at 15:19 Matthieu Dartiailh @.***> wrote:

This is a heavy WIP aiming at replacing the ply based parser by one based on pegen (i.e. the PEG parser generator used by CPython itself).

By doing this I hope to:

  • be able to support seamlessly and will little effort new Python syntax (i.e. match)
  • have a clearer definition of the enaml syntax

Pegen has a 1.0 release to which I participated but will likely need some extra work around error reporting (the Python generated parser does not yet mimic the C one used by CPython).

You can view, comment on, or merge this pull request online at:

https://github.com/nucleic/enaml/pull/474 Commit Summary

File Changes

(23 files https://github.com/nucleic/enaml/pull/474/files)

Patch Links:

— Reply to this email directly, view it on GitHub https://github.com/nucleic/enaml/pull/474, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABBQSL6EI6L6ED5EPNU2J3UXMNNRANCNFSM5MSOKIVA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

MatthieuDartiailh commented 2 years ago

I did not measure it yet but I plan to do so once I get the tests to pass.

frmdstryr commented 1 year ago

Is anything blocking this? I've been working on getting KDevelop to support enaml and this parser sets the col_offsets.

Edit: After rebasing it fails on the zipimporter tests and two syntax error tests expecting a specific line number.

frmdstryr commented 1 year ago

I ran a quick benchmark using ipython's timeit on one of my projects. It was on 9 files loaded in in memory about 4k loc.

The existing parser is about 4-5 times faster. Both on python 3.10

from enaml.core.parser import parse
# read files into dict of filename, source
# then benchmark with
%timeit [parse(s, f) for f, s in files.items()]

Existing parser (current master)

296 ms ± 5.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
302 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
290 ms ± 1.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Pegen parser

1.31 s ± 5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.36 s ± 8.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.36 s ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
MatthieuDartiailh commented 1 year ago

So this is definitively worse than I had hoped for. Could you try to measure separately the cost of lexing and of parsing ? Currently we use the builtin tokenize but I have changes (yet to be merged into pegen) that would allow to use a different tokenizer and it would be great to know if we can win anything on that front.

I think that a pegen based solution in pure Python will always be slower due to how the parser work but I hope we can make the change more tolerable. It also means I should look again into the option of precompiling enaml files during install.

frmdstryr commented 1 year ago

I'm not sure if I'm doing it correctly but:

Existing parser

from enaml.core.parser import _parser

def lex(source, filename):
    lexer = _parser.lexer(filename)
    lexer.input(source)
    return list(lexer.token_stream)
%timeit [lex(s, f) for f, s in files.items()]

101 ms ± 3.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)def

Using tokenize used in the pegen branch

import io
import tokenizer
from pegen.tokenizer import Tokenizer

def lex(source):
    tok_stream = tokenize.generate_tokens(io.StringIO(source).readline)
    tokenizer = Tokenizer(tok_stream)
    try:
        while tokenizer.getnext():
            pass
    except StopIteration:  
        pass
    return tokenizer._tokens
%timeit [lex(s) for f, s in files.items()]

57.7 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So the lexing part seems to be faster.

MatthieuDartiailh commented 1 year ago

It does not look obviously wrong to me. So once I am done with 3.11 base support I won't lack things to do when it comes to the parser.

frmdstryr commented 1 year ago

Well with the pegen parser, KDevelop can now at least highlight some of the variables properly. :) The current parser does not set the col offsets for any of the python expressions so everything highlights out of wack. image

I would like to get it to somehow check all the attributes and understand the scoping but the ast does not really map directly to python and confuses it.

MatthieuDartiailh commented 1 year ago

I am honestly impressed you get that much to work. Improving tooling around enaml files is something I would like to pursue but I have no idea when.

MatthieuDartiailh commented 1 year ago

Another thing to check would be to regenerate the parser with the main branchbof pegen which includes better handling of error checks and some optimizations around unused variable. I wonder how much it influences performance.

frmdstryr commented 1 year ago

I got KDevelop to also find inherited and parent attributes now.

image

sccolbert commented 1 year ago

That's pretty cool!

This all has me wondering what a next-gen Enaml would look like if we owned the whole stack all the way down to rendering.

On Mon, Nov 7, 2022 at 1:17 PM frmdstryr @.***> wrote:

I got KDevelop to also find inherited and parent attributes now.

[image: image] https://user-images.githubusercontent.com/380158/200393848-d7f7978a-956d-4158-a279-2052e942fe17.png

— Reply to this email directly, view it on GitHub https://github.com/nucleic/enaml/pull/474#issuecomment-1306075658, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABBQSKAIQT2ONKENJHBLG3WHFIVXANCNFSM5MSOKIVA . You are receiving this because you commented.Message ID: @.***>

MatthieuDartiailh commented 1 year ago

One project that reminds me of enaml (even though I did not dive deep enough) is https://slint-ui.com/releases/0.3.1/docs/rust/slint/docs/langref/index.html

MatthieuDartiailh commented 1 year ago

@frmdstryr could detail a bit how you managed to reach that point ? I am wondering if something similar for VS Code could be done.

frmdstryr commented 1 year ago

The code is in https://github.com/frmdstryr/kdev-python/tree/enaml-support . Their parser originally used the c-api but since that was removed in 3.10 I changed it to use the python api and was able to just add the enaml stuff on top.

Edit: If you try it, you'll need the pegen branch or it will crash as the col_offset is missing. It should work on windows but I haven't tried.

codecov-commenter commented 1 year ago

Codecov Report

Merging #474 (bb00f9d) into main (bcfb9b7) will decrease coverage by 1.37%. The diff coverage is 84.23%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #474 +/- ## ========================================== - Coverage 74.59% 73.21% -1.38% ========================================== Files 302 296 -6 Lines 22944 25810 +2866 Branches 2951 3648 +697 ========================================== + Hits 17114 18896 +1782 - Misses 4936 5829 +893 - Partials 894 1085 +191 ```
MatthieuDartiailh commented 1 year ago

@frmdstryr I will try to merge this by the end of the week so as to unlock work on 3.11 support. I will open issues for the remaining pain points (speed, compilation to enamlc at install time, etc) since my current bandwidth won't allow me to get to them in a reasonable time.

frmdstryr commented 1 year ago

Thanks Matthieu!

MatthieuDartiailh commented 1 year ago

With the last fix all exopy tests pass and since our tests pass too I will consider this good enough to merge.