Performance tracker: Backtracking Parser

isidentical commented 2 years ago

The previous PRs successfully optimized the backtracking in the factor of 4X. It might make some sense to profile and see whether there are still other potential bottlenecks.

tanvimoharir commented 2 years ago

Hi, I would like to work on this issue. I have committed to this repository in the past but I would need some guidance to get started. Thanks

isidentical commented 2 years ago

You can benchmark the backtracking parser (with passing -t py310) and run the profiler to see the hot spots, if there is anything too obvious that might be the target of the optimization. Here is an example https://github.com/psf/black/pull/2763#issuecomment-1010201541

tanvimoharir commented 2 years ago

Hi @isidentical, Perhaps I'm missing some context, I was assuming that I need to run this with cprofile which is what I did:

bash-3.2$ python3 -m cProfile -s tottime -m src.blib2to3.pgen2.parse
         26469 function calls (25215 primitive calls) in 0.132 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       23    0.046    0.002    0.046    0.002 {method 'read' of '_io.BufferedReader' objects}
       10    0.034    0.003    0.034    0.003 {built-in method _imp.create_dynamic}
       23    0.005    0.000    0.005    0.000 {built-in method marshal.loads}
      157    0.005    0.000    0.005    0.000 {built-in method posix.stat}
    100/6    0.003    0.000    0.008    0.001 sre_parse.py:493(_parse)
       23    0.002    0.000    0.002    0.000 {built-in method io.open_code}
       53    0.002    0.000    0.005    0.000 {built-in method builtins.__build_class__}
       79    0.002    0.000    0.008    0.000 <frozen importlib._bootstrap_external>:1505(find_spec)
    194/6    0.001    0.000    0.003    0.000 sre_compile.py:71(_compile)
     1320    0.001    0.000    0.001    0.000 sre_parse.py:164(__getitem__)

However looking at your comment I'm guessing this isn't what is expected. I looked at the related comment from https://github.com/psf/black/pull/2763#issuecomment-1010201541 Are you using pyperf module? (I have started going through pyperf docs) I don't understand how to pass -t py310 or to be precise, is py310 a benchmark that should be created? Please let me know what I' missing. Thanks again for helping me, appreciate it.

JelleZijlstra commented 2 years ago

@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of match.

-t py310 is an option to Black that runs Black in Python 3.10 mode.

tanvimoharir commented 2 years ago

@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of match.

-t py310 is an option to Black that runs Black in Python 3.10 mode.

I tried running with -t py310 on a sample file with some match scenarios:

bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
--- cat.py  2022-01-20 12:49:44.841140 +0000
+++ cat.py  2022-01-20 12:49:49.045038 +0000
would reformat cat.py

All done! ✨ 🍰 ✨
1 file would be reformatted.
         211084 function calls (194385 primitive calls) in 0.942 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      142    0.351    0.002    0.351    0.002 {method 'read' of '_io.BufferedReader' objects}
       25    0.124    0.005    0.131    0.005 {built-in method _imp.create_dynamic}
       14    0.035    0.002    0.035    0.002 {built-in method posix.listdir}
      140    0.034    0.000    0.034    0.000 {built-in method marshal.loads}
      783    0.028    0.000    0.029    0.000 {built-in method posix.stat}
      140    0.017    0.000    0.017    0.000 {built-in method io.open_code}
  398/397    0.014    0.000    0.046    0.000 {built-in method builtins.__build_class__}
   353/67    0.013    0.000    0.035    0.001 sre_parse.py:493(_parse)
    183/1    0.012    0.000    0.943    0.943 {built-in method builtins.exec}
      337    0.008    0.000    0.083    0.000 <frozen importlib._bootstrap_external>:1527(find_spec)
      488    0.007    0.000    0.017    0.000 parse.py:287(_addtoken)
23222/20938    0.007    0.000    0.008    0.000 {built-in method builtins.isinstance}
   620/63    0.006    0.000    0.015    0.000 sre_compile.py:71(_compile)

I will try to add more code and try formatting it.

tanvimoharir commented 2 years ago

I inflated the sample file with more match scenarios (got some code from the test data) with some valid and some invalid scenarios:

bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
error: cannot format cat.py: cannot use --safe with this file; failed to parse source file: expected ':' (<unknown>, line 1888)

Oh no! 💥 💔 💥
1 file would fail to reformat.
         2200056 function calls (2013593 primitive calls) in 6.717 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    16446    3.770    0.000    3.912    0.000 driver.py:92(__next__)
      142    0.357    0.003    0.357    0.003 {method 'read' of '_io.BufferedReader' objects}
    18943    0.282    0.000    0.696    0.000 parse.py:287(_addtoken)
       25    0.114    0.005    0.121    0.005 {built-in method _imp.create_dynamic}
    78767    0.100    0.000    0.205    0.000 parse.py:380(pop)
    88795    0.091    0.000    0.103    0.000 parse.py:368(push)
        1    0.091    0.091    4.931    4.931 driver.py:126(parse_tokens)
    16447    0.083    0.000    0.133    0.000 tokenize.py:409(generate_tokens)
    83079    0.077    0.000    0.154    0.000 pytree.py:477(convert)
25502/2127    0.076    0.000    0.661    0.000 linegen.py:72(visit_default)
    13565    0.063    0.000    0.294    0.000 lines.py:48(append)
    13138    0.050    0.000    0.099    0.000 brackets.py:68(mark)
32154/2127    0.047    0.000    0.662    0.000 nodes.py:159(visit)
16311/15057    0.044    0.000    0.895    0.000 parse.py:239(addtoken)
    14271    0.041    0.000    0.041    0.000 {method 'match' of 're.Pattern' objects}
       14    0.035    0.002    0.035    0.002 {built-in method posix.listdir}
      140    0.034    0.000    0.034    0.000 {built-in method marshal.loads}

psf / black

Performance tracker: Backtracking Parser #2762