Open isidentical opened 2 years ago
Hi, I would like to work on this issue. I have committed to this repository in the past but I would need some guidance to get started. Thanks
You can benchmark the backtracking parser (with passing -t py310
) and run the profiler to see the hot spots, if there is anything too obvious that might be the target of the optimization. Here is an example
https://github.com/psf/black/pull/2763#issuecomment-1010201541
Hi @isidentical, Perhaps I'm missing some context, I was assuming that I need to run this with cprofile which is what I did:
bash-3.2$ python3 -m cProfile -s tottime -m src.blib2to3.pgen2.parse
26469 function calls (25215 primitive calls) in 0.132 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
23 0.046 0.002 0.046 0.002 {method 'read' of '_io.BufferedReader' objects}
10 0.034 0.003 0.034 0.003 {built-in method _imp.create_dynamic}
23 0.005 0.000 0.005 0.000 {built-in method marshal.loads}
157 0.005 0.000 0.005 0.000 {built-in method posix.stat}
100/6 0.003 0.000 0.008 0.001 sre_parse.py:493(_parse)
23 0.002 0.000 0.002 0.000 {built-in method io.open_code}
53 0.002 0.000 0.005 0.000 {built-in method builtins.__build_class__}
79 0.002 0.000 0.008 0.000 <frozen importlib._bootstrap_external>:1505(find_spec)
194/6 0.001 0.000 0.003 0.000 sre_compile.py:71(_compile)
1320 0.001 0.000 0.001 0.000 sre_parse.py:164(__getitem__)
However looking at your comment I'm guessing this isn't what is expected.
I looked at the related comment from https://github.com/psf/black/pull/2763#issuecomment-1010201541
Are you using pyperf module? (I have started going through pyperf docs)
I don't understand how to pass -t py310
or to be precise, is py310 a benchmark that should be created?
Please let me know what I' missing.
Thanks again for helping me, appreciate it.
@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of match
.
-t py310
is an option to Black that runs Black in Python 3.10 mode.
@tanvimoharir You're on the right track with using cProfile, but that benchmark merely imports the module, which isn't very interesting. What you'd want to benchmark instead is a run where Black formats some code that is heavy in use of
match
.
-t py310
is an option to Black that runs Black in Python 3.10 mode.
I tried running with -t py310 on a sample file with some match
scenarios:
bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
--- cat.py 2022-01-20 12:49:44.841140 +0000
+++ cat.py 2022-01-20 12:49:49.045038 +0000
would reformat cat.py
All done! ✨ 🍰 ✨
1 file would be reformatted.
211084 function calls (194385 primitive calls) in 0.942 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
142 0.351 0.002 0.351 0.002 {method 'read' of '_io.BufferedReader' objects}
25 0.124 0.005 0.131 0.005 {built-in method _imp.create_dynamic}
14 0.035 0.002 0.035 0.002 {built-in method posix.listdir}
140 0.034 0.000 0.034 0.000 {built-in method marshal.loads}
783 0.028 0.000 0.029 0.000 {built-in method posix.stat}
140 0.017 0.000 0.017 0.000 {built-in method io.open_code}
398/397 0.014 0.000 0.046 0.000 {built-in method builtins.__build_class__}
353/67 0.013 0.000 0.035 0.001 sre_parse.py:493(_parse)
183/1 0.012 0.000 0.943 0.943 {built-in method builtins.exec}
337 0.008 0.000 0.083 0.000 <frozen importlib._bootstrap_external>:1527(find_spec)
488 0.007 0.000 0.017 0.000 parse.py:287(_addtoken)
23222/20938 0.007 0.000 0.008 0.000 {built-in method builtins.isinstance}
620/63 0.006 0.000 0.015 0.000 sre_compile.py:71(_compile)
I will try to add more code and try formatting it.
I inflated the sample file with more match
scenarios (got some code from the test data) with some valid and some invalid scenarios:
bash-3.2$ python3 -m cProfile -s tottime -m src.black -t py310 --diff --color cat.py
error: cannot format cat.py: cannot use --safe with this file; failed to parse source file: expected ':' (<unknown>, line 1888)
Oh no! 💥 💔 💥
1 file would fail to reformat.
2200056 function calls (2013593 primitive calls) in 6.717 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
16446 3.770 0.000 3.912 0.000 driver.py:92(__next__)
142 0.357 0.003 0.357 0.003 {method 'read' of '_io.BufferedReader' objects}
18943 0.282 0.000 0.696 0.000 parse.py:287(_addtoken)
25 0.114 0.005 0.121 0.005 {built-in method _imp.create_dynamic}
78767 0.100 0.000 0.205 0.000 parse.py:380(pop)
88795 0.091 0.000 0.103 0.000 parse.py:368(push)
1 0.091 0.091 4.931 4.931 driver.py:126(parse_tokens)
16447 0.083 0.000 0.133 0.000 tokenize.py:409(generate_tokens)
83079 0.077 0.000 0.154 0.000 pytree.py:477(convert)
25502/2127 0.076 0.000 0.661 0.000 linegen.py:72(visit_default)
13565 0.063 0.000 0.294 0.000 lines.py:48(append)
13138 0.050 0.000 0.099 0.000 brackets.py:68(mark)
32154/2127 0.047 0.000 0.662 0.000 nodes.py:159(visit)
16311/15057 0.044 0.000 0.895 0.000 parse.py:239(addtoken)
14271 0.041 0.000 0.041 0.000 {method 'match' of 're.Pattern' objects}
14 0.035 0.002 0.035 0.002 {built-in method posix.listdir}
140 0.034 0.000 0.034 0.000 {built-in method marshal.loads}
The previous PRs successfully optimized the backtracking in the factor of 4X. It might make some sense to profile and see whether there are still other potential bottlenecks.