watermarkhu / textmate-grammar-python

Python lexer and tokenizer based on textmate grammars
https://textmate-grammar-python.readthedocs.io
MIT License

Extremely poor performance on somewhat large matlab files #68

Status: open. Opened by apozharski 4 months ago.

apozharski commented 4 months ago

Hello, while porting sphinx-contrib/matlabdomain, I have run into severe performance issues.

An example of a file with extremely poor parsing performance is: https://github.com/nurkanovic/nosnoc/blob/0caa4509faa7a979da229a8617ae123b9ae02aa5/src/NosnocModel.m

Parsing this file takes about 5 minutes on my machine (303 s in total). Below is the cProfile trace sorted by internal time.
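A listing in this format can be produced with the standard library's cProfile and pstats modules: sorting by pstats.SortKey.TIME prints the "Ordered by: internal time" header seen in the trace. The helper below is a minimal sketch, not necessarily how this trace was captured. The MatlabParser import and the parse_file call in the commented lines are assumptions inferred from the parsers/matlab and parsers/base.py parse_file entries in the trace, so check the project documentation for the exact names; a stand-in workload keeps the sketch runnable without the library installed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=25, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the pstats listing sorted by
    internal time (tottime) and limited to the `top` most expensive rows.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Assumed API (not verified here): textmate-grammar-python exposes a
    # MatlabParser whose parse_file() is the entry point seen in the trace.
    # from textmate_grammar.parsers.matlab import MatlabParser
    # parser = MatlabParser()
    # _, report = profile_call(parser.parse_file, "NosnocModel.m")

    # Stand-in workload so the sketch runs without the library.
    _, report = profile_call(sorted, range(100_000, 0, -1))
    print(report)
```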

365740104 function calls (339252023 primitive calls) in 303.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
8136370/49585   36.358    0.000  300.842    0.006 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:545(_parse)
 16525617   29.903    0.000  151.679    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:217(search)
 17027461   28.075    0.000   28.075    0.000 {built-in method _onigurumacffi.onigcffi_search}
 17027461   27.770    0.000  116.194    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:129(search)
  7985230   23.077    0.000  119.386    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:273(_parse)
 16525617   22.555    0.000  175.059    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:154(match_and_capture)
17257055/6802   18.700    0.000  302.784    0.045 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/logger.py:17(wrapper)
 17027461   15.719    0.000   22.917    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:83(_start_params)
 17027461   12.176    0.000   23.123    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:87(_region)
 17027461   10.600    0.000   12.263    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:91(_match_ret)
1131845/3192   10.535    0.000  302.690    0.095 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:363(_parse)
 16447937    9.431    0.000   14.193    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/logger.py:125(debug)
 17027461    6.375    0.000    6.375    0.000 {built-in method _onigurumacffi.onig_region_new}
 42394241    5.363    0.000    5.363    0.000 {built-in method builtins.len}
 17284783    5.028    0.000    5.028    0.000 /usr/lib/python3.10/logging/__init__.py:1710(getEffectiveLevel)
 34054922    4.900    0.000    4.900    0.000 {method 'encode' of 'str' objects}
 17027461    4.572    0.000    4.572    0.000 {method 'gc' of '_cffi_backend.FFI' objects}
  4283624    4.247    0.000    6.784    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:181(read)
  9230629    3.900    0.000    5.175    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:73(disabled)
 26515200    3.817    0.000    3.817    0.000 {method 'get' of 'dict' objects}
  1131845    2.746    0.000    6.959    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:380(<listcomp>)
  2781976    2.730    0.000    3.895    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:66(start)
  4473120    2.237    0.000    2.702    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:68(_check_pos)
   289434    1.953    0.000    2.353    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:106(range)
   289434    1.651    0.000    7.744    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:137(<dictcomp>)
  4336744    1.029    0.000    1.029    0.000 {method 'decode' of 'bytes' objects}
  2326726    1.009    0.000    1.009    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:48(__init__)
  1065644    0.799    0.000    1.119    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:69(end)
  2326726    0.653    0.000    0.653    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:36(_check)
    89998    0.533    0.000    1.496    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:589(<listcomp>)
   289434    0.486    0.000   10.582    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:128(chars)
   657139    0.451    0.000    0.658    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/logger.py:131(info)
   489123    0.409    0.000    0.504    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:61(group)
  4028091    0.402    0.000    0.402    0.000 {method 'append' of 'list' objects}
   289434    0.344    0.000    0.344    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:148(__init__)
  2014281    0.342    0.000    0.342    0.000 {method 'isspace' of 'str' objects}
   270455    0.320    0.000    0.320    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:26(__init__)
    61729    0.285    0.000    0.285    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:823(<listcomp>)
    94748    0.251    0.000    0.416    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:139(read_pos)
    75983    0.233    0.000    0.339    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:448(__init__)
   391225    0.207    0.000    0.207    0.000 {built-in method builtins.repr}
   179705    0.138    0.000    0.197    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/logger.py:137(warning)
   126893    0.099    0.000    0.099    0.000 {method 'index' of 'list' objects}
   126893    0.091    0.000    0.190    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:659(<lambda>)
    12919    0.090    0.000    0.279    0.000 {built-in method builtins.sorted}
   272694    0.079    0.000    0.079    0.000 {method 'extend' of 'list' objects}
     6663    0.054    0.000    2.558    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:72(dispatch)
24664/18001    0.046    0.000    2.622    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:122(_dispatch_list)
    32101    0.037    0.000    0.040    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:72(next)
    32920    0.031    0.000    0.107    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:72(span)
    39214    0.030    0.000    0.030    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:89(prev)
  13457/1    0.023    0.000    2.661    2.661 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:192(_dispatch)
     3610    0.020    0.000    0.080    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:215(_parse)
   2272/1    0.015    0.000    2.661    2.661 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:502(_dispatch)
    16186    0.013    0.000    0.013    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:170(read_line)
     9676    0.011    0.000    0.011    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:77(__repr__)
    13456    0.009    0.000    0.011    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:209(__eq__)
    41079    0.008    0.000    0.008    0.000 {built-in method builtins.isinstance}
     7554    0.006    0.000    0.011    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/onigurumacffi.py:111(number_of_captures)
        1    0.006    0.006    0.007    0.007 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/api.py:33(from_bytes)
     7554    0.005    0.000    0.005    0.000 {built-in method _onigurumacffi.onig_number_of_captures}
     7435    0.004    0.000    0.007    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/elements.py:59(__eq__)
    13603    0.003    0.000    0.003    0.000 {method 'pop' of 'dict' objects}
     1778    0.002    0.000    0.002    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:256(__repr__)
     1462    0.002    0.000    0.002    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:521(__repr__)
     6664    0.001    0.000    0.001    0.000 {method 'items' of 'dict' objects}
        1    0.001    0.001    0.001    0.001 {method 'findall' of 're.Pattern' objects}
      363    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:481(<genexpr>)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:53(<listcomp>)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:52(<listcomp>)
       11    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
       28    0.000    0.000    0.000    0.000 {built-in method posix.lstat}
      182    0.000    0.000    0.000    0.000 {built-in method builtins.next}
        4    0.000    0.000    0.000    0.000 /usr/lib/python3.10/posixpath.py:401(_joinrealpath)
       28    0.000    0.000    0.000    0.000 /usr/lib/python3.10/posixpath.py:71(join)
        3    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
        5    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:56(parse_parts)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.max}
        6    0.000    0.000    0.000    0.000 {built-in method posix.stat}
        1    0.000    0.000  303.003  303.003 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parsers/base.py:115(parse_file)
        4    0.000    0.000    0.000    0.000 /usr/lib/python3.10/posixpath.py:338(normpath)
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
       13    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:621(__str__)
        5    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:569(_parse_args)
        1    0.000    0.000    0.000    0.000 {method 'read' of '_io.BufferedReader' objects}
        5    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:589(_from_parts)
        4    0.000    0.000    0.001    0.000 /usr/lib/python3.10/pathlib.py:1064(resolve)
        7    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/utils.py:368(cut_sequence_chunks)
       35    0.000    0.000    0.000    0.000 {built-in method sys.intern}
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/utils.py:268(identify_sig_or_bom)
        8    0.000    0.000    0.000    0.000 /usr/lib/python3.10/posixpath.py:60(isabs)
       36    0.000    0.000    0.000    0.000 /usr/lib/python3.10/posixpath.py:41(_get_sep)
        4    0.000    0.000    0.001    0.000 /usr/lib/python3.10/posixpath.py:392(realpath)
       44    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
       28    0.000    0.000    0.000    0.000 {method 'partition' of 'str' objects}
        5    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:239(splitroot)
        1    0.000    0.000    0.000    0.000 {method '__exit__' of '_io._IOBase' objects}
       53    0.000    0.000    0.000    0.000 {built-in method posix.fspath}
        1    0.000    0.000    0.008    0.008 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:56(from_path)
        3    0.000    0.000    0.000    0.000 /usr/lib/python3.10/logging/__init__.py:1724(isEnabledFor)
        1    0.000    0.000  300.334  300.334 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parsers/base.py:173(_parse)
        5    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:608(_format_parsed_parts)
       11    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:631(__fspath__)
        1    0.000    0.000    0.001    0.001 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/utils.py:215(any_specified_encoding)
        1    0.000    0.000    0.007    0.007 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/api.py:532(from_path)
        4    0.000    0.000    0.000    0.000 /usr/lib/python3.10/posixpath.py:377(abspath)
        1    0.000    0.000    0.001    0.001 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:32(__init__)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/cache.py:79(save)
       28    0.000    0.000    0.000    0.000 {built-in method _stat.S_ISLNK}
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/utils.py:290(iana_name)
        2    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/cache.py:14(_path_to_key)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/logger.py:75(configure)
       11    0.000    0.000    0.000    0.000 {method 'startswith' of 'bytes' objects}
       28    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
        6    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:1092(stat)
        1    0.000    0.000    0.007    0.007 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/api.py:502(from_fp)
        1    0.000    0.000  302.994  302.994 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parsers/base.py:161(_parse_language)
        2    0.000    0.000    0.000    0.000 /usr/lib/python3.10/logging/__init__.py:219(_acquireLock)
        1    0.000    0.000  300.334  300.334 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:129(parse)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/cd.py:291(merge_coherence_ratios)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:957(__new__)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.10/re.py:288(_compile)
        9    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:257(append)
        5    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x5acc750939a0}
        1    0.000    0.000    0.001    0.001 /usr/lib/python3.10/re.py:232(findall)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:718(suffix)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:11(__init__)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:237(__getitem__)
        2    0.000    0.000    0.000    0.000 /usr/lib/python3.10/logging/__init__.py:1532(log)
        5    0.000    0.000    0.000    0.000 {method 'lstrip' of 'str' objects}
        2    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:231(__init__)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:197(could_be_from_charset)
        2    0.000    0.000    0.000    0.000 /usr/lib/python3.10/logging/__init__.py:228(_releaseLock)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/utils/cache.py:58(cache_valid)
        2    0.000    0.000    0.000    0.000 {method 'acquire' of '_thread.RLock' objects}
        1    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:710(name)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.round}
        5    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.10/pathlib.py:1285(exists)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parsers/matlab/__init__.py:51(pre_process)
        2    0.000    0.000    0.000    0.000 /usr/lib/python3.10/logging/__init__.py:1307(disable)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:71(__str__)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.10/logging/__init__.py:1455(debug)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:277(best)
        2    0.000    0.000    0.000    0.000 {method 'release' of '_thread.RLock' objects}
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:204(<listcomp>)
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/cd.py:305(<listcomp>)
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'rfind' of 'str' objects}
        1    0.000    0.000    0.000    0.000 /home/anton/tools/matlabdomain/venv/lib/python3.10/site-packages/charset_normalizer/models.py:170(raw)
        1    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
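The listing above is long, so a per-file summary can make it easier to read. In the ncalls column, a value such as 8136370/49585 means total calls / primitive (non-recursive) calls, so the _parse methods recurse heavily. Because tottime excludes time spent in callees, summing it per source file gives a breakdown that roughly adds up to the 303 s total; summing cumtime instead would double-count nested calls. The sketch below parses pstats text rows like the ones above; the regex and the grouping by the last two path components are choices made for this sketch, not part of pstats.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# One data row of a pstats listing; header and summary lines do not match.
ROW = re.compile(
    r"^\s*\d+(?:/\d+)?"               # ncalls, optionally total/primitive
    r"\s+(?P<tottime>\d+\.\d+)"       # tottime: time in the function itself
    r"\s+\d+\.\d+"                    # percall (tottime / ncalls)
    r"\s+\d+\.\d+"                    # cumtime: including callees
    r"\s+\d+\.\d+"                    # percall (cumtime / primitive calls)
    r"\s+(?P<location>\S.*?)\s*$"     # filename:lineno(function) or {...}
)


def module_of(location: str) -> str:
    """Map 'path/to/pkg/file.py:123(func)' to 'pkg/file.py'.

    C functions, shown as '{built-in method ...}' or "{method '...'}",
    are grouped under a single '<C functions>' key.
    """
    if location.startswith("{"):
        return "<C functions>"
    path = location.rsplit(":", 1)[0]
    return "/".join(PurePosixPath(path).parts[-2:])


def tottime_by_module(report: str) -> dict[str, float]:
    """Sum the tottime column per source file, largest first."""
    totals: dict[str, float] = defaultdict(float)
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            totals[module_of(match["location"])] += float(match["tottime"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Three rows copied from the trace above.
    sample = """\
8136370/49585   36.358    0.000  300.842    0.006 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/parser.py:545(_parse)
 16525617   29.903    0.000  151.679    0.000 /home/anton/tools/textmate-grammar-python/src/textmate_grammar/handler.py:217(search)
 17027461   28.075    0.000   28.075    0.000 {built-in method _onigurumacffi.onigcffi_search}
"""
    for module, seconds in tottime_by_module(sample).items():
        print(f"{seconds:8.3f}  {module}")
```

Pasting the whole listing into tottime_by_module groups the parser.py, handler.py, onigurumacffi.py and utils/logger.py rows separately, which makes it easier to compare the time spent in the grammar engine itself against the regex calls and the logging wrapper.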