krackers opened this issue 1 year ago
Found a way to reduce the time spent in builtin.eval: Python can precompile the string to bytecode with compile, and the resulting code object can be passed to eval. Doing this basically optimizes as much as possible, I think; the only remaining overhead is translate_preset_variables and string_splitter, which is going to be unavoidable. Maybe one could optimize by seeing whether the variable is used in the expression first, but I don't really know if that would be any faster.
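A minimal sketch of the compile-then-eval idea, assuming pyp evaluates the user's expression once per input line; the expression and variable names below are illustrative stand-ins, not pyp's actual internals:

    # Compile the expression string to a code object once, outside the per-line loop.
    expr = "p.upper()"                        # stand-in for the user's pyp expression
    code_obj = compile(expr, "<pyp expression>", "eval")

    results = []
    for p in ["foo", "bar", "baz"]:           # stand-in for the input stream
        # eval() accepts a code object, so the string is never re-parsed per line.
        results.append(eval(code_obj, {}, {"p": p}))

    print(results)                            # ['FOO', 'BAR', 'BAZ']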
Great! Thank you. I'll check it out tomorrow.
T
Hi,
Just getting back to this. Sorry, Hollywood was upended for a bit, which really affected me.
Do you have a version of pyp with all of your improvements?
I'd love to test it and then maybe incorporate some or all of them into a beta.
What were your final quantitative numbers in terms of speedup?
thanks,
Toby
I've uploaded the version I use at https://gist.github.com/krackers/3443b5e175096ba5e790360353a864cc
Note that since it's the version I use personally, it has some other fixes/changes as well and is suited to my needs/tastes (e.g. see the other issues I opened in this repo); some of these are probably API breaking, especially the switch to generators for pp.
If I remember correctly, this issue in particular came from two observations: we were splitting unnecessarily even if the results were never going to be used, and builtin.eval was being called repeatedly. "Fixing" the former technically requires losing some features, while the latter can be fixed without any loss of features.
As for the benchmark, re-running the same command as in the opening post
seq 1 235976 | python3 -m cProfile -s tottime $(which pyp) "p | pp[-1]"
I get
235976
10401075 function calls (10400749 primitive calls) in 2.735 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
235977 0.576 0.000 0.976 0.000 pyp:1843(process_line)
235976 0.331 0.000 0.985 0.000 pyp:443(__init__)
471952 0.226 0.000 0.383 0.000 posixpath.py:100(split)
235978 0.126 0.000 0.162 0.000 pyp:1484(update_history)
235976 0.118 0.000 0.212 0.000 pyp:1597(format_history)
235976 0.095 0.000 0.095 0.000 pyp:134(__init__)
235976 0.089 0.000 0.169 0.000 pyp:539(split)
235977 0.085 0.000 0.096 0.000 pyp:1416(safe_eval)
235977 0.083 0.000 1.251 0.000 pyp:2015(<genexpr>)
235979 0.077 0.000 2.304 0.000 pyp:1838(<genexpr>)
2 0.068 0.034 2.718 1.359 more.py:2437(_islice_helper)
235976 0.064 0.000 0.083 0.000 pyp:1990(strip_last_newline)
235976 0.062 0.000 0.062 0.000 pyp:741(__init__)
235977 0.061 0.000 2.634 0.000 pyp:1615(<genexpr>)
235976 0.059 0.000 0.086 0.000 pyp:751(__getitem__)
But again, this is with basically a mix of changes included, and I don't really want to go through and create a clean PR for just the compile change. Hopefully you can piece together something from the above, though.
Awesome, thank you for this. This is what I need. I can piece it together for sure.
Do you happen to know what the stock pyp gives for the same test?
t
Sure, I tested the pyp 3.0.9 source straight from PyPI, and it gives the following result:
> seq 1 235976 | python3 -m cProfile -s tottime $(which pyp) "p | pp[-1]"
235976
98304563 function calls (98304330 primitive calls) in 40.519 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
4955517 8.215 0.000 16.163 0.000 pyp:407(__init__)
236005 5.113 0.000 5.113 0.000 {method 'pop' of 'list' objects}
9911034 4.359 0.000 7.222 0.000 posixpath.py:100(split)
235979 4.267 0.000 4.289 0.000 {built-in method builtins.eval}
4955517 1.833 0.000 19.723 0.000 pyp:1202(<genexpr>)
3 1.609 0.536 1.652 0.551 pyp:1604(stamp_history)
235977 1.336 0.000 36.755 0.000 pyp:1882(process_line)
4719540 1.265 0.000 16.715 0.000 pyp:1202(<listcomp>)
4719540 1.174 0.000 1.174 0.000 pyp:686(__init__)
235982 1.148 0.000 1.148 0.000 {built-in method builtins.dir}
235977 1.076 0.000 21.376 0.000 pyp:1157(string_splitter)
9911041 1.037 0.000 1.559 0.000 posixpath.py:41(_get_sep)
9911066 0.975 0.000 0.975 0.000 {method 'rfind' of 'str' objects}
235977 0.657 0.000 0.910 0.000 pyp:1325(translate_preset_variables)
7315301 0.623 0.000 0.623 0.000 {method 'split' of 'str' objects}
235977 0.606 0.000 2.367 0.000 pyp:1504(update_history)
2 0.576 0.288 2.306 1.153 pyp:1805(format_input)
4955496 0.568 0.000 0.568 0.000 pyp:1912(<genexpr>)
9911670 0.522 0.000 0.522 0.000 {built-in method builtins.isinstance}
707933 0.330 0.000 0.330 0.000 {method 'update' of 'dict' objects}
9911102 0.329 0.000 0.329 0.000 {built-in method posix.fspath}
4956618 0.287 0.000 0.287 0.000 {method 'rstrip' of 'str' objects}
235977 0.259 0.000 4.752 0.000 pyp:1429(safe_eval)
Fantastic!
thanks
t
Hi again,
I've been playing around with this! Pretty cool; these are much bigger code changes than I expected! Nice, sophisticated iteration work. How long did this take you?
One thing I found is that the powerPyp lists no longer work... for example, pp.sort() doesn't work. This should take the whole input and treat it like a line-by-line array, so it would give a result similar to standard UNIX sort.
pp also has an "analysis" mode, so just a plain pp would return the index per line so you can pick out particular lines easily... for example:
[0]USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
[1]_windowserver 149 40.5 0.7 410221616 62496 ?? Rs Thu04PM 1046:55.26 /System/Library/PrivateFrameworks/SkyLight.framework/Resources/WindowServer -daemon
[2]tobyrosen 606 9.7 0.0 408400832 2528 ?? S Thu04PM 923:56.38 /System/Library/Frameworks/CoreMIDI.framework/MIDIServer
Was this maybe working in an earlier version for you?
cheers,
Toby
How long did this take you?
I think I was noodling with it around last Christmas? I didn't spend too much time on it, just enough so that I could use it as an awk replacement without blowing up memory. Some of the more niche features of pyp are probably broken by all the generator stuff (e.g. I didn't test too hard whether things like history still work), but to me that's an acceptable tradeoff.
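For what it's worth, a minimal illustration of the generator approach, assuming the goal is simply to stream stdin lazily instead of materializing every line up front (this is not the code from the gist):

    import sys

    def stream_lines():
        # Yield lines one at a time so memory stays flat even for huge inputs.
        for line in sys.stdin:
            yield line.rstrip("\n")

    # The old behavior would be roughly: lines = list(stream_lines())
    for p in stream_lines():
        print(p.upper())   # stand-in for evaluating the user's expression per line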
pp.sort() doesn't work...this should take the whole input and treat it like a line by line array
Hm, this is a good catch. Explicitly converting to a list, like
seq 10 1 | pyp "sorted(list(pp))"
works, but even then the following does not (list.sort() sorts in place and returns None, so there is nothing to print):
seq 10 1 | pyp "list(pp).sort()"
(Actually, Python's sorted accepts generators, so sorted(pp) is sufficient.) I'd need to look through the code again to see why this is; I'm guessing it's something to do with the fact that, since we no longer operate in place on a pre-materialized pp array, there's no way to implicitly "pass" the output of an in-place sort onto the next stage: you always need to explicitly return something. A lazy hack to make it work is to just define .sort() and .reverse() operations on PowerPipeList that are equivalent to sorted(self), etc.
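A minimal sketch of that lazy hack, assuming PowerPipeList wraps an iterable of lines; the class shape is illustrative and not pyp's actual implementation:

    # Hypothetical wrapper: make .sort()/.reverse() return new values rather than
    # mutating in place, so their output can flow on to the next pipe stage.
    class PowerPipeList:
        def __init__(self, lines):
            self._lines = lines            # may be a generator

        def __iter__(self):
            return iter(self._lines)

        def sort(self, key=None, reverse=False):
            # Equivalent to sorted(self); this materializes the stream.
            return sorted(self._lines, key=key, reverse=reverse)

        def reverse(self):
            return list(self._lines)[::-1]

    pp = PowerPipeList(str(n) for n in (3, 1, 2))
    print(pp.sort())   # ['1', '2', '3'] -- a value the next stage can consume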
pp also has an "analysis" mode, so just a plain pp would return the index per line so you can pick out particular lines easily...for example:
So I do know that the analysis feature works element-wise for arrays:
seq 10 1 | pyp "[p, p, p]"
I don't know what I did that broke it in pp mode. It should definitely be possible to get it working, since adding the indices is an easy operation on generators. Probably the code near
for idx, out in enumerate(output):
    self.n = idx
    ignore, result = self.update_history(out, second_stream_input, file_input, None, return_output=True)
    if not ignore:
        print(result)
would need to be updated; you already have access to idx there, so it's just a matter of formatting it properly on powerpipe output.
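A rough sketch of what that formatting might look like, assuming analysis mode just needs each output line prefixed with its index in brackets; this is an illustration, not the code from the gist:

    # Prefix each generated line with its index without materializing the stream.
    def with_indices(output):
        for idx, out in enumerate(output):
            yield "[%d]%s" % (idx, out)

    lines = iter(["USER PID ...", "_windowserver 149 ...", "tobyrosen 606 ..."])
    for formatted in with_indices(lines):
        print(formatted)
    # [0]USER PID ...
    # [1]_windowserver 149 ...
    # [2]tobyrosen 606 ...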
OK, cool. Thanks for that info. I'll take a look at it, and hopefully it's not too bad. The powerpipe setup is critical because I have a bunch of cool methods overloaded for list functions, and it simplifies operations that work on the entire output by having a dedicated list variable. It's great that you found the analysis mode works element-wise for arrays... this implies that the powerpipes will work but are just not being detected properly. I'll take a look. Let me know if you have any other insights.
cheers,
Toby
PS: this is the pp info from the manual, if interested. Much of this was to replace common UNIX commands we were using constantly at Sony at the time.
- pp LIST (all LIST methods like pp.sort(), pp[-1], and pp.reverse() work)
- pp.matrix(STRING, N) returns pp.before(STRING, N) and pp.after(STRING, N). Default is 1.
shows that a lot of time (~20 out of 28 seconds) is spent in string_splitter. Some of this might be unavoidable due to Python slowness, but I wonder if there's some cleverness we can do to avoid such a heavy performance cost? The slowdown is in one particular line. It's not clear to me what exactly it's doing. If that is removed, then we get the following heavy hitters: