Open krackers opened 1 year ago
@thepyedpiper I rewrote pyp to use generators everywhere. https://gist.github.com/krackers/f73486bf2f625b9f39f33298d33b8932
From my hasty testing, everything seems to work. The only part I disabled is fpp and spp, since those apparently require knowing the size of the input a priori (it should still be possible to handle them; I just don't use them myself).
It seems to work quite nicely, and it can do some very cool things like
seq 1 100000 | pyp "pp[-1]"
without blowing up your memory usage. I'm actually surprised it works so well, with relatively few changes.
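For readers wondering why pp[-1] need not hold the whole input in memory: the last element of an iterator can be found while retaining only one item at a time. The sketch below is not taken from the gist; last_item is a hypothetical helper name used only to illustrate the idea.

```python
from collections import deque


def last_item(iterable):
    """Return the last item of an iterable while keeping only one item in memory."""
    # A deque with maxlen=1 discards each older item as a new one arrives,
    # so memory use stays constant regardless of input length.
    tail = deque(iterable, maxlen=1)
    if not tail:
        raise IndexError("last_item() of empty iterable")
    return tail[0]


# A generator standing in for the output of `seq 1 100000` (an assumption
# for illustration; pyp itself reads from stdin).
lines = (str(n) for n in range(1, 100001))
print(last_item(lines))
```

The input is still read end to end, so this saves memory rather than time spent reading.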
cool, thanks! I'll check it out...do you have any quantitative data about speed up/memory use? People still use fpp/spp although it's not super commonly applied. I wonder if it's easy to implement an alternative solution when those are employed.
Thanks again!
Toby
Sure, here's some quantitative data:
Command run: seq 1 1000000 | pyp "pp[-1]"
My version: Memory used: < 20 MB
usr time 3.28 secs 0.15 millis 3.28 secs
sys time 0.04 secs 2.19 millis 0.04 secs
Original version: 1 GB memory used
usr time 6.85 secs 0.13 millis 6.85 secs
sys time 0.28 secs 2.10 millis 0.28 secs
> I wonder if it's easy to implement an alternative solution when those are employed.
It should be possible; in the worst case, the input can be treated as an array instead of a generator. I just don't use it myself, so I haven't implemented it, nor would I know what to test for.
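A rough sketch of that worst-case fallback, assuming the expression text is available before the input is consumed. The names prepare_input and NEEDS_FULL_INPUT are hypothetical and do not come from pyp or the gist; the textual check is deliberately naive.

```python
import re

# Hypothetical check: treat any mention of fpp or spp in the expression as a
# sign that the whole input (and therefore its length) is required.
NEEDS_FULL_INPUT = re.compile(r"\b(fpp|spp)\b")


def prepare_input(expression, lines):
    """Return a list when the expression needs the full input, else a lazy iterator."""
    if NEEDS_FULL_INPUT.search(expression):
        # Worst case: materialize everything, as the non-generator version does.
        return list(lines)
    # Otherwise keep the input lazy so memory use stays low.
    return iter(lines)


lazy = prepare_input("pp[-1]", (str(n) for n in range(3)))
eager = prepare_input("fpp", (str(n) for n in range(3)))
print(type(lazy).__name__, type(eager).__name__)
```

A string match like this can misfire (for example on a quoted literal containing "fpp"), so a real implementation would probably want to inspect the parsed expression instead.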
looks great, especially the memory usage!
Let me know if you have any ideas about spp/fpp. I'll take a look as well.
cheers,
t
Currently we block to read the entire input before we begin processing it. However, this means that we cannot use pyp in a streaming fashion, as in yes | pyp "p". So long as the expression makes use of only p and not pp, we should instead prefer to process in a streaming manner. It might also be a good idea to use generators for pp so that things like getting the last element don't require materializing the entire list in memory.
It seems this can be done by moving process_inputs inside the process_master_switch, which already has a branch for the two modes.
Edit: I also found that by default pyp saves the input to a temp file, used for the rerun feature. In addition to not being compatible with streaming mode, in general it seems this could potentially leak sensitive data. It feels like it would be better to add an explicit option to store to disk (maybe -w for --write?), or otherwise only enable this feature when pyp is invoked with a tty stdin, along with a message letting the user know that the input was saved to a temp file.
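To make the two suggestions concrete, here is a minimal sketch of per-line streaming and of a tty check for the save-to-disk behavior. It does not reflect pyp's actual internals: stream_lines, apply_per_line, and may_save_input are hypothetical names, and str.upper stands in for whatever a p expression would compute.

```python
import io
import itertools


def stream_lines(stream):
    """Yield lines one at a time, without reading the whole stream first."""
    for line in stream:
        yield line.rstrip("\n")


def apply_per_line(lines, func):
    """Lazily apply a per-line function (the role p plays in an expression)."""
    return (func(p) for p in lines)


def may_save_input(stream):
    """Hypothetical policy: only save input for rerun when stdin is a terminal."""
    return stream.isatty()


# An endless iterator standing in for the output of `yes`; with blocking
# reads this would never finish, but lazily we can take just three results.
endless = itertools.repeat("y\n")
first_three = list(itertools.islice(apply_per_line(stream_lines(endless), str.upper), 3))
print(first_three)

# A piped or in-memory stream is not a tty, so nothing would be saved.
print(may_save_input(io.StringIO("secret\n")))
```

In a real program the stream would be sys.stdin, and the result of may_save_input would gate both the temp-file write and the message telling the user where the input was saved.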