thepyedpiper / pyp

The Pyed Piper: Python Power At the Prompt

Process input in streaming mode #5

Open krackers opened 1 year ago

krackers commented 1 year ago

Currently we block to read the entire input before we begin processing it. This means pyp cannot be used in a streaming fashion, e.g. yes | pyp "p". As long as the expression uses only p and not pp, we should process the input in a streaming manner instead. It might also be a good idea to use generators for pp so that things like getting the last element don't require materializing the entire list in memory.
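To make the idea concrete, here is a minimal sketch of per-line streaming with generators. It is not pyp's actual code: the names stream_lines, apply_per_line and endless_yes are hypothetical, and endless_yes only stands in for the output of yes.

```python
import itertools
from typing import Callable, Iterable, Iterator


def stream_lines(source: Iterable[str]) -> Iterator[str]:
    """Yield one line at a time with its trailing newline removed."""
    for line in source:
        yield line.rstrip("\n")


def apply_per_line(source: Iterable[str], transform: Callable[[str], str]) -> Iterator[str]:
    """Lazily apply a per-line expression (the role of p) to each line."""
    for p in stream_lines(source):
        yield transform(p)


def endless_yes() -> Iterator[str]:
    """Hypothetical stand-in for the output of `yes`: an input that never ends."""
    return ("y\n" for _ in itertools.count())


# Only three lines are ever pulled from the endless source, so this terminates.
first_three = list(itertools.islice(apply_per_line(endless_yes(), str.upper), 3))
print(first_three)
```

In a real command-line tool the source would presumably be sys.stdin, which can be iterated line by line without reading to end of file first.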

It seems this can be done by moving process_inputs inside the process_master_switch, which already has a branch for the two modes.

Edit: I also found that by default pyp saves the input to a temp file, which is used for the rerun feature. Besides being incompatible with streaming mode, this could leak sensitive data. It feels like it would be better to add an explicit option to store to disk (maybe -w for --write?), or otherwise to enable this feature only when pyp is invoked with a tty stdin, along with a message letting the user know that the input was saved to a temp file.
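A rough sketch of that gating logic, under assumptions: should_save_input and save_notice are hypothetical names, the opt-in flag is only the option proposed above and does not exist in pyp today, and combining both conditions with "or" is just one possible reading of the proposal.

```python
import io
from typing import TextIO


def should_save_input(write_requested: bool, stdin: TextIO) -> bool:
    """Return True only if the user opted in or stdin is an interactive terminal."""
    return write_requested or stdin.isatty()


def save_notice(path: str) -> str:
    """Build the message telling the user where the copy of the input went."""
    return f"pyp: input saved to {path} for rerun"


# Piped input is modelled with io.StringIO, whose isatty() returns False,
# so nothing is written to disk unless the user asks for it.
piped = io.StringIO("secret token\n")
print(should_save_input(False, piped))
print(should_save_input(True, piped))
```

With this shape, data piped in from another command never touches the disk by default, which addresses both the streaming and the leak concern.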

krackers commented 1 year ago

@thepyedpiper I rewrote pyp to use generators everywhere. https://gist.github.com/krackers/f73486bf2f625b9f39f33298d33b8932

From my hasty testing, everything seems to work. The only part I disabled is fpp and spp, since that apparently requires knowing the size of the input a priori (it should still be possible to handle it, I just don't use it myself).

It seems to work quite nicely, and it can do some very cool things like

seq 1 100000 | pyp "pp[-1]"

without blowing up your memory usage. I'm actually surprised it works so well, with relatively few changes.
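For readers wondering how a last-element lookup can avoid holding the whole input, one standard-library technique is a bounded deque. This is only an illustration and not necessarily what the gist does; last_item and numbered_lines are hypothetical names, with numbered_lines standing in for seq.

```python
from collections import deque
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def last_item(items: Iterable[T]) -> T:
    """Return the final item of an iterable while keeping only one item in memory."""
    tail = deque(items, maxlen=1)  # older items are discarded as new ones arrive
    if not tail:
        raise IndexError("no input lines")
    return tail[0]


def numbered_lines(n: int) -> Iterator[str]:
    """Hypothetical stand-in for `seq 1 n`: lazily yields "1" through str(n)."""
    return (str(i) for i in range(1, n + 1))


print(last_item(numbered_lines(100000)))
```

The input is still read in full, so the time cost remains linear, but memory stays constant because nothing but the most recent line is retained.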

thepyedpiper commented 1 year ago

Cool, thanks! I'll check it out. Do you have any quantitative data about the speedup and memory use? People still use fpp/spp, although they're not super commonly applied. I wonder if it's easy to implement an alternative solution when those are employed.

Thanks again!

Toby

In reply to krackers's comment of Sat, Aug 26, 2023 at 2:05 AM.

krackers commented 1 year ago

Sure, here's some quantitative data:

Command run: seq 1 1000000 | pyp "pp[-1]"

My version: Memory used: < 20 MB

   usr time    3.28 secs    0.15 millis    3.28 secs
   sys time    0.04 secs    2.19 millis    0.04 secs

Original version: Memory used: 1 GB

   usr time    6.85 secs    0.13 millis    6.85 secs
   sys time    0.28 secs    2.10 millis    0.28 secs

"I wonder if it's easy to implement an alternative solution when those are employed."

It should be possible; in the worst case, the input can be treated as an array instead of a generator when those are used. I just don't use them myself, so I haven't implemented it, nor would I know what to test for.
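A minimal sketch of that worst-case fallback, with loud assumptions: needs_full_input and prepare_input are hypothetical helpers, and detecting fpp/spp by scanning the expression text with a regex is an illustrative guess, not how pyp decides anything today.

```python
import re
from typing import Iterable, Iterator, List, Union

# Assumption: an expression that mentions fpp or spp needs the whole input at once.
FULL_INPUT_NAMES = re.compile(r"\b(fpp|spp)\b")


def needs_full_input(expression: str) -> bool:
    """Return True if the expression refers to fpp or spp as whole words."""
    return FULL_INPUT_NAMES.search(expression) is not None


def prepare_input(lines: Iterable[str], expression: str) -> Union[List[str], Iterator[str]]:
    """Materialize the input as a list only when required, else keep it lazy."""
    if needs_full_input(expression):
        return list(lines)
    return iter(lines)


print(type(prepare_input(["a", "b"], "fpp[0]")).__name__)
print(needs_full_input("pp[-1]"))
```

Expressions that never mention fpp or spp keep the low-memory generator path, so only the less common case pays the cost of holding everything in memory.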

thepyedpiper commented 1 year ago

looks great, especially the memory usage!

Let me know if you have any ideas about spp/fpp. I'll take a look as well.

cheers,

t

In reply to krackers's comment of Wed, Aug 30, 2023 at 11:34 AM.