nedbat / cog

Small bits of Python computation for static files
MIT License

Handling of old output slows down cog #29

Closed: gavriil-deshaw closed this issue 6 months ago

gavriil-deshaw commented 6 months ago

Hi,

It seems that cog is quite slow for large files, especially when the -r flag is not used. For example, for an ~80k LoC (including cog output) file, cog -xc -o /path/to/output/file /path/to/input/file is taking ~40s whereas cog -rc /path/to/input/file is taking ~10s.

~There are 2 distinct issues here. The first one being that cog -xc takes 4x the time cog -rc takes. If we change cog to always use an io.StringIO buffer to read the input file into, as it does when the -r flag is used, then the performance of cog -xc would match that of cog -rc.~

~The second issue is cog taking \~10s, even when not running the generator.~ If, instead of concatenating each line of the previous cog output onto a string, we append each line to a list of strings, we would see a significant performance improvement. For the same file as in the initial example, cog -xc would take ~0.150s and cog -rc would take ~0.300s.
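To illustrate the idea, here is a minimal sketch of the two accumulation strategies. It is not cog's actual code: the function names `collect_by_concatenation` and `collect_by_list` are hypothetical, and the sample input is made up. Repeated concatenation can end up copying the accumulated string on every line, which is quadratic in the worst case. CPython can sometimes optimize this in place, but not reliably (for example, when the string is stored in an object attribute). Appending to a list and joining once is linear.

```python
import io


def collect_by_concatenation(lines):
    """Accumulate lines by repeated string concatenation (hypothetical helper)."""
    previous = ""
    for line in lines:
        # Each += may build a new string containing everything seen so far.
        previous += line
    return previous


def collect_by_list(lines):
    """Accumulate lines in a list and join them once (hypothetical helper)."""
    parts = []
    for line in lines:
        # Appending to a list is amortized constant time per line.
        parts.append(line)
    # A single join copies each character once.
    return "".join(parts)


if __name__ == "__main__":
    # Made-up stand-in for the previous cog output of a large file.
    sample = io.StringIO("line one\nline two\nline three\n")
    lines = sample.readlines()
    # Both strategies produce the same text; only the cost differs.
    assert collect_by_concatenation(lines) == collect_by_list(lines)
```

Both helpers return identical text, so switching from one to the other should not change cog's output, only how long it takes to build it.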

I have already made the proposed change~s~, since ~they were~ it was simple enough, and I'll be creating a PR shortly. However, I'm opening this issue in case you feel a deeper discussion is needed. Looking forward to hearing your thoughts!

Best, Panayiotis