rasbt / pyprind

PyPrind - Python Progress Indicator Utility
BSD 3-Clause "New" or "Revised" License

Support for progress bar based on arbitrary position (e.g. file position) #13

Open victorhooi opened 9 years ago

victorhooi commented 9 years ago

I'm trying to use pyprind to add a progress bar for processing a large text file.

My understanding is that pyprind needs the total number of loop iterations beforehand in order to calculate the progress bar.

However, with just a large text file, there is no way to tell how many lines it has without actually reading through the whole file.

It would be nice if there were some way to set up a progress bar so that you could give it the total file size and then update it with the current position in the file, e.g.:

```python
import pyprind
import os
import time
n = os.path.getsize('queries.txt')

bar = pyprind.ProgBar(n)
with open('queries.txt', 'r') as f:
    for line in f:
        time.sleep(0.5)
        # do some computation
        # Use f.tell() for current position - pass to bar, somehow
        bar.update()
```
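One caveat with the snippet above, assuming Python 3: if the file is opened in text mode ('r'), calling f.tell() inside a `for line in f` loop raises OSError ("telling position disabled by next() call"). Opening the file in binary mode ('rb') avoids this. A minimal demonstration on a throwaway temporary file (the helper name `tell_during_iteration` is made up for illustration):

```python
import os
import tempfile

def tell_during_iteration(path, mode):
    """Return the f.tell() values seen while iterating, or the error message."""
    positions = []
    with open(path, mode) as f:
        try:
            for _line in f:
                positions.append(f.tell())
        except OSError as exc:
            return str(exc)
    return positions

# Small demo file with three 6-byte lines.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"line1\nline2\nline3\n")
    demo_path = tmp.name

text_result = tell_during_iteration(demo_path, "r")     # text mode
binary_result = tell_during_iteration(demo_path, "rb")  # binary mode
os.remove(demo_path)

print(text_result)    # telling position disabled by next() call
print(binary_result)  # [6, 12, 18]
```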
rasbt commented 9 years ago

Hi Victor, thanks for the request; that sounds useful. Would this solve your problem?

```python
import subprocess
num_lines = int(subprocess.check_output(['wc', '-l', 'path to file']).decode('utf8').split()[0])
```

I think getting the number of lines via the Unix/Linux 'wc -l' tool is probably the most efficient way. Maybe something like this could be added to pyprind as an additional "utils" module; it just needs to be tested on Windows.
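Since 'wc' is not part of a stock Windows install, a portable fallback could read the file in binary chunks and count newline bytes. Below is a minimal sketch; `count_lines` is a hypothetical helper, not part of pyprind. Like 'wc -l', it counts newline characters, so a final line without a trailing newline is not counted, and it still has to read the whole file.

```python
import os
import tempfile

def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, like `wc -l` (hypothetical helper)."""
    count = 0
    with open(path, "rb") as f:
        # Fixed-size binary chunks keep memory use bounded for multi-GB files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"a\nb\nc")  # last line has no trailing newline
    print(count_lines(tmp.name))  # 2, same as `wc -l`
    os.remove(tmp.name)
```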

victorhooi commented 9 years ago

Is there any way of using the number of bytes instead of lines?

The reason is that counting lines means reading sequentially through the entire file (even with wc). In my case, that is several GB per file, every time.

With bytes, by contrast, you can get the total size instantly, and file.tell() gives you the current position.
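A minimal sketch of the byte-based idea using only the standard library (it does not use pyprind, and the helper names are made up): the total comes from `os.path.getsize()`, and the position is the running sum of bytes read so far. Reading in binary mode and summing `len(line)` gives the same number as `f.tell()` without calling `tell()` during iteration, which Python 3 disallows for text-mode files. Lines come back as `bytes`, so decode them if the processing needs `str`.

```python
import os
import sys
import tempfile

def iter_lines_with_progress(path):
    """Yield (line, fraction_done) pairs, measuring progress in bytes.

    Hypothetical helper: the total comes from os.path.getsize(), so no
    separate pass over the file is needed to count lines.
    """
    total = os.path.getsize(path)
    done = 0
    with open(path, "rb") as f:  # binary mode: lines are bytes
        for line in f:
            done += len(line)  # bytes consumed so far (same as f.tell() here)
            yield line, (done / total if total else 1.0)

def process(path, out=sys.stderr):
    """Process a file line by line while printing a byte-based percentage."""
    last_pct = -1
    for line, frac in iter_lines_with_progress(path):
        # ... do some computation with `line` (decode it if you need str) ...
        pct = int(frac * 100)
        if pct != last_pct:  # redraw only when the integer percentage changes
            out.write(f"\r{pct:3d}%")
            last_pct = pct
    out.write("\n")

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"".join(b"query %d\n" % i for i in range(1000)))
    process(tmp.name)
    os.remove(tmp.name)
```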

rasbt commented 9 years ago

I think that's a great idea! I have something similar implemented here for reading a specific file format.

Right now I have a book to finish, but in August I will have more time for tinkering and could try to implement something like that. Or, if you want to do it, you are very welcome to submit a pull request!

One open question is how to design the interface. I don't want to change the current usage too much, since breaking people's code would be annoying. So I am thinking of either passing a file buffer to the iterations argument of ProgBar/ProgPercent, or adding a new class, ProgReadfile, with a kind argument to toggle between a "percent" and a "bar" indicator.
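For comparison, a rough sketch of the first option. `ProgressIndicator` is a hypothetical stand-in, not pyprind's ProgBar/ProgPercent: `iterations` may be either an int or a file object opened in binary mode, in which case the total is the file size (via `os.fstat`) and `update()` reads the current position from `f.tell()`. Note that deciding between the two cases happens inside `update()`, on every call.

```python
import io
import os
import sys
import tempfile

class ProgressIndicator:
    """Hypothetical indicator whose `iterations` may be an int or a file object.

    Not pyprind's actual API: a sketch of passing a file buffer to the
    `iterations` argument instead of a loop count.
    """

    def __init__(self, iterations, stream=sys.stderr):
        self.stream = stream
        if hasattr(iterations, "fileno") and hasattr(iterations, "tell"):
            self._file = iterations
            self.max = os.fstat(iterations.fileno()).st_size  # total bytes
        else:
            self._file = None
            self.max = int(iterations)  # total loop iterations
        self.cnt = 0

    def update(self):
        # This check runs on every call to update().
        if self._file is not None:
            self.cnt = self._file.tell()  # absolute byte position in the file
        else:
            self.cnt += 1
        pct = 100 * self.cnt / self.max if self.max else 100.0
        self.stream.write(f"\r{pct:5.1f}%")

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"line\n" * 100)
    out = io.StringIO()
    # Binary mode, because text-mode f.tell() is disabled inside a for loop.
    with open(tmp.name, "rb") as f:
        bar = ProgressIndicator(f, stream=out)
        for line in f:
            bar.update()
    print(out.getvalue().split("\r")[-1])  # 100.0%
    os.remove(tmp.name)
```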

Maybe @DevMoore94 would be interested in implementing something like that?

rasbt commented 9 years ago

Plus, a separate ProgReadfile class would have the advantage of needing fewer if-else checks, which could slow down the code when the number of iterations is large.
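To illustrate that trade-off, a rough sketch of what such a class could look like. `ProgReadfile` and its `kind` argument are only the names proposed in this thread, not an existing pyprind class, and the other parameters are assumptions. The renderer is picked once in `__init__`, so the per-line path does no branching on `kind`; progress is bytes read against `os.path.getsize()`.

```python
import io
import os
import sys
import tempfile

class ProgReadfile:
    """Sketch of the proposed (hypothetical) ProgReadfile class.

    Iterates over a file in binary mode; `kind` selects "percent" or "bar".
    The choice is resolved once in __init__, so there is no if-else
    dispatch on `kind` per line.
    """

    def __init__(self, path, kind="bar", width=30, stream=sys.stderr):
        renderers = {"percent": self._render_percent, "bar": self._render_bar}
        if kind not in renderers:
            raise ValueError(f"kind must be 'percent' or 'bar', got {kind!r}")
        self._render = renderers[kind]  # bound once, reused for every line
        self.path = path
        self.total = os.path.getsize(path)
        self.width = width
        self.stream = stream

    def _fraction(self, pos):
        return pos / self.total if self.total else 1.0

    def _render_percent(self, pos):
        self.stream.write(f"\r{100 * self._fraction(pos):5.1f}%")

    def _render_bar(self, pos):
        filled = int(self.width * self._fraction(pos))
        self.stream.write("\r[" + "#" * filled + " " * (self.width - filled) + "]")

    def __iter__(self):
        pos = 0
        with open(self.path, "rb") as f:
            for line in f:
                pos += len(line)  # byte offset after this line
                self._render(pos)
                yield line
        self.stream.write("\n")

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"query\n" * 1000)
    out = io.StringIO()
    for line in ProgReadfile(tmp.name, kind="bar", width=20, stream=out):
        pass  # do some computation with `line`
    print(out.getvalue().split("\r")[-1].strip())
    os.remove(tmp.name)
```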