rsalmei / alive-progress

A new kind of Progress Bar, with real-time throughput, ETA, and very cool animations!
MIT License

ETA in discrete operations #52

Closed ishaanx closed 4 years ago

ishaanx commented 4 years ago

In manual mode, the ETA is pretty inaccurate in my case. Is there a way to disable the ETA display?

Report 1 |▉▉▉▉ | - 10/100 [10%] in 7s (1.5/s, eta: 2s)

rsalmei commented 4 years ago

Hello @ishaanx,

That's weird, I can't reproduce that. I've created an example with the exact same output as yours, manual=True, total=100 and a throughput of ~1.5/s, look: [image: asd]

Well, it seems to work nicely. Do you have any more information? Are you using the latest released version? Could you make a video of it? Anyway, I'd like to fix the inaccurate eta, instead of disabling it.

ishaanx commented 4 years ago

Hi @rsalmei, I believe this is because of the way I'm using the bars. I'm manually incrementing the bar by 10 in each step of my code. (Not the best way)

Here is the sample code that I was testing on. I'm converting a CSV file to XLSX using pandas. The input file is about 8 MB with 50k rows.

And the reason it gets stuck on a 2-second ETA is that pandas doesn't provide proper 'feedback' in to_excel? (I read that somewhere on Stack Overflow)

import pandas as pd
from alive_progress import alive_bar

with alive_bar(total=100, manual=True, title='Report 1', theme='smooth', bar='blocks', spinner='classic') as bar:
    read_file = pd.read_csv(''r'' + <file_path>, sep="\t")
    bar(.10)
    read_file.to_excel(''r'' + <file_path>, index=None, header=True)
    bar(.20)
...
clipped

Let me know your thoughts on this.

rsalmei commented 4 years ago

Humm, I see, you are doing discrete operations, which can have very different times between them. When your operation uses a for loop, usually the times of each step are similar. But with very different steps, you need to improvise.

What I'd suggest is: do not divide the percentages equally between the steps; that's why the ETA is weird. With an equal split, you are actually saying the relative effort of reading the file is the same as that of the "to_excel" function and the other functions. The ETA is so low because the first 10% (reading the file) took very little time, so that is actually the correct ETA for the numbers you gave it... Maybe reading the file is just 1% of the total processing, and "to_excel" is 20%. Tweak these values and see if the situation improves.
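
To illustrate why an equal split misleads the ETA, here is a minimal sketch using a naive linear extrapolation. This is an assumption made for illustration only, not alive-progress's actual estimator; the function name `naive_eta` and the timings are hypothetical.

```python
def naive_eta(elapsed: float, fraction_done: float) -> float:
    """Remaining seconds, assuming the rest proceeds at the average pace so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed * (1.0 - fraction_done) / fraction_done


# Hypothetical timings: reading takes 1 s, the whole job takes 100 s.
read_seconds, total_seconds = 1.0, 100.0

# Reporting the read as 10% of the work predicts roughly 9 s remaining.
optimistic = naive_eta(read_seconds, 0.10)

# Reporting it as 1% (its real share of the time) predicts roughly 99 s remaining.
realistic = naive_eta(read_seconds, read_seconds / total_seconds)
```

Under this simple model, the estimate is only as good as the fractions reported to the bar, which is why weighting each step by its measured share of the time helps.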

I have another project that could help you tweak it: https://github.com/rsalmei/about-time

Make a test run like this:

import pandas as pd
from about_time import about_time

with about_time() as t_total:
    t1 = about_time(lambda: pd.read_csv(''r'' + <file_path>, sep="\t"))
    read_file = t1.result
    t2 = about_time(lambda: read_file.to_excel(''r'' + <file_path>, index=None, header=True))

Then you'll have the total time and the partial times at hand, just divide each tx by the t_total and you have their actual relative times:

read_percent = t1.duration / t_total.duration
to_excel_percent = t2.duration / t_total.duration

That way you'll significantly improve the alive_bar operation. 👍
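
The relative times can then be turned into the cumulative values that a manual-mode bar expects. A minimal sketch, assuming the step durations have already been measured; the helper `cumulative_fractions` and the sample durations are hypothetical and not part of alive-progress or about-time.

```python
from itertools import accumulate


def cumulative_fractions(durations):
    """Convert per-step durations into cumulative fractions of the total time."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return [running / total for running in accumulate(durations)]


# Hypothetical measurements in seconds: read_csv, to_excel, remaining steps.
checkpoints = cumulative_fractions([1.0, 20.0, 79.0])
```

Each value would then be passed to bar() right after the corresponding step finishes, for example the first checkpoint after reading the file and the second after "to_excel".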

ishaanx commented 4 years ago

Thanks for the suggestion. I'll check this solution out!

rsalmei commented 4 years ago

Hello @ishaanx,

Have you had the chance to check this? Can we close this ticket?

Anyway, in the new 2.0 I've included these instructions in the readme, to help others in similar cases!

ishaanx commented 4 years ago

Hi @rsalmei

Thanks for your help! I was able to use your directions to improve the ETA (still polishing it).

We can close this issue!

rsalmei commented 4 years ago

Hey, glad to hear!! Did it work as I suggested? Did you use about-time? I'd like to hear more about your experience, so I can improve my new readme 👍

rsalmei commented 4 years ago

Well, please include some more info when you can, even after closed. I'm writing this to the new readme of 2.0, and it would help me improve the suggestion.