latot closed this 1 year ago.
I can second this. I have a project that can have a total of hundreds of thousands of items, and each one can take several minutes. So I added a "resume" feature in case the process crashes, but it makes the estimated time much lower than reality, because the already-finished items go by almost instantly.
Hi @latot, thank you!
I'm not sure I understood what you meant, but let's say your process has 1000 steps, and you stop it at 400. When you restart it, if you give the same 1000 total to alive_bar and quickly fast-forward to 400, of course the ETA for the remaining 600 would be totally wrong, since the skipped items and the real ones take very different amounts of time...
But you could detect that 400 had already been done, and just give 600 as the total. This would make the ETA precise again. Or you could ignore the ETA altogether by just sending stats='({rate})'.
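A back-of-the-envelope sketch of the effect described above, assuming a plain average rate (count / elapsed) instead of the smoothed rate alive-progress really uses. `eta_seconds` and the numbers (400 already done, 1 s per real item) are illustrative assumptions:

```python
def eta_seconds(count, total, elapsed):
    """ETA from a plain average rate: remaining items / (items per second)."""
    if count == 0 or elapsed <= 0:
        return None  # no rate information yet
    rate = count / elapsed
    return (total - count) / rate


# Resumed run: 400 items were already done and are skipped almost instantly,
# then each real item takes ~1 second. Assume 10 real items are done so far.
skipped, real_done, seconds_per_item = 400, 10, 1.0
elapsed = real_done * seconds_per_item  # skipping costs ~0 seconds

# Same total of 1000: the 400 skips inflate the rate to 41 items/s.
naive = eta_seconds(skipped + real_done, 1000, elapsed)

# Total reduced to the 600 remaining items: the rate is the real 1 item/s.
adjusted = eta_seconds(real_done, 1000 - skipped, elapsed)

print(f"naive ETA: {naive:.1f}s, adjusted ETA: {adjusted:.1f}s")
```

With the full total of 1000, the 400 instant items make the rate look like 41 items/s and the ETA about 14 s; with 600 as the total, the ETA comes out at the real 590 s.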
Regarding your suggestion of telling bar() to "ignore the time of a particular step", I'm afraid that's not possible, since I do not have that timing... I do not store the timings of when the user last called bar().
And even if I did include such timing, I'm not sure how I could compute the ETA while "skipping certain chunks of time". Remember it also uses an Exponential Smoothing Function.
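To see why a burst of instant items is hard to "skip" after the fact, here is a generic exponential moving average of the instantaneous rate. It only illustrates the technique in general; it is not alive-progress's actual smoothing function, and `alpha=0.3` and the sample rates are arbitrary assumptions:

```python
def smooth(samples, alpha=0.3):
    """Exponential moving average: each new sample gets weight `alpha`,
    the previous smoothed value keeps the rest."""
    smoothed = None
    history = []
    for sample in samples:
        if smoothed is None:
            smoothed = sample  # seed with the first sample
        else:
            smoothed = alpha * sample + (1 - alpha) * smoothed
        history.append(smoothed)
    return history


# Instantaneous rates in items/s: two bursts of already-done items that
# went by almost instantly, then five real items at 1 item/s.
rates = [5000.0, 5000.0] + [1.0] * 5
for step, value in enumerate(smooth(rates)):
    print(f"step {step}: smoothed rate {value:.1f} items/s")
```

After five real items the smoothed rate is still above 800 items/s against a true 1 item/s, so an ETA derived from it stays far too short; the old samples are blended into every later value rather than stored separately.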
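In a nutshell, I think you would really need to pre-compute this "already done" number, and then either pass the actual total to alive_bar (initial total minus already done), or use a new param I could create, like "start_from=400", to display the total as 1000 while internally using 600 as the actual total, so the ETA would be correct. But as you see, you always need to find this number beforehand, not during alive_bar's context.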
On "pass the actual total to alive_bar (initial total minus already done)": that's an option, but it's not ideal (see below).
On "I could create a new param like start_from=400, then I could display the total as 1000, but internally use 600 as the actual, so the ETA would be correct": as you said, that requires finding the number beforehand, which defeats the point of a progress bar.
If we are checking a database for whether each item is already done, then that could take ages, and there would be no progress bar during that time.
I understand it's difficult, but it would definitely be a nice feature.
Hey @TheTechRobo, not at all! You could start an alive_bar session for this checking too 😉...
I've even already used this in other projects of mine: you start a progress bar to check and pre-compute things, without a final receipt. This will just clean the screen at the end, so then you can start another with the final receipt, which will do the actual processing!
Sadly, we can't always pre-compute, especially in the cases where we use a bar at all, since that's when we want to process a lot of data. I have a very big pandas table that needs to be read in chunks; it can't be loaded into memory. There are other limits too: not all libraries provide a way to call bar() per processed element, so I need to call it once per chunk to get a progress bar.
So, having a progress bar really helps, but in these cases it's harder to know what will be executed where, or when something will be skipped.
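A minimal sketch of this situation in plain Python: the data can only be consumed chunk by chunk (pandas' read_csv with chunksize yields DataFrames the same way), and whether a chunk was already handled is only known on reaching it. `chunked`, `run`, `is_done` and `process` are hypothetical names used for illustration:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items, like reading a big table in chunks."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def run(rows, chunk_size, is_done, process):
    """Process rows chunk by chunk; return (processed, skipped) row counts.

    `is_done` (the resume check) and `process` (the real work) are
    hypothetical callbacks supplied by the caller.
    """
    processed = skipped = 0
    for chunk in chunked(rows, chunk_size):
        if is_done(chunk):
            skipped += len(chunk)  # only known now, while iterating
        else:
            process(chunk)
            processed += len(chunk)
    return processed, skipped
```

Each chunk would advance a progress bar by len(chunk); the skipped count is only known at the end, which is why it can't simply be subtracted from the total up front.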
Ok, I think I'm seeing this with other eyes now, perhaps it is possible after all...
I do not need to know when the last bar() took place; all I have to do is make the internal total dynamic!
If you sent 1000 as total but then call bar(deduct_eta=True), I could set an internal total (invisible to the external world) to 999... So, if you end up skipping the first 400 items, even with a total of 1000 I'd consider 600 to compute the ETA!
Do you think that would work?
I don't know the technical details but that sounds great as long as it makes the ETA more accurate!
Yep, I think it does! But after some tests, I realized that changing an internal total as I'd suggested doesn't work. It still does this:
https://user-images.githubusercontent.com/6652853/213197006-d21cfa13-c8c7-4b62-9148-beb9c8854c55.mov
Note the ~1s ETA: since half of the items were processed SO FAST, the ETA is ~1s through to the end.
And pause the video at 3 seconds, note the rate: the bar thinks it is processing more than 30k items per second!!
So, it seemed I should change the rate computation, which is count / elapsed time (the ETA is then (total - count) / rate). If that "count" wasn't the actual position but the number of processed items, I could account for skipped ones... And it does seem to work nicely! Look at this:
https://user-images.githubusercontent.com/6652853/213197166-6feea0eb-82f1-423a-9fa9-73cd7f8dfe35.mov
Note how the ETA now quickly adjusts itself to ~8s, and goes down nicely as it should!!
So, it seems this is the way: I have to keep an internal "processed items" count, and use it to compute the rate.
Also, whenever they want, users can call bar(skipped=True) to tell the bar that item wasn't actually processed... 😜
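The approach described above, keeping a "processed items" count next to the position and computing the rate only from it, can be sketched as below. This is a hypothetical class that mirrors the bar(skipped=True) idea from this thread, not alive-progress's implementation; the injectable clock is there only to make it testable:

```python
import time


class SkipAwareProgress:
    """Hypothetical bar state: skipped items move the position but are
    excluded from the rate, so the ETA reflects only real work."""

    def __init__(self, total, clock=time.monotonic):
        self.total = total
        self.position = 0   # every item, processed or skipped
        self.processed = 0  # only items that cost real time
        self.clock = clock
        self.started = clock()

    def __call__(self, skipped=False):
        self.position += 1
        if not skipped:
            self.processed += 1

    def rate(self):
        """Items per second, counting processed items only."""
        elapsed = self.clock() - self.started
        return self.processed / elapsed if elapsed > 0 else 0.0

    def eta(self):
        """Seconds left for the remaining positions, or None if unknown."""
        rate = self.rate()
        return (self.total - self.position) / rate if rate > 0 else None
```

Feeding it 400 skipped items and then 10 real items at 1 s each leaves the position at 410/1000, but the rate stays at 1 item/s, so the ETA comes out at 590 s instead of a few seconds.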
Hey @latot, it is ready!!
It was very hard work; even with the draft running, it was very complex to integrate it into the other modes alive-progress has. I wanted skipped to only exist on the bar handle that can use it, i.e. the definite mode. I needed relative positioning (which both the definite and unknown modes have), but also the ETA, which requires a total.
Anyway, I'm glad I could find time to make this. The PR is #231, I'll release it soon.
Released 👍
Hi all! Thx for this great project!
This can be confusing to explain, so I'll start from another point: there are some cases where we put a progress bar, and it shows the remaining time... but sometimes these processes are resumable, which causes us to call bar instantly, and the timer does wrong calculations to get the estimated time. In these cases we can know when we will skip a particular step, maybe something like:
I think something like this would be great; the idea is that the remaining time ignores that step.
Thx!