wolph / python-progressbar

Progressbar 2 - A progress bar for Python 2 and Python 3 - "pip install progressbar2"
http://progressbar-2.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License
857 stars 101 forks source link

stdout isnt doesn't unwraped when interupted #212

Open ghost opened 5 years ago

ghost commented 5 years ago

When redirect_stdout is beign used, and the iterator is interrupted, for example by a KeyboardInterrupt, stdout remains wrapped,

Example code

import progressbar 
import time 
for i in progressbar.progressbar(range(100), redirect_stdout=True): 
    print('Some text', i) 
    time.sleep(0.1) 
    if i == 50:
        raise KeyboardInterrupt

You'll end up with the 50% completed progress bar sitting on your terminal and refusing to leave.

When an exception stops the progress of the iterator, the progress bar doesn't take notice and unwrap stdout, as it does when finished() is called.

Manually wrapping the iteration in a try:except: works, but obviously isn't very nice

import progressbar 
import time 
bar = progressbar.bar.ProgressBar(redirect_stdout=True) 
try:
    for i in bar(range(100)): 
        print('Some text', i) 
        time.sleep(0.1) 
        if i==50: 
            raise KeyboardInterrupt 
except:
    bar.finish(dirty=True) 
    raise

Versions

Python 3.7.3, IPython 7.8.0, Linux, progressbar version 3.47.0

wolph commented 5 years ago

Unfortunately, as far as I am aware, the progressbar can't detect these exceptions since they don't happen in code managed by the progressbar.

An alternative that does work properly is by using the with statement. The with statement was specifically designed for cases like these:

import progressbar 
import time 

with progressbar.ProgressBar(redirect_stdout=True) as bar:
    for i in bar(range(100)): 
        print('Some text', i) 
        time.sleep(0.02) 
        if i == 50:
            raise KeyboardInterrupt
Neraste commented 5 years ago

I ran with the same problem with stderr and using progressbar.streams.wrap_stderr. I did not know that using the progress bar as a context manager would prevent this problem (so I created a context manager wrapper myself). I could not find this feature in the doc, I think it would be nice to mention it somewhere, don't you think?

ghost commented 5 years ago

@WoLpH Thanks for pointing out the context manager solution, In my case, I'm handing around a lot of iterators and I'd like to selectively wrap some in progress bars, so the context manager would be inconvenient to use.

I think that the iterator version could also clean up after itself, by ensuring that finished() is called when a general exceptions would stop the iterator.

import progressbar 
import time 
import types

def __next__(self):
    try:
        value = next(self._iterable)
        if self.start_time is None:
            self.start()
        else:
            self.update(self.value + 1)
        return value
    except StopIteration:
        self.finish()
        raise
    except Exception:
        self.finish(dirty=True)
        raise

def __exit__(self, exc_type, exc_value, traceback):
        self.finish(dirty=(exc_type is None))

bar = progressbar.bar.ProgressBar(redirect_stdout=True) 
bar.__next__ = types.MethodType(__next__, bar)
bar.__exit__ = types.MethodType(__exit__, bar)

for i in bar(range(100)): 
    print('Some text', i) 
    time.sleep(0.1) 
    if i==50: 
        raise KeyboardInterrupt 

Is there a disadvantage to this kind of solution I'm not seeing?

I also overrode the __exit__ method here, which means that when the context manager is being used and the iteration is interrupted, the progress bar doesn't incorrectly show a completed state.

wolph commented 5 years ago

I ran with the same problem with stderr and using progressbar.streams.wrap_stderr. I did not know that using the progress bar as a context manager would prevent this problem (so I created a context manager wrapper myself). I could not find this feature in the doc, I think it would be nice to mention it somewhere, don't you think?

It's mentioned in the readme part of the docs: https://progressbar-2.readthedocs.io/en/latest/#context-wrapper And there are a load of examples that use it: https://progressbar-2.readthedocs.io/en/latest/examples.html

I agree that the docs can be improved a lot though. The big problem is that my time is quite limited and this project is not the only project that I maintain: https://pypi.org/user/WoLpH/ If you have suggestions where the docs can be improved, all help is appreciated :)

@WoLpH Thanks for pointing out the context manager solution, In my case, I'm handing around a lot of iterators and I'd like to selectively wrap some in progress bars, so the context manager would be inconvenient to use.

I agree, context managers are often inconvenient to work with and your bar.finish(dirty=True) is an easier solution in most cases.

I think that the iterator version could also clean up after itself, by ensuring that finished() is called when a general exceptions would stop the iterator.

I fail to see how, but I might be missing something here :)

Is there a disadvantage to this kind of solution I'm not seeing?

Your code still uses the bar.finish(...) in the except: part. Without that it still won't work.

I also overrode the __exit__ method here, which means that when the context manager is being used and the iteration is interrupted, the progress bar doesn't incorrectly show a completed state.

That's indeed a bug that needs to be fixed. Good catch!

wolph commented 5 years ago

Nevermind... all that, thanks to your suggestion I found the answer. I can catch GeneratorExit. Since it doesn't inherit Exception I never knew I could catch that from the code.

Thanks for the suggestion, I'll fix it soon :)

ghost commented 5 years ago

I fail to see how, but I might be missing something here :)

For clarity, all I meant by the iterator cleaning up after itself was exactly that it could check for exceptions and ensure finished() is called before quitting. Calling finished() seems to be the best way to ensure that std is unwrapped.

Your code still uses the bar.finish(...) in the except: part. Without that it still won't work.

This is true. I think that this is the right place to put the except block ensuring finished() is always called; inside the next method, rather than around the whole iteration. It means that transparently using the progress bar as an iterator, like in your examples, now works perfectly, even when there's a general exception that stops the iteration.

I'm happy to knock together an PR for these changes, or perhaps do them another way if you have a different preference?

ghost commented 5 years ago

I can catch GeneratorExit

Ah, that's perfect Again, happy to make the PR, otherwise, thanks very much for your help!

wolph commented 5 years ago

I've added the two changes to the development branch, can you test if that does everything you were looking for? :)

I'm always happy with pull requests if you have more suggestions of course :)

wolph commented 5 years ago

@mmcewen-g not sure if you noticed the update so pinging you again :)

If it works ok I'll create a new release

ghost commented 5 years ago

This doesn't work, but it took me a while to work out why. Apologies for going into this pedagogically, I wrote most of this while I was working out for myself what was going on, and hopefully this sill be a good explanation for people who stumble across this in the future.

We're dealing with Generators, Iterators and Iterables here. An Iterable is an object with an __iter__() function, which returns an Iterator. An Iterator is an object with a __next__() function, which returns values and eventually raises StopIteration.

A Generator is weird: In code, they're any function written with yield expressions in them. Calling such a function actually returns a 'generator object' rather than whatever the function looks like it might return. A 'generator object' is a special kind of Iterator; when __next__() is called, the generator runs the function from where it last left off until it hits the next yield and returns the yielded value, or until they return, at which point they raise StopIteration. They also have additional send(), throw() and close() functions. Unlike Iterators, Generators can clean up after themselves using the aforementioned close(). This works by raising a GeneratorExit at the last yield that fired in the generator, which can be caught to do cleanup. close() is called when a generator object is garbage collected, meaning cleanup is guaranteed.

Iterators in general should not 'hold things' (like locks, open files or IO wrappers), because python gives them no way to clean up. If an iterator does need to hold something, then it should really be a generator, which has the additional 'infrastructure' of close() to guarantee things get cleaned up.

Currently, ProgressBar is iterable, meaning it can be used in for...in loops. It is also an Iterator; it has a __next__() method and __iter__() just returns self. It is not a Generator; it has no close() which is what Python needs to raise a GeneratorExit inside it. Our attempt to catch GeneratorExit was doomed from the start.

If we want the progress bar to be able to catch GeneratorExit and clean up, then instead of __iter__ returning self it could return a generator. This generator can catch GeneratorExit and ensure that finish() is called.

For a toy example:

import progressbar 
import time 
import types

class ProgressBarNew(progressbar.bar.ProgressBar):
    def __next__():
        #remove this from ProgressBar entirely
        raise Exception

    def __iter__(self):
        try:
            for value in self._iterable:
                if self.start_time is None:
                    self.start()
                else:
                    self.update(self.value + 1)
                yield value
        except GeneratorExit:
            print("Caught GeneratorExit: Cleaning up")
            self.finish(dirty=True)
            raise

bar = ProgressBarNew(redirect_stdout=True)

try:
    for i in bar(range(100)):
        print('Received', i) 
        time.sleep(0.1)
        if i == 10:
            raise KeyboardInterrupt
except KeyboardInterrupt:
    print("Caught KeyboardInterrupt")

This does work as expected!

Received 0                                                                     
...                                                              
Received 10                                                                    
Caught GeneratorExit: Cleaning up
 10% (10 of 100) |##                     | Elapsed Time: 0:00:01 ETA:   0:00:09
Caught KeyboardInterrupt

bar(range(100)) returns the ProgressBar with a defined self._iterable. The for...in naturally calls iter() on it, and gets back a generator object, which it iterates over. When the iteration is interrupted, this generator object is garbage collected, GeneratorExit is caught inside it and finish() is called. Meanwhile, the ProgressBar object is left alone, and may be inspected by the user in code afterwards that catches the exception.

There is one downside of this, which is that the iteration in the ProgressBar is now split over two objects; the ProgressBar holds the state of the current iteration and the generator references the ProgressBar to update the state during iteration (including cleaning up). This has the advantage that the state of the progress bar is easy to reference inside the iteration loop, but I expect that's not the typical use case for then the progress bar is simply used to wrap an iterable in a for...in statement.

It might be worth discussing moving the state for the progress bar into the generator object, and leaving ProgressBar as a traditional iterable that returns logically independent generators. This kind of refactoring would also have the advantage that the ProgressBar could be made reusable without needing any calls to init(), as it would simply return a new generator when it was reused.

Neraste commented 5 years ago

Thank you for writing this pedagogically!

ghost commented 4 years ago

I realized that we could do a less invasive job by promoting the ProgressBar itself to being a Generator. This is a much smaller change, as a Generator is just a special type of Iterator, which the ProgressBar is already. It also means the current examples and code usage can stay identical, and that we aren't introducing any new objects.

It took me a little while to figure out how to arrange for python to run close on the ProgressBar - Python uses __del__ when the object is garbage collected, and generator objects have a call to close in their __del__. The abstract class for Generators doesn't include a __del__, so it has to be included manually.

With that, whenever the progress bar is garbage collected, it makes a direct call to close and that runs finish, which guarantees that stdout is unwrapped no matter what stops the ProgressBar. Now the only way stdout can remain wrapped is if the user is holding a reference to the ProgressBar that owns the wrapper, in which case they're able to call finish themselves if they want to.

Neraste commented 4 years ago

I realized that we could do a less invasive job by promoting the ProgressBar itself to being a Generator. This is a much smaller change, as a Generator is just a special type of Iterator, which the ProgressBar is already. It also means the current examples and code usage can stay identical, and that we aren't introducing any new objects.

This is what I eventually did in my project using ProgressBar:

def progress_bar(iterator, *args, **kwargs):
    with progressbar.ProgressBar(*args, **kwargs) as progress:
        for item in progress(iterator):
            yield item

Which is very close to what progressbar.shortcuts.progressbar does.

ghost commented 4 years ago

The above (and the shortcut) both do something similar but not identical to the solution in the PR. When python runs the generator function, it returns a new generator object (which is a Generator capital G), meaning that there are now two objects in memory: the ProgressBar and the generator object python made when the generator function was called.

def progress_bar(iterator, *args, **kwargs):
   with progressbar.ProgressBar(*args, **kwargs) as progress:
       for item in progress(iterator):
           yield item

bar = progress_bar(range(100))
type(bar) # generator object, not ProgressBar

This is different to the solution in the linked PR, which is to make the ProgressBar itself into a generator object; now there is only one object in memory. This doesn't make a huge difference, but it is a little more elegant and makes the stack trace cleaner if the iteration is interrupted.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.