ploomber / ploomber-engine

A toolbox 🧰 for Jupyter notebooks 📙: testing, experiment tracking, debugging, profiling, and more!
https://engine.ploomber.io
BSD 3-Clause "New" or "Revised" License
59 stars 14 forks source link

Added support for progress bars #66

Closed mehtamohit013 closed 1 year ago

mehtamohit013 commented 1 year ago

Describe your changes

Added support for flushing out stdout and stderr

Issue ticket number and link

Closes #61 #1080

Checklist before requesting a review


:books: Documentation preview :books:: https://ploomber-engine--66.org.readthedocs.build/en/66/

mehtamohit013 commented 1 year ago

Code used to run:

print("Hello World")
for i in range(5):
    print(i)
from tqdm import tqdm
import time

my_list = list(range(100))

with tqdm(total=len(my_list)) as pbar:
    for x in my_list:
       time.sleep(0.1)
       pbar.update(1)
       if x%20==0:
           print(x)
print("Hello World")
print('Hello World 2')

Output Screenshot_2023-03-17_11-27-26

Issues:

Any suggestions on how to resolve this issue? Maybe we can add some type of cell tag when tqdm is running and resolve it separately from stderr one

idomic commented 1 year ago

Any suggestions on how to resolve this issue?

Why don't you just output it as is? You can always parse and wait for those characters but I assume they've implemented a mechanism for that already in tqdm? Also let's keep the discussion in one place (probably here is better).

edublancas commented 1 year ago

I think for this PR, let's just make the output of ploomber-engine be as close as possible to papermill.

Based on the discussion, it sounds like we just need to add 1) cell separators 2) Displaying standard error 2) flushing standard output/error

Once that's in, we can tackle updating the progress bar inline.

mehtamohit013 commented 1 year ago

Based on the discussion, it sounds like we just need to add 1) cell separators 2) Displaying standard error 2) flushing standard output/error

  1. What do you mean by cell seperators?
  2. stderr is currently displayed
  3. Flushing messes up the output. So, I have not used flushing
idomic commented 1 year ago

@mehtamohit013 check if you can ask for review now? I've updated the permissions

edublancas commented 1 year ago

cell separators: https://user-images.githubusercontent.com/43580047/225945963-93552b85-83ef-48ae-9993-fd9acf5ba1b6.png

Executing cell X Ending cell X

ok so if stderr is displayed and we cannot flush output at the moment, I think what's missing is to fix the CI, right?

idomic commented 1 year ago
  1. Flushing messes up the output. So, I have not used flushing

So how are we going to show the output as it comes? I think that's essentially the main difference we have to solve it.

mehtamohit013 commented 1 year ago

Why don't you just output it as is? You can always parse and wait for those characters but I assume they've implemented a mechanism for that already in tqdm?

The problem lies with the fact that I cannot make a parser just only for tqdm as there may be other messages pushed to stderr. Currently, I am just simply redirecting the notebook stderr and stdout to the original stderr and stdout

ok so if stderr is displayed and we cannot flush output at the moment, I think what's missing is to fix the CI, right?

No, tqdm messes up with the output. You can see in the image attached above Code used: self.default.write(s) where self.default is the original stdout and stderr, before it is switched with IO() class

So how are we going to show the output as it comes? I think that's essentially the main difference we have to solve it.

I think it is because of yield. I will look into flushing more

def patch_sys_std_out_err(display_output):
    """Path sys.{stout, sterr} to capture output"""
    # keep a reference to the system ones
    stdout, stderr = sys.stdout, sys.stderr

    # patch them
    stdout_stream = IO(default = stdout,std_type='out', display = display_output)
    stderr_stream = IO(default = stderr, std_type='err')
    sys.stdout, sys.stderr = stdout_stream, stderr_stream

    try:
        yield stdout_stream, stderr_stream
    finally:
        # revert
        sys.stdout, sys.stderr = stdout, stderr
class IO(StringIO):
    def __init__(self, default,std_type,display = True, newline="\n", initial_value="" ):
        super().__init__(initial_value=initial_value, newline=newline)
        self.default = default
        self.std_type = std_type
        self.display = display
        self._values = []

    def write(self, s):
        self._values.append(s)
        if self.display:
            self.default.write(s)
        super().write(s)

    def get_separated_values(self):
        return self._values[:]
edublancas commented 1 year ago

ok, so then the only "easy" thing we can do at the moment are the cell separators right? I'd suggest opening a PR for that, we merge, make a release. and then we investigate this flushing issue

mehtamohit013 commented 1 year ago

ok, so then the only "easy" thing we can do at the moment are the cell separators right? I'd suggest opening a PR for that, we merge, make a release. and then we investigate this flushing issue

Yeah sure

Btw this is the output with flushing (Observe the '0', which messes with ploomber progress bar, because ploomber progress bar updates after cell is executed) Screenshot_2023-03-18_11-09-21

idomic commented 1 year ago

I think it is because of yield. I will look into flushing more

Yeah I think you're right, I also assume it works fine: tqdm + papermill?

mehtamohit013 commented 1 year ago

Yeah I think you're right, I also assume it works fine: tqdm + papermill?

No, it doesn't 😅

Ours is messy but tqdm progress bar is updated inplace and we are registering \r also. The robust solution is to make a custom wrapper with tqdm

Below is the output with papermill-2.4.0: Screenshot_2023-03-18_11-19-45

idomic commented 1 year ago

😂😂

I guess it's a nice to have, we should open an issue just for this since it's probably more than a quick fix!

On Sat, Mar 18, 2023, 11:24 Mohit Mehta @.***> wrote:

Yeah I think you're right, I also assume it works fine: tqdm + papermill?

No, it doesn't 😅

Ours is messy but tqdm progress bar is updated inplace and we are registering \r also. The robust solution is to make a custom wrapper with tqdm

Below is the output with papermill: [image: Screenshot_2023-03-18_11-19-45] https://user-images.githubusercontent.com/43580047/226114862-00b101a2-7108-4a19-a434-868a50add0e0.png

— Reply to this email directly, view it on GitHub https://github.com/ploomber/ploomber-engine/pull/66#issuecomment-1474877892, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACYPJONQ76Z2PSL5TPHTR7TW4XHTRANCNFSM6AAAAAAV7C4ZNU . You are receiving this because you commented.Message ID: @.***>

mehtamohit013 commented 1 year ago

@edublancas @idomic If you want a quick fix, how about this: Just some extra \n but the progress bar is working Screenshot_2023-03-18_11-34-42

idomic commented 1 year ago

I think it's a good temporary solution (might be a bit confusing for the users)

On Sat, Mar 18, 2023, 11:36 Mohit Mehta @.***> wrote:

@edublancas https://github.com/edublancas @idomic https://github.com/idomic If you want a quick fix, how about this: Just some extra \n but the progress bar is working [image: Screenshot_2023-03-18_11-34-42] https://user-images.githubusercontent.com/43580047/226115554-8881b4eb-061d-46ea-be46-bc76ede3861e.png

— Reply to this email directly, view it on GitHub https://github.com/ploomber/ploomber-engine/pull/66#issuecomment-1474880436, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACYPJONHOKDS627HF4MYSXDW4XJAVANCNFSM6AAAAAAV7C4ZNU . You are receiving this because you were mentioned.Message ID: @.***>

mehtamohit013 commented 1 year ago

@edublancas @idomic If you want a quick fix, how about this: Just some extra \n but the progress bar is working Screenshot_2023-03-18_11-34-42

Implemented this one

idomic commented 1 year ago

It doesn't closes https://github.com/ploomber/ploomber/issues/1080 since we're still using papermill there.