Closed: idomic closed this issue 1 year ago
I ran two sample notebooks (sample-notebooks.zip) to understand the issue a bit more, some comments:
I was unable to reproduce the issue: all print statements are displayed when passing --log-output. ploomber-engine only displays whatever is sent to stdout, so my guess is that papermill is also displaying stderr and/or the text results from each cell. We should run a more detailed analysis and then ensure that both produce the same output. Another thing we can add is the cell delimiter (papermill prints: Executing Cell X -----).
ploomber-engine print.ipynb /dev/null --log-output
I could not reproduce this by creating a notebook that displays a progress bar using tqdm and executing it with --log-output, so we need to investigate more:
ploomber-engine progress.ipynb /dev/null --log-output
Yeah the delimiter can be a good option, it prints all together. To recreate you can run the posthob.ipynb.
Hi @idomic Can you please provide me with posthob.ipynb or where it is located? I can't seem to find it.
Hi, So, I am running this code:
print(1+2)
print(3+4)
print(1+7)
from tqdm.auto import tqdm
import time
my_list = list(range(100))
with tqdm(total=len(my_list)) as pbar:
    for x in my_list:
        time.sleep(0.01)
        pbar.update(1)
        if x % 20 == 0:
            print(x)
print(1)
Running with papermill:
Running with ploomber-engine on the CLI gives me this output:
Observations:
Commands used
ploomber-engine rough.ipynb output.ipynb --log-output
papermill rough.ipynb output.ipynb --log-output
PS: I am not able to run the notebook @idomic mentioned.
Edit: Updated images and fixed spelling.
@mehtamohit013 A few thoughts:
Also, I can't find documentation of ploomber-engine CLI command
I've opened an issue about it last week I think
Progress bar of cell 5 is not displayed
I think if the --log-output is there we need to research why, sounds like a bug.
Also ploomber-engine execution time which is around 5-8sec is not consistent and it is slower than papermill 3-4sec
It runs on a different process, that's why the difference, but try profiling it, see what's causing this delay.
Let's connect on the notebook I'll help you run it!
I think the missing output might be that the tqdm progress bar is printed to standard error and we're just displaying standard output. If that's the case, we should ensure we also display standard error in the console.
You can check this with:
import sys
print("printing to stderr", file=sys.stderr)
and see if ploomber-engine displays it
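The idea behind this check can be sketched without running either tool. The snippet below is only an illustration, not ploomber-engine's actual capture logic: it captures both streams with the standard library and shows that a console which replays only stdout drops anything written to stderr (tqdm writes to stderr by default). The names `run_and_capture` and `cell` are made up for this example.

```python
import contextlib
import io
import sys


def run_and_capture(func):
    """Run func and return (stdout_text, stderr_text), captured separately."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        func()
    return out.getvalue(), err.getvalue()


def cell():
    # Stand-in for a notebook cell that writes to both streams
    print("printing to stdout")
    print("printing to stderr", file=sys.stderr)


stdout_text, stderr_text = run_and_capture(cell)

# A logger that replays only stdout would show this...
print("stdout only:", repr(stdout_text))
# ...and drop this, which is where a progress bar would end up
print("stderr:", repr(stderr_text))
```

If the stderr line is the one missing from the ploomber-engine console, that would support the stderr explanation.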
Some clarification regarding performance:
I timed ploomber-engine with the time command of zsh, and I am getting times in the range of 1.15-1.22 sec, while papermill is in the range of 1.65-1.70 sec.
Just a minor observation: We cannot pass the file name to which data should be saved in --save-profiling-data. It creates output-profiling-data.csv by default.
Just a minor observation: We cannot pass the file name to which data should be saved in --save-profiling-data. It creates output-profiling-data.csv by default
Please open an issue about it, I think there should be an option to pass an argument.
The 5-8 sec that I mentioned above is the time the zsh shell takes to give me a new prompt. So it may include the delay in displaying stdout in the shell.
Seems like it's faster than papermill, but the output is slower, but we still need to figure out why and how to fix it.
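One way to separate execution time from the time the terminal spends rendering output is to time the command from Python with its output captured. This is a rough sketch under assumptions: `time_command` is a hypothetical helper, and the command shown is a stand-in to be replaced with the ploomber-engine or papermill invocation used above. Capturing output also means the process is not attached to a terminal, which may change how progress bars behave.

```python
import subprocess
import sys
import time


def time_command(args, runs=3):
    """Return the wall-clock seconds of each run, with output captured."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # capture_output keeps terminal rendering out of the measurement
        subprocess.run(args, capture_output=True, check=True)
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in command; swap in the real CLI call to compare the two tools
command = [sys.executable, "-c", "print('hello')"]
timings = time_command(command)
print([round(t, 3) for t in timings])
```

Comparing these numbers with the time reported by the shell gives a rough idea of how much of the delay comes from displaying output.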
I think the missing output might be that the tqdm progress bar is printed to standard error and we're just displaying standard output. If that's the case, we should ensure we also display standard error in the console.
Hi @edublancas, currently ploomber-engine prints the output from stdout only once the cell has finished executing. This is not ideal: the output should be printed to the console as soon as it is written to the notebook's stdout.
I have mentioned more details in PR #66
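To illustrate the difference between printing after the cell finishes and printing as output arrives, here is a minimal sketch of a stream that records text and forwards it immediately. It is an assumption-laden example and not the approach taken in the PR: `TeeStream` is a hypothetical class, and real notebook execution has more moving parts.

```python
import contextlib
import io
import sys


class TeeStream(io.TextIOBase):
    """Hypothetical stream that records text and forwards it right away."""

    def __init__(self, console):
        self._console = console
        self._buffer = io.StringIO()

    def writable(self):
        return True

    def write(self, text):
        self._buffer.write(text)   # keep a copy for the notebook output
        self._console.write(text)  # show it on the console immediately
        self._console.flush()
        return len(text)

    def getvalue(self):
        return self._buffer.getvalue()


# Grab the real console before redirecting, then run the "cell"
tee = TeeStream(sys.stdout)
with contextlib.redirect_stdout(tee):
    print("step 1")
    print("step 2")

captured = tee.getvalue()
```

Each line appears on the console at the moment it is printed, while `captured` still holds the full text that would be stored in the output notebook.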
When running the same notebook with --log-output, papermill shows all of the outputs and ploomber-engine doesn't.
This happens on our posthog reporting notebook. For instance, Cell 26 shows the output in papermill:
But it doesn't in ploomber-engine:
Another thing I noticed is that when the notebook runs, there's a dual progress bar within the cell that messes with the main bar, which might be confusing for users (in ploomber-engine).