scikit-hep / hepconvert

BSD 3-Clause "New" or "Revised" License

feature request: Progress Bar option #73

Closed NJManganelli closed 4 months ago

NJManganelli commented 4 months ago

Some operations can take a very long time, such as converting between file formats or hadd'ing (usually too many) histograms, so progress bars would be great to have.

rich has been part of pip for some time, and it has options for sub-progress bars where appropriate: https://rich.readthedocs.io/en/stable/progress.html#advanced-usage
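As a rough illustration of the sub-progress bars mentioned above, here is a sketch using rich's `Progress` with one task for the whole job and one task per input file. It assumes `rich` is installed, and the file names and histogram counts are made-up placeholders.

```python
from rich.progress import Progress

# Hypothetical workload: each input file and how many histograms it holds.
files = {"in1.root": 3, "in2.root": 2}

with Progress() as progress:
    # One task tracks the whole job; each file gets its own sub-task.
    overall = progress.add_task("files", total=len(files))
    for name, n_histograms in files.items():
        per_file = progress.add_task(f"  {name}", total=n_histograms)
        for _ in range(n_histograms):
            # ... per-histogram work would go here ...
            progress.advance(per_file)
        progress.advance(overall)
```

Each `add_task` call adds another bar to the same live display, which is what makes nested "files, then histograms within a file" progress possible.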


jpivarski commented 4 months ago

In scientific contexts, I more often see tqdm used as the default progress bar. I had hoped that it would have the same interface as rich.progress.Progress, but it doesn't. It does seem to be the more popular of the two. Oh wait, there's a backend (tqdm.rich) that draws rich progress bars behind a tqdm interface, so it's not an exclusive choice.

tqdm's interface comes in two steps:

  1. construct with progress_bar = tqdm.tqdm(total=number_of_items), or set the number of items on an already-constructed object with progress_bar.reset(total=number_of_items).
  2. update with progress_bar.update(), once per item ("number of items" times in total).
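The two steps above look like this in a minimal sketch. It assumes `tqdm` is installed; writing to an `io.StringIO` only keeps the demo output off the terminal, and `number_of_items = 5` is an arbitrary example value.

```python
import io

import tqdm

# Step 1: construct without a size (as a caller might), then set the
# number of items with reset(total=...). tqdm.tqdm(total=5) is equivalent.
bar = tqdm.tqdm(file=io.StringIO())
number_of_items = 5
bar.reset(total=number_of_items)

# Step 2: call update() once per item; each call advances the count by 1.
for _ in range(number_of_items):
    bar.update()

bar.close()
```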

It has lots of bells and whistles, but rather than make hepconvert choose what you hope is the user's favorite display style, hepconvert functions could take a progress_bar argument and (1) immediately reset it to tell it the number of items and (2) call update in hepconvert's loop.

That way, users can set up the progress bar however they want and you just control its filling. Also, hepconvert wouldn't need to depend on either rich or tqdm (though it might, for other reasons).
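Since hepconvert would only ever call `reset` and `update`, the expected argument can be described structurally, and any object with those two methods would do. The sketch below uses a hypothetical `check_progress_bar` validator and a toy `TextProgress` bar; neither is part of hepconvert or tqdm. Note that `runtime_checkable` only checks that the methods exist, not their signatures.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ProgressBarLike(Protocol):
    """Structural type for the two tqdm-style methods the proposal relies on."""

    def reset(self, total: Optional[float] = None) -> None: ...

    def update(self, n: float = 1) -> None: ...


def check_progress_bar(obj):
    """Hypothetical helper: accept None/True/False or any object with reset() and update()."""
    if obj is None or isinstance(obj, bool):
        return obj
    # isinstance() against a runtime_checkable Protocol only looks for the method names.
    if not isinstance(obj, ProgressBarLike):
        raise TypeError(
            f"progress_bar must have reset() and update() methods, got {type(obj).__name__}"
        )
    return obj


class TextProgress:
    """A dependency-free toy bar that records 'done/total' strings (illustrative only)."""

    def __init__(self):
        self.total = None
        self.n = 0
        self.lines = []

    def reset(self, total=None):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n
        self.lines.append(f"{self.n}/{self.total}")
```

Validating up front would let a wrong argument fail immediately instead of partway through a long conversion.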

Here's what a minimal use would look like to a user:

```python
import hepconvert
import tqdm

hepconvert.merge_root("out.root", ["in1.root", "in2.root"], progress_bar=tqdm.tqdm())
```

and the implementation inside hepconvert would look like:

```python
def merge_root(output, inputs, progress_bar=None):
    number_of_items = ...  # calculate this from the number of inputs somehow

    if progress_bar is True:  # progress_bar=True should provide a default
        import tqdm           # which strictly depends on the tqdm library
        progress_bar = tqdm.tqdm()
    elif progress_bar is False:
        progress_bar = None

    # Test with "is not None" rather than truthiness: bool() on a tqdm bar
    # whose total is still unknown raises TypeError.
    if progress_bar is not None:
        progress_bar.reset(total=number_of_items)

    for item in data:
        ...
        if progress_bar is not None:
            progress_bar.update()
```
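A self-contained, runnable version of the same pattern might look like the sketch below. `merge_files` and `CountingBar` are hypothetical names, the per-file work is a stand-in, and one progress step per input file is an assumption; only `reset(total=...)` and `update()` follow tqdm's interface.

```python
def merge_files(output, inputs, progress_bar=None):
    """Hypothetical merge loop that drives an optional, caller-supplied progress bar.

    `output` only mirrors the signature above; this stub writes nothing.
    """
    # Assumption: one progress step per input file.
    number_of_items = len(inputs)

    if progress_bar is True:
        # Default display; tqdm is imported only when it is actually requested.
        import tqdm

        progress_bar = tqdm.tqdm()
    elif progress_bar is False:
        progress_bar = None

    if progress_bar is not None:
        progress_bar.reset(total=number_of_items)

    processed = []
    for path in inputs:
        processed.append(path)  # stand-in for reading and merging one file
        if progress_bar is not None:
            progress_bar.update()
    return processed


class CountingBar:
    """Minimal caller-supplied bar: anything with reset() and update() works."""

    def __init__(self):
        self.total = None
        self.n = 0

    def reset(self, total=None):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


bar = CountingBar()
merge_files("out.root", ["in1.root", "in2.root"], progress_bar=bar)
```

Normalizing `False` to `None` up front keeps the later checks to a single `is not None` test, which also avoids calling `bool()` on a tqdm object whose total is not yet known.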

See uproot/src/uproot/extras.py for ways of making the "failure due to missing optional dependency" more helpful, by providing copy-pasteable command lines to install the missing package.
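In the same spirit, a small helper can turn a missing optional dependency into a message with copy-pasteable install commands. This is a hypothetical `import_optional` sketch, not the actual code in uproot's `extras.py`; the message wording and the conda-forge channel are assumptions.

```python
import importlib


def import_optional(module_name, pip_name=None, conda_name=None):
    """Import an optional dependency, or fail with copy-pasteable install commands."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        # Package names can differ from module names, so allow overrides.
        pip_name = pip_name or module_name
        conda_name = conda_name or pip_name
        raise ModuleNotFoundError(
            f"install the '{pip_name}' package with:\n\n"
            f"    pip install {pip_name}\n\n"
            f"or\n\n"
            f"    conda install -c conda-forge {conda_name}\n"
        ) from err
```

With it, the default branch above could become `progress_bar = import_optional("tqdm").tqdm()`.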

zbilodea commented 4 months ago

Added a progress bar option for standard tqdm, with the option for users to pass a custom bar object (I might add an option for tqdm.rich soon if that's also popular). Thanks for the recommendation!