nteract / scrapbook

A library for recording and reading data in notebooks.
https://nteract-scrapbook.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Allow a filter argument to select which notebooks to read by name #81

Closed: tanguycdls closed this issue 3 years ago

tanguycdls commented 3 years ago

Hello, thanks a lot for the package! I have a small suggestion to improve the user experience.

When reading multiple notebooks from a folder, it would be nice to be able to choose which notebooks get opened. In my use case our .ipynb files are quite large (over 1 MB each), so reading a folder that contains many notebooks can be slow.

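```python
# Proposed version of read_notebooks for scrapbook's api.py; it relies on
# names that module already provides (os, Scrapbook, list_notebook_files,
# read_notebook).
def read_notebooks(path, filter_notebooks=None):
    """
    Returns a Scrapbook including the notebooks read from the
    directory specified by `path`.

    Parameters
    ----------
    path : str
        Path to directory containing notebook `.ipynb` files.
    filter_notebooks : callable, optional
        Predicate called with each notebook path; only notebooks for which
        it returns True are read. If None (the default), every notebook
        is read.

    Returns
    -------
    scrapbook : object
        A `Scrapbook` object.

    """
    scrapbook = Scrapbook()
    # filter(None, ...) keeps every non-empty path, so the default
    # behaviour is the same as before the change.
    for notebook_path in sorted(filter(filter_notebooks, list_notebook_files(path))):
        fn = os.path.splitext(os.path.basename(notebook_path))[0]
        scrapbook[fn] = read_notebook(notebook_path)
    return scrapbook
```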
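Because `filter` hands each entry returned by `list_notebook_files` to the predicate, the callable receives a notebook path rather than a bare name, so a name-based filter has to strip the directory and extension itself. The sketch below is illustrative only: `name_matches` is a hypothetical helper built on the standard `fnmatch` module, and the paths are made up. It shows how such a predicate behaves with the `sorted(filter(...))` pattern from the proposal, without needing scrapbook installed.

```python
import fnmatch
import os


def name_matches(pattern):
    """Hypothetical helper: build a predicate that tests a notebook's file
    name (without directory or .ipynb extension) against a glob pattern."""
    def predicate(notebook_path):
        name = os.path.splitext(os.path.basename(notebook_path))[0]
        return fnmatch.fnmatch(name, pattern)
    return predicate


# Made-up paths standing in for what list_notebook_files(path) returns.
paths = [
    "runs/report_2021_02.ipynb",
    "runs/report_2021_01.ipynb",
    "runs/scratch.ipynb",
]

# Same shape as the proposal: filter first, then sort.
selected = sorted(filter(name_matches("report_*"), paths))
print(selected)  # ['runs/report_2021_01.ipynb', 'runs/report_2021_02.ipynb']

# With None as the predicate, filter() only drops falsy items,
# so every non-empty path is kept.
print(sorted(filter(None, paths)))
```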

My proposal is to add a simple filter at this point in the code, as above. I can open a PR if you want.

If you have a better idea for loading our notebooks faster, I'd be happy to hear it, but given the nbformat API I'm not sure we can load only the scrapbook metadata any faster.
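One possible workaround for the speed question, not something scrapbook itself provides, is to parse the .ipynb JSON with the standard library and pull scrap outputs out directly, skipping nbformat's validation and NotebookNode conversion. The whole file is still parsed, so any gain is likely modest. This sketch assumes scraps are stored as cell-output data under a mimetype beginning with `application/scrapbook.scrap`; treat that prefix and the hypothetical `read_raw_scraps` helper as assumptions to check against the scrapbook version in use.

```python
import json
import os
import tempfile

# Assumed mimetype prefix for scrapbook scraps in cell outputs;
# verify against the scrapbook version you use.
SCRAP_MIME_PREFIX = "application/scrapbook.scrap"


def read_raw_scraps(notebook_path):
    """Hypothetical helper: return raw scrap payloads from a notebook.

    Reads the .ipynb file with json instead of nbformat, so no schema
    validation or NotebookNode conversion happens.
    """
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    scraps = []
    for cell in nb.get("cells", []):
        # Markdown cells have no "outputs"; stream outputs have no "data".
        for output in cell.get("outputs", []):
            for mime, payload in output.get("data", {}).items():
                if mime.startswith(SCRAP_MIME_PREFIX):
                    scraps.append(payload)
    return scraps


# Usage with a tiny hand-written notebook (hypothetical content).
fake_nb = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "outputs": [
            {"output_type": "stream", "name": "stdout", "text": "hi\n"},
            {"output_type": "display_data",
             "data": {"application/scrapbook.scrap.json+json":
                      {"name": "accuracy", "data": 0.9}}},
        ]},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    nb_path = os.path.join(tmp, "example.ipynb")
    with open(nb_path, "w", encoding="utf-8") as f:
        json.dump(fake_nb, f)
    scraps = read_raw_scraps(nb_path)

print(scraps)  # [{'name': 'accuracy', 'data': 0.9}]
```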

MSeal commented 3 years ago

Sorry I missed this issue, @tanguycdls (I was coming off paternity leave).

This seems like a good addition. I'd suggest renaming the argument to path_filter, but I'd be happy to merge such a PR if you want to make the addition.
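With the argument renamed to path_filter, the behaviour can be sketched in a self-contained way. `select_notebooks` is hypothetical: it lists .ipynb files with `os.listdir` in place of `list_notebook_files` and stores paths instead of calling `read_notebook`, so it runs without scrapbook or papermill installed. What it shows is that notebooks rejected by the filter never reach the read step.

```python
import os
import tempfile
from typing import Callable, Dict, Optional


def select_notebooks(
    path: str, path_filter: Optional[Callable[[str], bool]] = None
) -> Dict[str, str]:
    """Hypothetical stand-in for read_notebooks(path, path_filter=None).

    Maps each selected notebook's name to its path. In scrapbook the value
    would be read_notebook(notebook_path), the expensive step the filter
    is meant to skip.
    """
    # Local-only replacement for list_notebook_files(path).
    notebook_paths = [
        os.path.join(path, fn) for fn in os.listdir(path) if fn.endswith(".ipynb")
    ]
    selected = {}
    for notebook_path in sorted(filter(path_filter, notebook_paths)):
        name = os.path.splitext(os.path.basename(notebook_path))[0]
        selected[name] = notebook_path  # read_notebook(...) would go here
    return selected


# Usage: a temporary folder with two notebooks and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for fn in ("a.ipynb", "b.ipynb", "notes.txt"):
        open(os.path.join(tmp, fn), "w").close()
    everything = select_notebooks(tmp)
    only_a = select_notebooks(
        tmp, path_filter=lambda p: os.path.basename(p).startswith("a")
    )

print(list(everything))  # ['a', 'b']
print(list(only_a))      # ['a']
```

Keeping None as the default means existing calls that pass only a path keep reading every notebook.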