worldbank / template

🎩 Project Template
https://worldbank.github.io/template/
Mozilla Public License 2.0

Enable Jupyter Book trigger notebook execution functionality #52

Open g4brielvs opened 3 weeks ago

g4brielvs commented 3 weeks ago

In light of a recent use case where automated notebook execution could significantly streamline workflows, we propose enabling notebook execution selectively when the Jupyter Book is built. Specifically, we suggest executing notebooks located in the reports/ directory while ensuring that notebooks in the notebooks/ directory remain unaffected (execution set to off).
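
As a rough illustration of the rule proposed above, the selection logic could be sketched in Python as follows. This is not how Jupyter Book implements it; in practice the behaviour would presumably be set through the `execute` settings (`execute_notebooks`, `exclude_patterns`) in `_config.yml`. The pattern lists and the `should_execute` helper are hypothetical names used only for this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical policy mirroring the proposal: run notebooks under reports/,
# never run notebooks under notebooks/. Paths are relative to the repo root
# and use forward slashes.
EXECUTE_PATTERNS = ["reports/*"]
EXCLUDE_PATTERNS = ["notebooks/*"]


def should_execute(path: str) -> bool:
    """Return True if the notebook at `path` should be executed at build time."""
    if not path.endswith(".ipynb"):
        return False  # only Jupyter notebooks are candidates
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDE_PATTERNS):
        return False  # exclusions win over inclusions
    return any(fnmatchcase(path, pattern) for pattern in EXECUTE_PATTERNS)


for candidate in ["reports/summary.ipynb", "notebooks/explore.ipynb"]:
    print(candidate, should_execute(candidate))
```

Letting exclusions take precedence keeps the default conservative: a notebook is executed only if it is explicitly opted in and not opted out.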

g4brielvs commented 3 weeks ago

@andresfchamorro @bennyistanto @elbeejay @SahitiSarva @avsolatorio @Holly-Transport If you could take a moment to review and share your thoughts, it would be much appreciated. Feel free to comment directly on the issue thread with your insights.

elbeejay commented 3 weeks ago

It seems like a good idea to me @g4brielvs. I believe documentation is better off when the notebooks are executed, and I think it's important that notebooks are committed to the repository empty, that is, cleared of their outputs. Committing executed notebooks can lead to large "false" changes in PRs due to changes in notebook metadata, as well as merge conflicts, such as in this PR in the GOSTurban project: https://github.com/worldbank/GOSTurban/pull/32.

In the GOSTurban project, I raised the issue of using a pre-commit hook to automatically clear notebooks of both their executed content and metadata (https://github.com/worldbank/GOSTurban/issues/24) and set up the pre-commit hook, nbstripout, in this PR: https://github.com/worldbank/GOSTurban/pull/21

I think it'd be good to make the configuration changes proposed here, so that the notebooks in the "reports" subdirectory are executed via CI when the book is built, and to add the pre-commit hook nbstripout to the template. The latter involves adding the dependency to the pyproject.toml file and modifying the .pre-commit-config.yaml file to use the hook.
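
To make concrete what "clearing" a notebook means, here is a minimal sketch using only the Python standard library. It is a simplified illustration, not nbstripout's actual implementation (nbstripout handles more cases, including metadata). It assumes the nbformat 4 layout, where code cells carry `outputs` and `execution_count` fields; `strip_outputs` and the `example` notebook are hypothetical.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs cleared."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the In[n] counter
    return cleaned


# Tiny in-memory notebook, used only for illustration.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Report"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

stripped = strip_outputs(example)
print(json.dumps(stripped["cells"][1], indent=2))
```

Because outputs and execution counts are the parts that change on every run, removing them before committing is what keeps diffs limited to real edits of the source cells.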

bennyistanto commented 3 weeks ago

As a viewer, I always find it nice to see the notebook collection from the DDP showcase, since I am able to play around with it.

But that has not always worked or been useful in my case (notebooks related to the economic monitor analysis, from Syria, Turkey, Lebanon, Morocco and Myanmar agriculture monitoring), as most of the processing is done locally and requires a lot of data transfer.

Maybe we can have two options: executed by default (DDP showcase), and not executed for notebooks whose objective is documentation rather than a showcase (GOST PublicGoods).

andresfchamorro commented 2 weeks ago

I agree with the suggestion. This has always been the intention, as it makes the work truly reproducible, but there is a bit more complexity in managing dependencies to ensure the notebooks run smoothly. I would say let's add some guidelines to the README's Dependencies section explaining how to list the packages needed for executing notebooks, etc.

I also agree with Benny that we want to keep the option for notebooks not to be executed. I can think of many cases (some notebooks end up being more experimental or require external platforms/data).