vizzTools / cookiecutter-science

The template to start a new science project at Vizzuality.
MIT License
1 star, 0 forks

Promote good practices with notebooks and include notebook templates #6

Closed: ikerey closed this issue 1 year ago

ikerey commented 1 year ago

I would like to propose the promotion of good practices with notebooks and the inclusion of jupytemplate in the cookiecutter for better notebook organization and readability.

Notebooks play a crucial role in data science projects, and maintaining a clear structure can greatly improve their readability and maintainability. Jupytemplate is a tool that provides a template system for Jupyter notebooks, enabling the creation of consistent notebook structures. It allows users to define a template notebook with pre-defined sections, headers, and explanatory text, guiding users to follow a standardized structure.

I suggest considering two options for implementing this:

  1. Including jupytemplate as part of the cookiecutter: By integrating jupytemplate into the cookiecutter, users would have it readily available when creating new notebooks from Jupyter Lab. This would encourage the adoption of good practices from the beginning of the project.
  2. Providing clear instructions and recommendations: Alternatively, we can provide instructions in the README.md file of the cookiecutter template that suggest using jupytemplate and link to its template, or add the template directly to the notebooks folder. This approach would ensure that users are aware of the benefits of such a template and can easily use it in their notebooks.

Having jupytemplate integrated or recommended within the cookiecutter template would provide a consistent structure for notebooks, enhance readability, and help enforce best practices across the team.

BielStela commented 1 year ago

jupytemplate looks outdated and not designed for JupyterLab. I found https://github.com/finos/jupyterlab_templates, which is the same concept adapted for JupyterLab. I like the idea of having a template, but I don't like this implementation of it, because it adds more files and configs to the generated project. Maybe a simple notebooks/template/template.ipynb with a minimal set of sections and guidance would do the job, but I find that a bit specific, and it's hard to come up with a template structure that fits every notebook use case.
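As a rough sketch of the minimal-template option (not something agreed in this thread), the snippet below writes a plain notebooks/template/template.ipynb using only the Python standard library, so no extra plugin or config is added to the generated project. The section names, guidance text and the `build_template` / `write_template` helpers are hypothetical; the JSON follows the nbformat v4 notebook layout. Each section is one tuple, so dropping a tuple drops that section and nobody is stuck with empty headings.

```python
import json
from pathlib import Path

# Hypothetical default sections: (heading, guidance text, add an empty code cell?).
# The thread has not agreed on a structure; edit or drop entries freely.
DEFAULT_SECTIONS = [
    ("# Title", "One-sentence summary of what this notebook does.", False),
    ("## Overview", "Context, goals and data sources.", False),
    ("## Imports", "Keep all imports in the cell below.", True),
    ("## Analysis", "Break the problem into steps, one section per step.", True),
]


def markdown_cell(text):
    """Return an nbformat v4 markdown cell."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(text=""):
    """Return an unexecuted nbformat v4 code cell."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": text,
    }


def build_template(sections=DEFAULT_SECTIONS):
    """Build the notebook dict: one markdown cell per section, plus optional code cells."""
    cells = []
    for heading, guidance, with_code in sections:
        cells.append(markdown_cell(f"{heading}\n\n{guidance}"))
        if with_code:
            cells.append(code_cell())
    # nbformat_minor 4 avoids the per-cell "id" field required from 4.5 onwards.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_template(path="notebooks/template/template.ipynb", sections=DEFAULT_SECTIONS):
    """Write the template to disk, creating parent folders, and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_template(sections), indent=1), encoding="utf-8")
    return path
```

Calling `write_template()` from the project root would create the file, which JupyterLab can then open or copy like any other notebook.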

On the other hand, with new auto-format tools like isort, which put the imports at the top, notebooks will already be slightly more organized and easier to follow than before.

Do any of you have an example of a template that could be used?

ikerey commented 1 year ago

Yes, this one 😅.

BielStela commented 1 year ago

Hmm, is something like this applicable to most cases? I think that too often we won't need half of the sections in the template, or its structure won't suit the purpose of the notebook. If we force every notebook to be created from a template, then all notebooks will have default empty or unused sections like

Parameter definition

We set all relevant parameters for our notebook. By convention, parameters are uppercase, while all the other variables follow Python's guidelines.

Data import

[...]

I think this is counterproductive because it adds noise and can be confusing.

On the other hand, if we just include an optional template, it's up to the user to use it; it won't be used much, and it will add yet another file to the project that nobody knows why it's there...

Note that I'm being catastrophist and speaking for myself: I'm full of bad habits, but I'm probably a good proxy for general behavior.

What I think will work best is to include some links in the science wiki / way of working / culture docs (or even in the template README) to good notebooks that people can use as a reference, so that everybody is aligned. For example, Peter Norvig's pytudes are perfect examples of awesome notebooks:

They all follow more or less the same pattern of Title -> overview -> imports -> problem breakdown with code, but the structure is not rigid at all.

ikerey commented 1 year ago

Yes, I completely agree with your points. Leaving empty or unused sections in a notebook template would not make any sense and may cause confusion. It is important to have flexibility in the notebook structure to accommodate different cases.

The sections in the template should be viewed as general guidelines, not strict rules. Each user can modify, remove, or add sections according to their needs to maintain a clear and logical structure.

From my experience, many of our notebooks contain only code and are difficult to follow. I usually try to use this structure, and although I don't use all sections, it helps me find things more easily. Therefore, I believe that having a suggested structure like the template can be very helpful in making notebooks more readable.
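To illustrate how a consistent set of headings makes things easier to find, here is a hypothetical helper (not part of jupytemplate, jupyterlab_templates or the cookiecutter) that reads an .ipynb file as plain JSON and prints its markdown headings as a table of contents. It assumes headings use the `#` syntax and only skips backtick-fenced code inside markdown cells; the function names are illustrative.

```python
import json
import re
from pathlib import Path

# A markdown ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def cell_source(cell):
    """nbformat allows `source` to be a single string or a list of strings."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def table_of_contents(notebook):
    """Return (level, title) pairs for every markdown heading, in notebook order."""
    toc = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # '#' lines in code cells are comments, not headings
        in_fence = False
        for line in cell_source(cell).splitlines():
            if line.lstrip().startswith("```"):
                in_fence = not in_fence  # ignore '#' lines inside fenced code
                continue
            match = HEADING.match(line)
            if match and not in_fence:
                toc.append((len(match.group(1)), match.group(2)))
    return toc


def print_toc(path):
    """Print an indented outline of the notebook at `path`."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    for level, title in table_of_contents(notebook):
        print("  " * (level - 1) + "- " + title)
```

A notebook that only contains code would print nothing here, which is a quick way to spot the hard-to-follow cases mentioned above.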

BielStela commented 1 year ago

It's also worth bringing into the discussion that what we do (writing notebooks) is an older practice than we might think! This paradigm was invented by Donald Knuth (yes, also the creator of TeX) in the 80s and is called literate programming (not sure it's relevant to the conversation, but it's quite interesting 😅).

For me, anything that makes our work cleaner and better (closer to feeling like a pytudes notebook 😍) is worth the effort. I'll put together a simple template so we can discuss and iterate on it and on how to include it in the project workflow.

ikerey commented 1 year ago

Thanks, that will be great.