networkx / nx-guides

Examples and Jupyter Notebooks about NetworkX
https://networkx.org/nx-guides/
Creative Commons Zero v1.0 Universal
184 stars 105 forks source link

Linting and formatting notebooks #45

Open rossbar opened 2 years ago

rossbar commented 2 years ago

There have been some previous discussions on how best to handle tasks like linting and formatting the notebooks. For example - we'd like to have the code cells formatted (e.g. via black) and have the markdown cells respect line character limits. Ultimately, I think we'd like to be able to incorporate automated linting/formatting to the workflow so that authors don't have to think about it at all. This issue is for starting the discussion on how best to do that.

These topics have been discussed in other communities as well, most notably in mwouts/jupytext#432. There are some really nice ideas there. Fortunately, it looks like jupytext has built-in support for applying black to code cells. In our case (with .md-formatted notebooks) this would look like:

jupytext notebook.md --pipe black

I've tried this out a bit and it seems pretty robust - I think we could incorporate this into the workflow (maybe via pre-commit + a "style" CI job, similar to how NetworkX does it) relatively easily.

Formatting the non-code-cell markdown is a bit trickier. There is a suggestion from that same jupytext issue (https://github.com/mwouts/jupytext/issues/432#issuecomment-598856041) to try pandoc which has support for softbreaks (i.e. line breaks in source files) and automatically applies this formatting:

jupytext notebook.md --pipe-fmt ipynb --pipe 'pandoc --from ipynb --to ipynb --atx-headers'

This kind of works, but doesn't inherently support/recognize all the features of MyST markdown. For example, when I tried this on one of our notebooks, it dropped the footnotes. This seems like a reasonable approach, but it's not suitable (in the current state) for incorporation into an automated workflow.

rossbar commented 2 years ago

Some followup on how best to incorporate this into the workflow...

As mentioned above, I think the code formatting (i.e. black) step is robust enough add, so I'd focus only on that to start. My initial idea would be to add a "linting" check to the CI similar to what is done in NetworkX itself.

As a first step, a CI job that fails when things aren't formatted properly is fine, as maintainers can always tell/help contributors lint their code (run jupytext *.md --pipe black). It would also be nice to get things working via pre-commit since that will automate everything and eliminate the need to have maintainers explain/run the linting procedure every time a job fails. Getting the pre-commit hooks set up might be a little more involved, so starting with the linting CI job seems totally reasonable and probably what I would do.

@harshal-dupare who has expressed interest in the infrastructure :tada: . Feel free to comment/ask questions in this thread or open new issues as you see fit!