scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

Strategy for documentation #158

Closed jpivarski closed 4 years ago

jpivarski commented 4 years ago

High-quality documentation is a serious need. Uproot has one big page of tutorial (which doubles as a Binder-based interactive notebook), readthedocs for references, and StackOverflow for example-driven questions. Awkward 0 only has a big page tutorial.

Some observations:

Thoughts? Suggestions?

Let me know below. With documentation, it's hard to back up and use something different once some serious writing has begun because every toolchain has consequences on how the text is formatted. I want to get this right the first time.

jpivarski commented 4 years ago

@henryiii's class as a JupyterBook: https://henryiii.github.io/compclass-book/week1/0_IntroductionAndLogin

henryiii commented 4 years ago

See my comment here: https://github.com/scikit-hep/scikit-hep-tutorials/issues/1.

For a single package, the nbsphinx extension works well for examples (see boost-histogram's examples). For cross-package tutorials, I think JupyterBook would be the best option.

henryiii commented 4 years ago

Note: GitBook is actually the name of an abandoned Open Source software package that I used to make Modern CMake, the CLI11 tutorials, and a few more, and has been replaced by a service. We've been having to migrate away from GitBook; for example the LHCb starter kit was converted to a different technology a couple of months ago.

jpivarski commented 4 years ago

I didn't want to get into the details in the meeting, but I think cross-package tutorials and JupyterBook-based Awkward-only tutorials are a good idea. For the time being, I'm trying to figure out a strategy for the Awkward-only tutorials. I'll be doing a big documentation push pretty soon, and I want to go in the right direction.

Doxygen will be involved, definitely for C++ and probably for Python. (That doxypypy looks pretty good. I don't want to have to remember to add autodoc stubs in reST files; I've forgotten it too many times in Uproot docs.)

The main thing will be nugget-sized "how tos" and maybe one "getting started" tutorial. I'd kinda like the how-tos, tutorials, and reference docs (Doxygen output) to be together—maybe in the same format, same site, and/or same toolchain.

I'll take a look at JupyterBooks.

jpivarski commented 4 years ago

Well, JupyterBooks isn't popular. As long as it isn't in danger of going away the way StackOverflow Documentation did, that's the important point.

But supposing the community loses interest in JupyterBook, then we'd be left with documentation in a bunch of Jupyter notebooks, which isn't a major problem. There will be other ways to automatically convert them into documentation. (Contrast this to all the Google Code wiki markup I wrote to document old projects. It's kinda readable as Markdown...)

I know for a fact, though, that nobody reads the Binder tutorial on Uproot. The fact that nobody complained when it was broken for months is a strong indication. Binder isn't the solution: it takes too long to load.

jpivarski commented 4 years ago

Thebelab is nice. It takes 20 seconds to start a kernel for interactivity, which is longer than casual readers will wait, but it doesn't take you away from the page the way a Binder link would.

Maybe the following would work:

henryiii commented 4 years ago

JupyterBook is fairly new. And it is basically vanilla, except for optional cell customizations, such as the ability to hide a cell (like a setup cell) in the HTML output; I don't think you'd need those and they wouldn't have a strong effect anyway, and they are just normal cell metadata.
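To illustrate the point that these customizations are just normal cell metadata, here is a minimal sketch of a notebook with one tagged setup cell, written with only the standard library. The cell layout follows the nbformat 4 JSON structure; the tag name `hide-input` and the cell contents are assumptions for illustration, so check which tags your JupyterBook version actually honors.

```python
import json

# One code cell in nbformat-4 layout. The tag name "hide-input" is an
# assumption for illustration, not a confirmed JupyterBook setting.
setup_cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {"tags": ["hide-input"]},
    "outputs": [],
    "source": ["import numpy as np\n"],
}

notebook = {
    "cells": [setup_cell],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# The tag lives in ordinary JSON, so the file is still a plain notebook.
serialized = json.dumps(notebook, indent=1)

# Find the cells that carry the (assumed) hiding tag.
hidden = [
    cell
    for cell in notebook["cells"]
    if "hide-input" in cell["metadata"].get("tags", [])
]
print(len(hidden))
```

Tools that do not recognize a tag should simply ignore it, which is why the notebooks would remain usable outside JupyterBook.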

The good thing about JupyterBook, besides the fact that you can use the Jupyter notebooks directly, and even view them directly on GitHub, is that it is not the only notebook-to-static-site system. nbsphinx for Sphinx does the same thing, so if JupyterBook goes under, we could switch to Sphinx fairly quickly. Some other static site generators are gaining support for notebooks too. It's just currently the nicest and is now an official part of the Jupyter organization. (I think this happened recently!)
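For reference, the Sphinx fallback mentioned here could be as small as the following `conf.py` sketch. The project name is a placeholder, and the execution setting is only one possible choice, not something decided in this thread.

```python
# conf.py (Sphinx configuration); the project name is a placeholder.
project = "example-docs"

extensions = [
    "nbsphinx",  # lets Sphinx render .ipynb files listed in a toctree
]

# Skip build output and Jupyter's checkpoint copies of notebooks.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]

# "auto" executes only notebooks that have no stored outputs;
# "always" and "never" are the other accepted values.
nbsphinx_execute = "auto"
```

With this in place, notebooks are added to a `toctree` like any other source file, so the same `.ipynb` files could serve either toolchain.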

I think the outline you've described above is very much what I was thinking, though I'd use GitHub Actions, as it's a little simpler and you don't need the release mechanism. I would also avoid checking the generated HTML into the source (which I am currently doing for the demo site, but I think I can work around that); it would be force-pushed to a gh-pages branch or similar. I'll be playing with this soon(ish) for a tutorial setup and will update you when I have it working.

jpivarski commented 4 years ago

I'd do it in GitHub Actions if I could move all CI/CD to GitHub Actions. I don't know that I need to do that migration first, though.

Reading about this has introduced me to Jupytext, which is great! The plain files that Jupytext uses are more version control-friendly and I wouldn't have to launch a JupyterLab every time I want to edit something (which is a bottleneck for me).

nbsphinx (which I hadn't known about before, either) starts from ipynb notebooks, but a Jupytext → nbsphinx → readthedocs pipeline can be a fallback plan if JupyterBook disappears.
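To show why plain-text notebooks are friendlier to version control, here is a toy sketch that splits a script in the `# %%` "percent" cell format into notebook cells. This is not Jupytext's implementation, and `split_percent_script` is a hypothetical helper; in practice Jupytext itself would do the conversion.

```python
def split_percent_script(text):
    """Split a '# %%'-delimited script into (cell_type, source) pairs."""
    cells = []
    cell_type, lines = None, []
    for line in text.splitlines():
        if line.startswith("# %%"):
            # A new cell marker closes the previous cell, if any.
            if cell_type is not None:
                cells.append((cell_type, "\n".join(lines).strip()))
            cell_type = "markdown" if "[markdown]" in line else "code"
            lines = []
        elif cell_type is not None:
            # Markdown cells are stored as comments; drop the comment prefix.
            if cell_type == "markdown" and line.startswith("#"):
                line = line[2:] if line.startswith("# ") else line[1:]
            lines.append(line)
    if cell_type is not None:
        cells.append((cell_type, "\n".join(lines).strip()))
    return cells


script = """\
# %% [markdown]
# # Getting started

# %%
x = [1, 2, 3]
print(sum(x))
"""

cells = split_percent_script(script)
for kind, source in cells:
    print(kind, repr(source))
```

Because the source is an ordinary script, a one-line change to a cell shows up as a one-line diff, with no JSON or output blobs in the way.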

I don't want to save HTML output in GitHub either. At least not the main scikit-hep/awkward-1.0 repository. Maybe the documentation build process can send whatever Jekyll/GitHub Pages needs to a repo that's not meant to be edited directly.

The JupyterBook documentation suggests Netlify, which could be a good repository for static HTML, if that's what the documentation build process produces. I used to use AWS buckets for that: static HTML hosting is free. But as much as possible, I'd like this to be unconnected from personal accounts (such as my personal Azure account) and on shared accounts (like the scikit-hep organization). Maybe Zeit. They look friendly.

henryiii commented 4 years ago

I've been using Azure for wheels and GHA for docs and tests, and that's been working well. They all run on pretty much the same Microsoft backends, and the configs are mostly the same except for a 1:1 term mapping. I don't think there's harm in selecting the one that's best for a particular job. It's only been in the last year or so that we could run on a single system; it used to be a different system per supported OS at least. Note there is no setup at all for GHA, since it's built in and always available. You already have an Actions tab across the top of all your repositories. It just runs based on the presence of files matching .github/workflows/*.yml.

Jupytext sounds very promising, yes; I just heard about it recently. Launching JupyterLab is a pain.

IRIS-HEP has a dual-repo setup for hosting, and that does work.

I haven't looked into Netlify.

henryiii commented 4 years ago

Note for related projects (vector, boost-histogram, etc): This strategy is probably a bit different, since Awkward is using Doxygen instead of Sphinx. If Sphinx is already being used, then nbsphinx is probably the correct choice.

jpivarski commented 4 years ago

From @kratsg: look into "breathe" and "exhale". They turn Doxygen output into Sphinx documentation.
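As a rough idea of what that would involve, a Sphinx `conf.py` using both extensions might look like the sketch below. All project names and paths are placeholders, not the Awkward repository's actual layout, and it assumes Doxygen has been configured to emit XML.

```python
# conf.py (Sphinx configuration); all names and paths are placeholders.
extensions = ["breathe", "exhale"]

# Breathe reads Doxygen's XML output (GENERATE_XML = YES in the Doxyfile).
breathe_projects = {"example": "./doxygen/xml"}
breathe_default_project = "example"

# Exhale generates a browsable tree of reST pages from the same XML.
exhale_args = {
    "containmentFolder": "./api",
    "rootFileName": "library_root.rst",
    "rootFileTitle": "C++ API",
    "doxygenStripFromPath": "..",
}
```

The generated root file would then be linked from a `toctree`, which would put the reference docs on the same site as tutorials and how-tos.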

lukasheinrich commented 4 years ago

also see @cranmer's jupyterbook

https://cranmer.github.io/madminer-tutorial/intro

henryiii commented 4 years ago

Also mine: https://henryiii.github.io/compclass

I've built GHA to handle running the notebooks and producing the JupyterBook, and no output is saved in the repo. Repo here: https://github.com/henryiii/compclass

jpivarski commented 4 years ago

This is pretty much figured out:

Of these, the first two are done.