Open steremma opened 6 years ago
Thanks for posting @steremma :+1: Guys @yurkai @anotherbugmaster @CLearERR let's discuss this point!
Some thoughts:
About 2: we almost ignore Stack Overflow, which doesn't look like a good idea (it is a very popular Q&A service, significantly more popular than our mailing lists).
About 3, 4 (general view): we didn't split them; we have many notebooks that contain:
It's all in a heap and it's very difficult to find what you need. We have no kind of "index".
About the "testing" proposed by @steremma: I have already tried these solutions, and they don't work, because
Also, notebooks produce many problems like
For this reason, @anotherbugmaster and I propose the https://github.com/sphinx-gallery/sphinx-gallery approach for "tutorials" and "how to" guides (instead of notebooks). But when a notebook demonstrates a large end2end example (large in terms of data size, running time, or memory consumption), we can create a new repo (for notebooks only) and move it there.
Also, "for free" we receive nice features:
.rst and .ipynb that completely desync, because contributors modify only .ipynb)
About the disadvantages of this approach that were mentioned by @steremma:
Requires new dependencies that have to do with plotting because this is mostly a plotting framework.
This is not a problem, because this dependency is needed for the documentation only (it is not a "core" dependency).
Will take more time to implement since I need to get familiar with it.
This isn't hard, really. Also, @anotherbugmaster made several examples of how to use this in #1809.
It is not entirely obvious to me how we will be testing it.
It's very simple: when you build the documentation, all of these "gallery" examples are run (this is one CLI option).
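To make the "examples run during the doc build" idea concrete, here is a minimal sketch of a Sphinx conf.py fragment with Sphinx-Gallery enabled. The directory names and the filename pattern are hypothetical choices for illustration, not the layout used in #1809, and this is not necessarily the exact option meant above.

```python
# docs/conf.py (fragment). Directory names and the filename pattern
# below are hypothetical, chosen only for illustration.

extensions = [
    "sphinx_gallery.gen_gallery",  # the extension that builds the gallery
]

sphinx_gallery_conf = {
    # Where the example scripts (.py files) live, relative to conf.py.
    "examples_dirs": "../gallery_src",
    # Where the generated pages are written inside the docs tree.
    "gallery_dirs": "auto_examples",
    # Only scripts whose path matches this regex are executed at build time.
    "filename_pattern": r"/run_",
    # Stop the build on the first example that raises, so that a broken
    # tutorial fails the build instead of being rendered with a traceback.
    "abort_on_example_error": True,
}
```

With a setup along these lines, an example that raises an exception should make the documentation build fail, which is what turns the build into a test.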
I agree with the points raised, especially the difficulties with keeping track of notebooks, as I have experienced the same issues in my workplace. The linked PR will be a useful reference for working with sphinx gallery. I will start looking into it as soon as I complete my previously assigned tasks (probably in 1 week from now).
Documentation in general
In general, well-written and well-maintained documentation can be divided into 4 concrete elements, as explained in this talk:
In our case, 1 is achieved with the docstrings and Sphinx building the HTML content from them. We already have a strong base reference and multiple people are working on increasing coverage (myself included).
Regarding 2, we already have gitter, mailing lists and issue-specific discussions in the GitHub issues and Pull Requests pages.
We have some overlap between 3 and 4 since we use jupyter notebooks for both, but it is not very clear which notebook is a tutorial and which is a guide. For example, I would say the sklearn_api notebooks are tutorials because they only show the basics (holding the user by the hand), while the model-specific notebooks like word2vec IMDB/Wikipedia are more like guides because they solve a very specific problem. Perhaps we need to split them into two categories (folders) named tutorials and guides.
The problem
It's extremely important that the tutorials always run (new releases do not break the notebooks) and run everywhere (on every OS, Python version, distribution, etc.). This is very hard to guarantee at the moment.
Solution
We need to test the notebooks. By testing I mean making sure all cells run without raising any error. (Can we also test for exact outputs?) Most Google results show bad solutions, but these two seem promising (although a bit hacky):
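To illustrate what "all cells run without raising any error" means in practice, here is a rough stdlib-only sketch (not one of the two solutions mentioned above). It assumes the notebook is plain nbformat v4 JSON and that the cells contain ordinary Python with no IPython magics. The function name run_notebook_cells and the demo notebook are made up for this example. Real notebooks would need a proper kernel, for instance via nbconvert's ExecutePreprocessor.

```python
import json


def run_notebook_cells(notebook_json):
    """Execute every code cell in order; return (cell_index, error) pairs."""
    notebook = json.loads(notebook_json)
    namespace = {}  # shared across cells, like a single kernel session
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat v4 stores the source as a string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        try:
            exec(compile(source, "<cell %d>" % index, "exec"), namespace)
        except Exception as error:
            failures.append((index, repr(error)))
    return failures


# A made-up notebook: the last code cell fails on purpose.
demo = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "source": "z = y / 0"},
    ],
})
failures = run_notebook_cells(demo)
print(failures)
```

A CI job could call such a function on every notebook and fail when the returned list is non-empty. Checking exact outputs would be a separate, harder step.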
Advantages
Disadvantages
rsts trivially, we therefore end up dealing with 2 "sources of truth": the official doc page and the notebooks themselves.
Alternative
As we discussed with @menshikh-iv, maybe we should migrate from notebooks to Sphinx gallery. Using this approach our tutorials are Python scripts.
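As a rough idea of what such a tutorial script could look like, here is a toy example in the layout Sphinx-Gallery expects: a module docstring with an rst title, followed by code separated by comment blocks that get rendered as text. The corpus and the word-counting task are made up for illustration and use only the standard library.

```python
"""
Counting words in a tiny corpus
===============================

A toy tutorial written as a plain Python script, in the layout that
Sphinx-Gallery renders as a documentation page.
"""

from collections import Counter

###############################################################################
# A comment block that starts with a long line of hashes is rendered as prose
# between code cells. The corpus below is invented for this example.
corpus = [
    "human machine interface",
    "graph of trees",
    "graph minors survey",
]

###############################################################################
# Tokenize by whitespace and count how often each word occurs.
counts = Counter(word for doc in corpus for word in doc.split())
print(counts.most_common(1))
```

Because this is an ordinary script, it can be run directly with the Python interpreter, linted, and diffed in review like any other source file.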
Advantages
rsts directly. This ensures that there is a single source of truth and that new notebooks can be trivially added to the online documentation.
Disadvantages
Extra thoughts
Could we also add visualizations to the tutorials? This is easy with either alternative, but I am not sure if we can come up with meaningful graphs.