zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

Consider moving the tutorial to a Jupyter notebook #514

Open jrbourbeau opened 4 years ago

jrbourbeau commented 4 years ago

Would others be interested in having the content in tutorial.rst moved to a Jupyter notebook? We could then use the nbsphinx Sphinx extension (https://nbsphinx.readthedocs.io) to run and render the notebook in the docs.
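As a rough illustration of the proposal, the Sphinx configuration might look something like the sketch below. It assumes nbsphinx is installed; the `sphinx.ext.autodoc` entry is only a placeholder for whatever extensions the docs already use, and the chosen values are one reasonable option rather than the project's actual settings.

```python
# docs/conf.py (excerpt): a sketch only, not the project's actual configuration.
extensions = [
    "sphinx.ext.autodoc",  # placeholder for the extensions the docs already use
    "nbsphinx",            # renders *.ipynb files as regular documentation pages
]

# Re-run the notebook on every docs build so stored outputs cannot go stale.
nbsphinx_execute = "always"

# Fail the build if any cell raises instead of rendering the traceback.
nbsphinx_allow_errors = False

# Keep Jupyter's autosave directories out of the build.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

With `nbsphinx_allow_errors = False`, a broken tutorial cell would surface as a failed docs build rather than as a page containing a traceback.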

The main advantage I see to the .rst -> .ipynb move would be to easily run the tutorial interactively on Binder (we could include a binder link at the top of the tutorial). The main drawback that comes to mind is editing the tutorial would now be different than editing all the other *.rst files we currently have in the docs (and adds Jupyter as a dependency to build the docs).

Happy to open a PR for this, just wanted to see if there was any interest first

jrbourbeau commented 4 years ago

For reference, here's an example of a rendered Jupyter notebook in a sphinx docs page with a binder link https://examples.dask.org/dataframe.html

alimanfoo commented 4 years ago

FWIW I think this is a cool idea. Would it be possible to run the notebook as part of CI so we catch any errors or inconsistencies between docs and code? I'm keen that all docs with code examples are run as doctests or equivalent wherever possible.

jrbourbeau commented 4 years ago

The notebook would be run as part of the documentation build process, so the tutorial should automatically remain up to date with the latest code changes. That said, adding a docs build to the CI (xref #369) would be a nice complementary addition
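To make "running the notebook" concrete, here is a minimal standard-library sketch of what a CI check could do: load the `.ipynb` file (which is plain JSON), execute each code cell in a shared namespace, and let the first exception fail the job. This is a simplification for illustration. The docs build itself would execute the notebook through a real Jupyter kernel, and this sketch does not handle IPython magics. The function name `run_notebook_cells` and the demo notebook are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_notebook_cells(path):
    """Execute every code cell of a notebook in one shared namespace.

    Any exception raised by a cell propagates, which would fail a CI job.
    """
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    namespace = {}
    for index, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # The notebook format allows "source" to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        exec(compile(source, f"{path}[cell {index}]", "exec"), namespace)
    return namespace


# Tiny self-contained demo: write a two-cell notebook to disk and run it.
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["x = 2\n", "y = x * 21\n"],
        },
    ],
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.ipynb"
    demo_path.write_text(json.dumps(demo), encoding="utf-8")
    result = run_notebook_cells(demo_path)

print(result["y"])  # prints 42
```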

alimanfoo commented 4 years ago

Adding a docs build to the CI would be very nice, there have been times when the RTFD build was broken and we didn't notice for a while.

andrewfulton9 commented 4 years ago

Is this something you all are still interested in implementing? I'd be happy to do it

jrbourbeau commented 4 years ago

Yeah, that would be much appreciated : ) Thanks @andrewfulton9, feel free to ping me if you have any questions

jakirkham commented 4 years ago

IDK if it fits our use case, but it might be worth looking at sphinx-gallery.

andrewfulton9 commented 4 years ago

I have a quick question about the examples in the current tutorial that are skipped for doctests. It looks like most of them are skipped because they access cloud storage such as AWS S3 or Azure Blob Storage. Should I make those runnable in the notebook, or skip them as the doctests do?

jrbourbeau commented 4 years ago

Good point, thanks for bringing that up @andrewfulton9. We should make sure to include those examples in the notebook, but skip their actual execution. If there's a way we could specify certain code cells shouldn't be executed, that would be ideal. Otherwise, including them as code blocks in a markdown cell would also work. Other suggestions are welcome as always
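One way the "don't execute certain cells" idea could work is by tagging cells and filtering on the tag before execution. The sketch below assumes cell tags are stored under `metadata.tags`, which is the usual notebook convention. The tag name `skip-execution` and the helper `cells_to_execute` are choices made for this illustration, and whether a given docs tool honors such a tag would need to be checked against its documentation.

```python
SKIP_TAG = "skip-execution"  # tag name chosen for this sketch (an assumption)


def cells_to_execute(notebook, skip_tag=SKIP_TAG):
    """Return the code cells of a notebook dict that are not tagged to be skipped."""
    selected = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        if skip_tag in tags:
            continue
        selected.append(cell)
    return selected


# Demo notebook: one cell that runs locally, one that would need cloud credentials.
notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": "local = 1"},
        {
            "cell_type": "code",
            "metadata": {"tags": [SKIP_TAG]},
            "source": "# would need cloud credentials",
        },
    ]
}
runnable = cells_to_execute(notebook)
print(len(runnable))  # prints 1
```

The tagged cell still appears in the rendered notebook as code; it is only left out of the set of cells that get executed.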

andrewfulton9 commented 4 years ago

Sounds good. It might be best to put those cells into Markdown if we are going to make the notebooks executable in Binder to not cause confusion. I'll keep exploring options though.

Carreau commented 4 years ago

If you need/want to go 1/2 way, you can use the ipython directive, which will execute and embed the results during the build, while still keeping RST.

It cannot generate .ipynb files, though, as rst has far more features than markdown, but it would at least ensure that the docs are up to date.

jupyter-sphinx is similar, but also supports widgets and should likely be merged with the ipython directive at some point.

olusanwo commented 2 years ago

Hi @joshmoore, I am from Outreachy. Is this documentation task still available? Can I take it?

joshmoore commented 2 years ago

@olusanwo, sure! I'm sure everyone would look forward to a suggestion for this.

sudoyolo commented 2 years ago

Hi @joshmoore Sir, I moved Tutorial.rst to an .ipynb file. Please check it out:

https://colab.research.google.com/drive/1qqVY0KxVvEFyifPgWpbyG2P5RdQj9_TZ?usp=sharing

Here are a list of pointers of what I did:

I also noticed that under "Changing chunk shapes (rechunking)" in Tutorial.rst, the second code block has an apostrophe error: the closing quote after uint16 is missing.

a = zarr.zeros((10000,10000), chunks=(10000, 1), dtype='uint16, store='a.zarr')

This should be:

a = zarr.zeros((10000,10000), chunks=(10000, 1), dtype='uint16', store='a.zarr')

Some problems I faced:

Please let me know if there are any changes you would suggest. Thank you!

joshmoore commented 2 years ago

Repeating here from a recent chat with @sudoyolo: looking forward to seeing the notebook as a PR. It will need some additional work to integrate it into the readthedocs output. :+1:

rabernat commented 2 years ago

In order to integrate the notebook into the docs, I highly recommend myst-nb. We use it on all our documentation sites. It supports all of the fancy sphinx syntax in markdown via myst.
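For comparison with the nbsphinx approach, a myst-nb setup might look roughly like the sketch below. The option names follow recent myst-nb releases as far as I know; older releases used different names, so treat them as assumptions and check them against the installed version. myst-nb and nbsphinx both handle `.ipynb` files, so presumably only one of the two would be enabled.

```python
# docs/conf.py (excerpt): a sketch only, assuming a recent myst-nb release.
extensions = [
    "myst_nb",  # parses and executes notebooks and MyST Markdown files
]

# Execute notebooks at build time, reusing cached outputs when a notebook is unchanged.
nb_execution_mode = "cache"

# Turn execution errors into build failures instead of warnings.
nb_execution_raise_on_error = True
```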

GbotemiB commented 1 year ago

Hi @joshmoore, I think this issue has been resolved in #996, so it can probably be closed.

jakirkham commented 1 year ago

That PR unfortunately appears to be closed (as opposed to merged). Contributions here would still be welcome 🙂

steph237 commented 1 year ago

Hi @joshmoore and @MSanKeys963, I am an Outreachy applicant and I would like to work on this issue.

jakirkham commented 1 year ago

Hi Stephanie, thanks for offering to help! 🙏

Think Emmanuel already picked this up with PR ( https://github.com/zarr-developers/zarr-python/pull/1163 ), but please feel free to grab another issue.

Thanks again! 🙂