pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

DOC: from examples to tutorials #3564

Open rabernat opened 4 years ago

rabernat commented 4 years ago

It's awesome to see the work we did at Scipy2019 finally hit the live docs! Thanks @keewis and @dcherian for pushing it through.

Now that we have these more detailed, realistic examples, let's think about how we can take our documentation to the next level. I think we need TUTORIALS. The examples are a good start. I think we can build on these to create tutorials which walk through most of xarray's core features with domain-specific datasets. We could have different tutorials for different fields. For example.

Each tutorial would cover the same core elements (loading data, indexing, aligning, grouping, computations, plotting, etc.), but using a familiar, real dataset, rather than the generic, made-up ones in our current docs.
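
As a rough illustration of the core elements listed above, here is a minimal sketch that touches indexing, alignment, grouping, and computation. It assumes `numpy`, `pandas`, and `xarray` are installed; the variable names and the synthetic `temperature` values are made up for illustration and stand in for the real, domain-specific dataset a tutorial would load. Plotting is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a real dataset (values are invented for illustration):
# 24 monthly time steps at 3 latitudes, with value = 3 * time_index + lat_index.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = [10.0, 20.0, 30.0]
temperature = xr.DataArray(
    np.arange(24 * 3, dtype=float).reshape(24, 3),
    coords={"time": time, "lat": lat},
    dims=("time", "lat"),
    name="temperature",
)
ds = temperature.to_dataset()

# Indexing: select by coordinate label (lat) and by integer position (time).
point = ds["temperature"].sel(lat=20.0).isel(time=2)

# Aligning: arithmetic matches coordinate labels, so only the 12 time steps
# shared by both operands survive the subtraction.
second_year = ds["temperature"].isel(time=slice(12, None))
diff = ds["temperature"] - second_year

# Grouping and computation: average over all time steps in each calendar month.
monthly_mean = ds["temperature"].groupby("time.month").mean()
```

A domain tutorial would follow the same sequence, but start from a file the audience already recognizes instead of the array built here.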

Yes, this would be a lot of work, but I think it would have a huge impact. Just raising here for discussion.

xref #2980 #2378 #3131

choldgraf commented 4 years ago

In case it's helpful for inspiration, we took a similar approach with the MNE-Python package (neuro electrophysiology package):

https://mne.tools/stable/index.html

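Maybe there are at least 3 levels in there, actually:

* **Examples** - short vignettes that highlight one very specific piece of functionality, key-words for the example should be `ctrl-f`able in the title

* **Tutorials** - in-depth guides through a common part of workflow that xarray wishes to enable, with more explanation and detail

* **Domain use-cases** - examples of how xarray can facilitate use-cases in particular fields. Probably cover at a high-level many of the steps that multiple tutorials cover in-depth. More for "inspiration and buy-in" than in-depth learning.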

Does that make sense?

TomNicholas commented 4 years ago

@rabernat I'm going to be making a simple plasma physics-oriented xarray tutorial to give at a workshop next week.

I was wondering - if we're uploading real data for these, how big can/should the files be? It might affect what dataset I use.

keewis commented 4 years ago

https://www.divio.com/blog/documentation/ might be a useful reference for this?

rabernat commented 4 years ago

> if we're uploading real data for these, how big can/should the files be? It might affect what dataset I use.

This is a good question. We need the tutorials to be able to run and build within a CI environment. That's the main constraint.

For larger datasets, rather than storing them in github, a good approach is to create an archive on https://zenodo.org/ from which the data can be pulled.

TomNicholas commented 4 years ago

> Maybe there are at least 3 levels in there, actually...

The article linked by @keewis is well worth reading in my opinion - it describes a similar breakdown of different types of documentation:

I think for xarray there is another type, like you suggest @choldgraf:

I personally think xarray in general has reference nailed, lots of good explanation, but is generally a bit weaker on tutorials and how-to guides, and doesn't have many examples of domain use-cases.


I have some ideas for how-to's (maybe these should all go in a separate issue?):


> We need the tutorials to be able to run and build within a CI environment.

So @rabernat for small datasets what might be an appropriate max filesize? I literally have no idea. ~1MB?

> a good approach is to create an archive on https://zenodo.org/

I'll look into that.

choldgraf commented 4 years ago

> For larger datasets, rather than storing them in github, a good approach is to create an archive on zenodo.org from which the data can be pulled.

Another note from MNE - we have a "datasets" sub-module that knows how to pull a few datasets from various online repositories (and in different structures). These are stored in a local folder (by default, ~/mne_data I believe) and then they get fast-loaded after the first download. Many of the datasets are then stored in online repositories like OSF (https://osf.io/rxvq7/).

For datasets that aren't gigantic it's a pretty nice system. https://mne.tools/stable/overview/datasets_index.html?highlight=datasets
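
The download-once, reuse-afterwards pattern described above can be sketched with the standard library alone. This is not MNE's or xarray's actual implementation; `fetch_cached`, its parameters, and the placeholder URL in the usage note are hypothetical names chosen for illustration.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url, filename, cache_dir, downloader=urllib.request.urlretrieve):
    """Return a local path for `filename`, downloading from `url` only if missing.

    `downloader` is any callable taking (url, destination_path); it is a
    parameter so the function can be exercised without network access.
    """
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        # Download under a temporary name first, so an interrupted transfer
        # never leaves a partial file that later looks like a cache hit.
        partial = target.with_name(target.name + ".part")
        downloader(url, str(partial))
        partial.replace(target)
    return target
```

A tutorial could then call something like `fetch_cached("https://example.invalid/data.nc", "data.nc", "~/tutorial_data")` (placeholder URL and folder) before opening the file. In a CI build, the cache folder could be preserved between runs so the archive is only hit once.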

apkrelling commented 3 years ago

Hello everyone, is this issue still relevant? I could add a domain-use case for oceanography or meteorology, but it seems like that has already been done under

1) So there's no need to work on domain-use cases for oceanography or meteorology, is that correct?

2) Also, I'd be happy to contribute with something about how to migrate from numpy to xarray, if that is still needed.

dcherian commented 3 years ago

Hi @apkrelling thanks for offering to help!

I think we can still add more domain-specific examples for meteorology and oceanography. @rabernat had some plans for this, maybe he can describe them.

> how to migrate from numpy to xarray, if that is still needed.

This would be totally great!

hafez-ahmad commented 3 years ago

Hey everyone!

Is there any way to change or reorder the season labels ['DJF' 'JJA' 'MAM' 'SON'] during seasonal grouping? I'd like to change the 'DJF' 'JJA' 'MAM' 'SON' combination and define a winter season as Dec+Jan+Feb+Mar.

Your assistance is highly appreciated.
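
One way to think about the question above is to build the season label for each month yourself and group on that label. The following is a minimal pure-Python sketch of the idea, not an xarray API: only the winter definition (Dec+Jan+Feb+Mar) comes from the question, while the other season boundaries and the names `SEASONS`, `MONTH_TO_SEASON`, and `seasonal_means` are placeholders invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# User-defined seasons. Only "winter" is taken from the question;
# the remaining boundaries are arbitrary placeholders.
SEASONS = {
    "winter": (12, 1, 2, 3),
    "spring": (4, 5),
    "summer": (6, 7, 8, 9),
    "autumn": (10, 11),
}

# Invert the table: month number (1-12) -> season label.
MONTH_TO_SEASON = {
    month: name for name, months in SEASONS.items() for month in months
}


def seasonal_means(months, values):
    """Average `values` within each custom season, given each value's month."""
    groups = defaultdict(list)
    for month, value in zip(months, values):
        groups[MONTH_TO_SEASON[month]].append(value)
    return {season: mean(vals) for season, vals in groups.items()}


# Toy data: one value per calendar month, equal to the month number.
months = list(range(1, 13))
values = [float(m) for m in months]
result = seasonal_means(months, values)
```

With xarray, the same label-per-time-step idea should carry over: the labels can be wrapped in a named `DataArray` that shares the `time` coordinate and passed to `groupby` in place of `"time.season"`.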

dcherian commented 3 years ago

@hafez-ahmad can you ask this question in Discussions? https://github.com/pydata/xarray/discussions

dcherian commented 2 years ago

We've started discussing how to reorganize the xarray-tutorial repository here: https://github.com/xarray-contrib/xarray-tutorial/issues/53 . Comments are welcome!

alimanfoo commented 1 year ago

Hi folks,

Just to mention that we've created a short tutorial on xarray which is meant as a gentle intro for folks coming from the malaria genetics field, who mostly have never heard of xarray before. We illustrate xarray first using outputs from a geostatistical model of how insecticide-treated bednets are used in Africa. We then give a couple of brief examples of how we use xarray for genomic data. There are video walkthroughs in French and English:

https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html

Please feel free to link to this in the xarray tutorial site if you'd like to :)

ddjustina commented 1 year ago

> In case it's helpful for inspiration, we took a similar approach with the MNE-Python package (neuro electrophysiology package):
>
> https://mne.tools/stable/index.html
>
> Maybe there are at least 3 levels in there, actually:
>
> * **Examples** - short vignettes that highlight one very specific piece of functionality, key-words for the example should be `ctrl-f`able in the title
>
> * **Tutorials** - in-depth guides through a common part of workflow that xarray wishes to enable, with more explanation and detail
>
> * **Domain use-cases** - examples of how xarray can facilitate use-cases in particular fields. Probably cover at a high-level many of the steps that multiple tutorials cover in-depth. More for "inspiration and buy-in" than in-depth learning.
>
> Does that make sense?

@choldgraf seems like this page is down (https://predictablynoisy.com/xarray-explore-ieeg). Are these examples available elsewhere?

choldgraf commented 1 year ago

Oops, I think the URL just changed:

https://chrisholdgraf.com/blog/2019/2019-10-22-xarray-neuro/