xarray-contrib / xbatcher

Batch generation from xarray datasets
https://xbatcher.readthedocs.io
Apache License 2.0
167 stars 27 forks source link

Add sphinxcontrib-apidoc for automatic API docs #116

Open djhoese opened 2 years ago

djhoese commented 2 years ago

Description of proposed changes

This is an initial attempt at adding automated generation of API docs using the sphinxcontrib-apidoc package. This is something I've used in other packages and really takes away the maintenance burden of having to re-do the api.rst autodoc module every time something changes in the code. Every time sphinx docs are generated it runs sphinx-apidoc for you, puts files in an api/ directory, then continues on generating the docs.

Fixes #53

Concerns I've noticed:

  1. The api landing page isn't as "pretty" or simple as it is in main currently:

image

But it also includes everything in the package so in my opinion that is worth it. It is reference documentation, not a how-to.

  1. There seem to be other parts of the source code that are trying to re-define the API sections for some classes:
/home/davidh/repos/git/xbatcher/xbatcher/loaders/torch.py:docstring of xbatcher.loaders.torch.MapDataset:1: WARNING: duplicate object description of xbatcher.loaders.torch.MapDataset, other instance in api/xbatcher.loaders, use :noindex: for one of them
/home/davidh/repos/git/xbatcher/xbatcher/loaders/torch.py:docstring of xbatcher.loaders.torch.MapDataset.__init__:1: WARNING: duplicate object description of xbatcher.loaders.torch.MapDataset.__init__, other instance in api/xbatcher.loaders, use :noindex: for one of them
/home/davidh/repos/git/xbatcher/xbatcher/loaders/torch.py:docstring of xbatcher.loaders.torch.IterableDataset:1: WARNING: duplicate object description of xbatcher.loaders.torch.IterableDataset, other instance in api/xbatcher.loaders, use :noindex: for one of them
/home/davidh/repos/git/xbatcher/xbatcher/loaders/torch.py:docstring of xbatcher.loaders.torch.IterableDataset.__init__:1: WARNING: duplicate object description of xbatcher.loaders.torch.IterableDataset.__init__, other instance in api/xbatcher.loaders, use :noindex: for one of them
/home/davidh/repos/git/xbatcher/xbatcher/loaders/keras.py:docstring of xbatcher.loaders.keras.CustomTFDataset:1: WARNING: duplicate object description of xbatcher.loaders.keras.CustomTFDataset, other instance in api/xbatcher.loaders, use :noindex: for one of them
/home/davidh/repos/git/xbatcher/xbatcher/loaders/keras.py:docstring of xbatcher.loaders.keras.CustomTFDataset.__init__:1: WARNING: duplicate object description of xbatcher.loaders.keras.CustomTFDataset.__init__, other instance in api/xbatcher.loaders, use :noindex: for one of them
djhoese commented 2 years ago

Ok the warnings in the original comment were a false alarm. It was complaining because I had the old api.rst file lying around and it was loading that and the new API docs. Let's see how the docs build go.

@maxrjones this is ready for review.

djhoese commented 2 years ago

Wow, what good questions. And ones I've never asked for my other projects. Let's see what we can come up with:

Top-level imports

This is a good point and I've thought about it a lot since you commented. I think my overall feeling (which doesn't matter as I'm not the maintainer of this library) is that the API docs should reflect what the actual layout of the code. So put another way, the API docs should show readers where the code lives and what it has to offer (the hard facts of arguments, types, names, etc). That said, I'm always annoyed when I find out years later that I could have been importing things in a much simpler way...but maybe that isn't the job of the API docs to tell me that?

Here's what I've found so far: if we want to get really custom with stuff we can create templates:

https://www.sphinx-doc.org/en/master/man/sphinx-apidoc.html#cmdoption-sphinx-apidoc-0

and control how each type of object is created and documented. Hopefully we don't need that though.

The other thing to play with is imported-members described here:

https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#module-sphinx.ext.autodoc

where it says:

In an [automodule](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-automodule) directive with the members option set, only module members whose __module__ attribute is equal to the module name as given to automodule will be documented. This is to prevent documentation of imported classes or functions. Set the imported-members option if you want to prevent this behavior and document all available members. Note that attributes from imported modules will not be documented, because attribute documentation is discovered by parsing the source file of the current module.

This might let these be documented further up, but I haven't tried it yet. But I'm a little worried that doing this for all modules will likely cause more problems than it helps.

If you or anyone knows of a project that has the API documented the way you want let me know and I'll see if I can replicate it.

Accessors

I think the best idea would be to emphasize the usage of the class in the docstring for the class. Like an "Examples" section and a note that this class does not need to be used directly. It doesn't look like xarray's .plot accessor does anything special regarding documentation:

https://docs.xarray.dev/en/stable/generated/xarray.DataArray.plot.imshow.html

but instead sticks with the other non-reference documentation to spell out the usage:

https://docs.xarray.dev/en/stable/user-guide/plotting.html#simple-example

I guess if you mention in the docstrings that they don't need to be used directly and should be used as properties on xarray objects and they are never shown with direct usage then maybe it is OK?

https://docs.xarray.dev/en/stable/user-guide/plotting.html#simple-example

Edit: The other option could be to define the classes with a _ prefix in which case they would not be included in the API docs by default and also gives the reader the understanding that these aren't meant for direct usage.