mkdocstrings / python

A Python handler for mkdocstrings.
https://mkdocstrings.github.io/python
ISC License
180 stars 33 forks source link

Show __init__ exports #39

Open JP-Ellis-KPMG opened 2 years ago

JP-Ellis-KPMG commented 2 years ago

Problem

The __init__.py is sometimes used to re-export functions and classes defined within sub-modules, generally for visibility within the library and also with the auto-completion. It would be nice if the documentation showed this as well.

Solution

Ideally, I would like an option that shows explicit imports/re-exports located within __init__.py. Based on other linting conventions, the way this is done is to be explicit with an as even if it is tautological, though may not always be.

# Explicit import and re-export without changing the name
from .foo import Bar as Bar
# Explicit import and re-export with a name change
from .foo import Baz as FooBaz

As for the implementation, I like the way Rust documentation shows re-exports in a section dedicated for re-exports. This makes it clear that the class or function is defined elsewhere but has been made available within the current namespace. Here's an example from the popular Clap library:

https://docs.rs/clap/latest/clap/#reexports

Alternatives

I tried looking at the options available, but could not identify anything that would provide the functionality I need.

Another alternative would be to have the gen-files script handle this, but the implementation does not appear to be easy.

Extra

The ability to document re-exports I think could be implemented with the following options if there might be a need for that, though I struggle to see the use-case for any of them other than explicit-init.

llucax commented 4 months ago

BTW, there are official rules for what's considered a re-exporting of symbols, which is followed by mypy and pyright for example: https://github.com/python/typing/blob/main/docs/source/libraries.rst#library-interface-public-and-private-symbols

llucax commented 4 months ago

This applies not only to __init__ though, so not sure if I should create a separate issue or not.

pawamoy commented 4 months ago

Related:

llucax commented 4 months ago

There was also some parallel discussion in gitter.

@pawamoy said:

Hey :) indeed currently Griffe only marks as "exported" the names listed in __all__. When checking if an object is considered public, these "exports" are then used to tell whether an object is explicitly exported. If there are no exports (__all__ is undefined), Griffe falls back on an implicit check, and the mere presence of an object is enough (given it is not private, i.e. it doesn't start with _).

I initially decided not to consider "from x import a as a" as explicitly exported, because:

  1. I don't like it :D
  2. It's ugly :D
  3. More seriously, we have one obvious way: __all__.
  4. "as" imports makes you noqa: F401 the imports, while __all__ is understood. Maybe linters can support "as" imports though

In any case, I didn't think of everything, so this can be revisited / improved. I don't obviously agree with Mypy's rules though. Mostly, I want to have something that makes sense, without confusing users (too much?) :)

So I have mainly 3 reasons why I think this should be supported despite your reasons:

  1. It is not only a mypy thing, it is part of the typing PEP484 (even if it is mentioned only for stub files), this is an official thing, ignoring it doesn't make it go away.
  2. If the goal is not confusing users, not implementing it doesn't help that goal, I was very confused that my symbols weren't shown in the docs when I re-exported it in an officially supported way :)
  3. There is one case that, AFAIK, can't be handled gracefully by using __all__: from x import *. For __init__.py this makes a lot of sense in many cases, having to explicitly import and re-export every symbol in every sub-module is not only pure bloat, it is also error-prone, as if you create more symbols in a sub-module, you need to remember to go to the __init__.py and re-export it. Unless I'm missing some technique to deal with this case more gracefully, in which case I would love to hear about it, if not as a workaround until from x import * is supported by mkdocstrings ;)

About other linters not supporting it, yes, I agree it is annoying, but because of the reasons above, I think they should be, and I'm willing to open issues in flake8 and pylint if they can't deal with this properly.

pawamoy commented 4 months ago

Thanks for all the info!

Just to play devil's advocate a bit (don't read it as "I really don't want to support as/* imports", it's just to bring nuance):

https://github.com/python/typing/blob/main/docs/source/libraries.rst#library-interface-public-and-private-symbols Type checkers can use the following rules to determine which symbols are visible outside of the package.

Emphasis on "can". To me it sounds like type checkers have a choice in how they consider an object to be public/exported or not. If they want, they can follow these few rules described in the document, but apparently they don't have to.

https://peps.python.org/pep-0484/#stub-files

This explicitly says it's for stub files. The only mention of non-stub files is:

Just like in normal Python files, submodules automatically become exported attributes of their parent module when imported.

So to me, this again is not an authoritative document to describe how to determine whether an object is public/exported or not. I don't think such a document exist yet.


Anyway. Even if we don't have specs, we have conventions, and from a import x as x and from b import * seem to be rather common conventions to mark x and all public objects from b as public. So let's try and support them :slightly_smiling_face:

About your third point, there is a way: if you define __all__ in your submodules, you can import it to extend a parent module __all__ list. Example:

package/
  __init__.py
  module.py
# module.py
__all__ = ["X", "Y", "Z"]
...
# __init__.py

# __all__ was by the way specifically made for wildcard imports 
from module import *

# we could also avoid using the wildcard import,
# but I understand it's less convenient: from module import X, Y, Z

# import module's __all__!
from module import __all__ as _module_all

# use it to extend your parent __all__
__all__ = ["A", "B", "C", *_module_all]

# other operations are supported like:
# __all__ += _module_all

About the error-proneness of having to remember to explicitly import and add to __all__, I'd argue that automatically importing and re-exporting everything with wildcard imports is even more error-prone, and much less clean from a public API point of view. I've seen monsters while working on Griffe, believe me. Explicit is better than implicit :blush:?

Anyway, I think wildcard imports are OK when combined with __all__.

pawamoy commented 4 months ago

One more argument in favor of using __all__ when using wildcard imports.

If you use wildcard imports without __all__ to restrict the set of objects made available in the importing module, you will expose more objects than what the conventions suggest. The convention is only respected by static analysis tools, not runtime interpreters!

package/
  __init__.py
  module.py
import third_party  # by convention, not exported

class A:  # by convention, exported
    ...
# __init__.py
# by convention, only supposed to export A,
# but at runtime, third_party is exposed too!
from module import *
llucax commented 4 months ago

Anyway. Even if we don't have specs, we have conventions, and from a import x as x and from b import * seem to be rather common conventions to mark x and all public objects from b as public. So let's try and support them πŸ™‚

:tada:

About your third point, there is a way: if you define __all__ in your submodules, you can import it to extend a parent module __all__ list.

Yeah, I thought of that, but that just means now I need to move all the duplication to the submodules, not that I get rid of the duplication. I can already mark symbols private in submodules by prefixing them with _, a very widely-used convention (and also supported by all major type-checkers and linters).

If I can't get rid of the duplication, I rather have it in the __init__.py file which is much more meta and avoid noise in the files that actually have the useful code.

But thanks for the suggestion!

About the error-proneness of having to remember to explicitly import and add to all, I'd argue that automatically importing and re-exporting everything with wildcard imports is even more error-prone, and much less clean from a public API point of view. I've seen monsters while working on Griffe, believe me. Explicit is better than implicit 😊?

I agree and can see that for monster legacy projects doing this is madness, but for a small greenfield project trying to use modern Python features, I think for __init__.py files, it makes a lot of sense and makes things much simpler, again, just mark anything you don't want to export in a module as private and that's it, that is also better than having everything public anyway. If something is not intended to be public it shouldn't be in the first place :)

llucax commented 4 months ago

If you use wildcard imports without __all__ to restrict the set of objects made available in the importing module, you will expose more objects than what the conventions suggest. The convention is only respected by static analysis tools, not runtime interpreters!

True. For the cases where you use a type-checker, you code will never run in runtime, so that issue should never happen too. This mostly applies to all static checks performed by a type-checker, unless you add runtime ininstance() checks for every type hint you use, your code will still explode at runtime, so even when I see your point, when using type checkers I see this as a non-issue (maybe in terms of performance, I'm not sure if doing import * could have some implication.

In any case, if this feature is controversial, or could be useful only to a subset of users, maybe it can be added behind an option to let users decide what's best for them.

pawamoy commented 4 months ago

Ha, this is not an easy topic. So many things to consider here.

Let me try to start again. First I'll describe what Griffe currently does. Then I'll share a few observations.


Griffe has a concept of "exported member". A module member is considered:

A class member is considered implicitly exported, the same way as module members are considered implicitly exported. It is never considered explicitly exported because we don't/can't use __all__ in class bodies.

This "exported" marker is then used in a few places, 3 of of them being of interest to us here.

1. when expanding wildcard imports

I see a first issue here: members starting with _ are considered implicitly exported. This is wrong: actual wildcard imports will not import these objects at runtime.

2. when checking for breaking changes in the API

Between the same old and same new module, if a member disappeared, and it was either a submodule, or it was exported (explicitly or implicitly), we consider its disappearance a breaking change. Again, it's wrong here to consider that a private object's disappearance makes a breaking change. I didn't spot this issue before because the code that checks breaking changes has an earlier condition on the object names: it skips private objects anyway.

3. when telling if an object is public

Griffe objects have an is_public() method that tell if an object is public or not. An object is:

Griffe gained this method later on, to be used by mkdocstrings-python.


A few observations.

I think Griffe's internals/API is a bit confusing. "Exported" is usually used to say "public" (we export an object to make it public), but in Griffe it's rather used to say "the object is exposed in regard to the wildcard import runtime mechanism". So maybe I should rename "exported" to "wildcard exposed".

Then we should fix the issue mentioned above about private members being considered implicitly exposed. And since wildcard imports are not configurable, we would drop the distinction between explicitly and implicitly exposed. An object is either exposed, or it's not.

Finally the code that checks breaking changes should use the "public" API instead of the "exposed" one.


Now to go back to the subject at hand: whether to support the two conventions from a import x as x and from a import *. These conventions ask us to consider x as public, and all objects imported from a (following regular wildcard import mechanisms) public too, respectively.

Thanks to the above analysis, and the changes we will make, I think it is safe to support them :+1:

My comment above about convention vs runtime was just me being confused, or the official docs being unclear. "Imported by wildcard" and "considered public" are not the same thing, and we most probably don't ask type checkers to use the "public" logic when expanding wildcard imports.

Just a final note: Griffe is not a type-checker, so type-checker recommendations do not always apply to us. In this case it seems like we can apply them safely :+1:

llucax commented 4 months ago

Thanks for the very in-depth analysis! Indeed it seems like there is a lot to consider, but I'm glad it seems like these conventions fit into the current Griffe model. In particular I'm not sure I understand the difference between public, exported and exposed are this context.

Also now I'm curious about what this BCD you are talking about is! Is is a tool that one can use to check if some changes break the public interface? Or is it just some internal implementation detail from Griffe that's not exposed to "regular users"?

Just a final note: Griffe is not a type-checker, so type-checker recommendations do not always apply to us. In this case it seems like we can apply them safely πŸ‘

I understand technically this is true, but if your documentation doesn't match what the type checker thinks, this documentation won't be all that useful. :innocent:

pawamoy commented 4 months ago

I understand technically this is true, but if your documentation doesn't match what the type checker things, this documentation won't be all that useful. πŸ˜‡

Of course :smile: (Just to give one example, type-checkers will apparently only read stubs, while we have to read both regular modules and their stubs, then merge their contents :slightly_smiling_face:)

Also now I'm curious about what this BCD you are talking about is! Is is a tool that one can use to check if some changes break the public interface? Or is it just some internal implementation detail from Griffe that's not exposed to "regular users"?

Glad you asked :sunglasses: It's definitely exposed to users, see our docs here: https://mkdocstrings.github.io/griffe/checking/ :relaxed:

llucax commented 4 months ago

This is amazing, thanks a bunch!

llucax commented 4 months ago

Just for completeness, here is another more or less official mention to how public and private symbols should work for libraries: