mkdocstrings / griffe

Signatures for entire Python programs. Extract the structure, the frame, the skeleton of your project, to generate API documentation or find breaking changes in your API.
https://mkdocstrings.github.io/griffe
ISC License
255 stars 37 forks source link

feature: extension(?): relative cross-references in docstrings, but resolved by the handler/extractor #264

Open the-13th-letter opened 2 weeks ago

the-13th-letter commented 2 weeks ago

Is your feature request related to a problem? Please describe.

I find myself repeatedly writing (mkdocs/mkdocstrings-formatted) documentation of the following form, where I have to reference other functions doing things that affect my current function:

# Google style code, not just documentation
from my.awesome.package import configuration

"""Do the thing.

...

Raises:
    RuntimeError:
        Obtaining the configuration via [`get_config`][my.awesome.package.configuration.get_config]
        resulted in an error.

"""
# need to specify full path :(

i.e. where the link to myawesome.package.configuration.get_config needs to be spelled out in full. Generally this happens when things like enum.Enums and typing.NamedTuples come into play, and these get moved to separate packages but constantly must be referenced.

Describe the solution you'd like

mkdocstrings, the main benefactor of Griffe, and its parent system mkdocs, use Markdown, which by design is intended to be readable in its source form. Python comes with docstring support and a built-in help function, i.e. a place for documentation to be placed and a system to query it, so that it is easy to read and write documentation in its source form. My point being, the source form of the docstring should be readable.

Therefore, in (my best interpretation of) the spirit of Google's style guide and of Markdown's design rationale, I imagine the cleanest way to specify this cross-reference would be:


# global value declared here
from my.awesome.package import configuration

...

"""Do the thing.

...

Raises:
    RuntimeError:
        Obtaining the configuration via [`configuration.get_config`][]
        resulted in an error.

"""
# link target resolved as per the global value above

i.e. that cross-references can be specified using local object names that the Python environment already knows about (here: the myawesome.package.configuration module via the configuration alias). This is very readable even with Python's help function (and in mkdocs-generated documentation as well, of course), and it is intuitively clear for any Python programmer how to change this cross-reference if a different object needs to be referenced, or if anything has been renamed or imported under a different name or whatnot. It is also based on information that Griffe probably already extracted when resolving type hints or the like, so Griffe (and thus mkdocs) could likely easily be taught to “understand“ such links.

When using this notation, there would only be a need to format cross-references of the form [`identifier`][] or [`dotted.path`][], i.e. where the link text and the link target are equal, and are both valid Python identifiers. I imagine that this is unambiguous enough to have a very low false-positive rate, so it should be “relatively safe“ to treat every such equal-text-and-target link which is also a valid Python identifier as a cross-reference to resolve this way.

From my understanding of the architecture of mkdocs + mkdocs-autorefs + mkdocstrings-python, I believe there are two parts to this proposal:

  1. Griffe (or a Griffe extension) should check if the current docstring contains a Markdown cross-reference of the aforementioned type. If we detect such a reference, Griffe should either resolve it directly (using the names from the module it already knows about) or else add enough information (e.g. names of possible candidates) to some auxiliary structure to help resolve it later.

  2. mkdocs-autorefs (or an extension) should resolve the unresolved link target using the auxiliary structure, if it hasn't been resolved yet.

Describe alternatives you've considered

I'm aware of mkdocstrings/python#27, mkdocstrings/python#28 and of analog-garage/mkdocstrings-python-xref. I can't speak to the quality of the code, but I find the proposed syntax for relative identifiers especially repulsive, in a this-is-no-longer-text-this-is-code sort of way. For me this is no better for reading or writing than using full package names.

I'm aware of mkdocstrings/autorefs#37. Again, I really detest the proposed syntax, as it goes against the spirit of having documentation readable and understandable even in source form.[^1] I also disagree with the notion that the current object . in a relative link should be the class/function/whatever that the docstring belongs to, because relative import syntax only considers packages and modules, and nested scopes use modifiers like nonlocal or global but otherwise unqualified identifiers. I expect forcing users to use unestablished-but-superficially-familiar syntax for actually-familiar things would also force them to constantly have to look up the syntax just to get it right, because they can't quickly tell if a ..ParentClass link in SubClass.method refers to the sibling class ParentClass the same module, or in the parent module, or even if the superclass is ParentClass itself.

(I'm also aware of @pawamoy's, @lmmx's and @oprypin's comments and viewpoints there. I'm hoping you will chime in on this proposal too, if only because I think the new limited scope and clear division of (software) responsibility could simplify that pull request quite significantly.)

(Also, I can't tell if the pull request is dead, or just on hold, and if your stances on that design have changed since then.)

[^1]: No, I don't find Python's relative import syntax really aesthetical either, and would not want it in user-facing non-code text. For what it's worth, Google's style guide also explicitly discourages its use, though for different reasons.

pawamoy commented 2 weeks ago

Thanks for this detailed feature request @the-13th-letter :slightly_smiling_face: I appreciate the thoroughness and the honest opinions.

Before answering, let me try and explain how cross-references work.

Now for my answer.

If we wanted to implement your suggestion, we would have to either:

The second option doesn't look good to me, because:

So that leaves us with option 1. I have read https://github.com/mkdocstrings/autorefs/pull/37 again and came to the conclusion that it's my preferred solution going forward. I'll expand there on the general debate over generic vs handler-specific handling of references, and stay focused here on the syntax you propose.

While I agree that using the current object's scope to resolve partial paths (like configuration.get_config mentioned above) looks good, and is consistent with what we do for types in docstrings, I worry about a few things.

First, the restrictions it brings. If we are to rely on the object's scope to resolve a partial path, that means the object we reference must actually exist in the given scope (either at runtime or just in the sources, guarded by if TYPE_CHECKING or similar conditions). For example, if you want to reference a function in a sibling module, you have to actually import this function, or its parent module, to be able to cross-ref it using its path. That will definitely not play well with linters/formatters that remove unused imports, since they wouldn't be able to detect that these imports are made to allow cross-refs in docstrings.

Second, and as you mentioned, there could be cases where the reference becomes ambiguous, because it's a complete path but this path also exists in the current scope. For example:

from official import thing
import third_party as official

def func():
    """Link to [official][] package.""""

Here it's unclear whether official refers to the official external package, or to the third_party external package. It would still be unclear if we removed the line from official import thing. I don't want to bake such uncertainty into the design.

The point of readability is important, but IMO the most readable references will always be the fully typed ones. Relative references, whether they use the current scope (not visible when displaying help(function)) or dots as prefix (requiring some mental gymnastics), will always be less readable. I would argue that at least the "dot-prefix" case allows to understand what is referred without printing and reading actual source code, contrary to the "scope" case.

pawamoy commented 2 weeks ago

I'll need a bit more time to answer on https://github.com/mkdocstrings/autorefs/pull/37, but the crux of it, so you know where I'm headed to, is the following:

jinja template with access to the object being rendered
    convert_markdown(object.docstring, object.metadata, **other_options)
        md.processors["autorefs"].metadata = metadata
        md.convert(docstring)
            autorefs extension working on unresolved links
                metadata.resolve(relative_id)
                metadata.lineno()
                inject metadata in span with HTML attributes

in autorefs, when transforming spans to HTML links
    use metadata accordingly

What this means is that the handler/extractor could then decide how to compute metadata, and what inputs it supports, meaning that your suggestion could actually be supported :slightly_smiling_face: For Python specifically, Griffe would probably be responsible for this metadata object, and since it supports extension, I can imagine an extension for your syntax suggestion, and other ones for the other suggested syntaxes.

Additionally, autorefs itself could propose a generic syntax, that handlers/extractors could choose to support or not, paving the way to a unified way of writing (relative/cross-language) references.

the-13th-letter commented 2 weeks ago

Before answering, let me try and explain how cross-references work.

  • The autorefs plugin registers a Markdown extension in the on_config hook.
  • When the mkdocstrings-python handler converts docstrings from Markdown to HTML, the extension registered by autorefs kicks in and transforms Markdown links [title][ref] into HTML spans with a data-autorefs-identifier attribute. It only transforms links that were left untouched by Python-Markdown itself because they didn't have their URL set.
  • Once all pages have been converted to HTML, the autorefs plugin searches for autorefs spans and replaces them with actual links, using an internal URL map that was built through previous events, and querying existing handlers for equivalent identifiers when it doesn't know one.

OK, got it.

Now for my answer.

If we wanted to implement your suggestion, we would have to either:

  • hook the autorefs Markdown extension and the Python handler together somehow, so that the extension can know what object is being rendered, and use it to resolve object references.
  • or as you suggest, have Griffe somehow inject additional information in Markdown links in docstrings

The second option doesn't look good to me, because:

  • modifying (not just extending) docstrings is dirty: we can tolerate it but only in last recourse
  • while the regular expression is not complex (we can reuse Python-Markdown's regex), it feels like we're short-circuiting the Markdown-to-HTML converter, when really all Markdown-to-HTML logic (even if just pre-processing) should live there and not in Griffe. Griffe is supposed to be both markup agnostic (Markdown/reStructuredText/Asciidoc/whatever) and SSG agnostic (MkDocs/Quarto/etc.). We do have a tiny bit of Markdown logic in our docstrings parsers, to avoid interpreting a few things in code blocks delimited by triple-backticks fences, but I would very much like to avoid adding more.

I'm not particularly hung up on implementing this in Griffe instead of in mkdocstrings-python; sorry if it seemed that way. I did feel however that implementing this entirely in autorefs, in a language-independent manner, leads to the aforementioned unaesthetic syntaxes and would require Python parsing knowledge (or at least scoping knowledge) in a system that is supposed to be language agnostic. Since Griffe already parses this information it made sense to me to place this related functionality there as well. Of course, given the coupling of mkdocstrings-python with Griffe, I imagine the effort to implement this in mkdocstrings-python instead is probably of similar magnitude.

So that leaves us with option 1. I have read mkdocstrings/autorefs#37 again and came to the conclusion that it's my preferred solution going forward. I'll expand there on the general debate over generic vs handler-specific handling of references, and stay focused here on the syntax you propose.

While I agree that using the current object's scope to resolve partial paths (like configuration.get_config mentioned above) looks good, and is consistent with what we do for types in docstrings, I worry about a few things.

First, the restrictions it brings. If we are to rely on the object's scope to resolve a partial path, that means the object we reference must actually exist in the given scope (either at runtime or just in the sources, guarded by if TYPE_CHECKING or similar conditions). For example, if you want to reference a function in a sibling module, you have to actually import this function, or its parent module, to be able to cross-ref it using its path. That will definitely not play well with linters/formatters that remove unused imports, since they wouldn't be able to detect that these imports are made to allow cross-refs in docstrings.

I'm inclined to say that having to actually import an object (or its containing module) to reference it in a short manner is a feature of this system. Or rather, if your actual code uses this name so little that you don't care enough to import it, then it's also very probably not important enough for you to want to document it using only a short local name.

And of course, there's always the option of overriding your linter/formatter in this case to keep the name. Or file this as a bug/wishlist feature for the linter/formatter to detect. I understand that it is currently treated as a bug on the author's side and not on the tool's side… but I don't think that that's actually true, or that it would necessarily stay this way in the long run.

Also, I'm totally fine with this being available only as an extension/a plugin that users need to explicitly opt in to.

Second, and as you mentioned, there could be cases where the reference becomes ambiguous, because it's a complete path but this path also exists in the current scope. For example:

from official import thing
import third_party as official

def func():
    """Link to [official][] package.""""

Here it's unclear whether official refers to the official external package, or to the third_party external package. It would still be unclear if we removed the line from official import thing. I don't want to bake such uncertainty into the design.

Yeah, well, that's just asking for trouble, and You Really Shouldn't Do That™. And if you actually Really Do That, then woe be to you for writing such horribly readable code. In this particular case I would probably link to the renamed third_party module, and maybe warn but otherwise accept that it is now impossible to refer to the official module even in absolute package names. You get what you deserve asked for.

There's only one scenario I can think of where I would tolerate this kind of thing: six and similar monkey patching of renamed, removed or otherwise maybe-non-existing functionality. But even there, tracing the value of the local name is unreliable enough that I don't feel bad for imposing the usage of full path names for cases like this.

(Also, during my last attempt at this type of monkey patching otherwise missing functionality—providing a poor substitute for sortedcontainers.SortedDict if sortedcontainers was unavailable—I spent almost all of my time fighting with the type checker instead. I don't think that fight would have been any less miserable if only my docstring formatter had allowed me to use short-form relative cross-references…)

On a related note, I would be fine with this being available only as an extension/a plugin that users need to explicitly opt in to.

The point of readability is important, but IMO the most readable references will always be the fully typed ones. Relative references, whether they use the current scope (not visible when displaying help(function)) or dots as prefix (requiring some mental gymnastics), will always be less readable. I would argue that at least the "dot-prefix" case allows to understand what is referred without printing and reading actual source code, contrary to the "scope" case.

Clearest, yes. Most readable, I dunno; I think that depends a lot on how unwieldy the full path is, and even if the package is known under a standard alternate name such as np for NumPy. I also believe it is permissible in writing to rely on context to disambiguate things that would on paper be slightly ambiguous. We do this all the time, particularly in technical writing and speaking.[^1] It also requires some discipline to use responsibly.

In the abovementioned case, yes, the (full) scope is not visible to the reader of the docstring, and it would be prudent of the author of this docstring to not relative-reference names that can't be clearly matched to enclosing scopes or parent modules. But I'm a strong believer of treating the user as an adult competent and not second-guessing or belittling them,[^2][^3] and I therefore really, really think that there should be some way for authors to tell mkdocs+autorefs+mkdocstrings-python+griffe “Yes, when I don't specify a full path, I absolutely 100% mean the Python object currently visible at that path. And I accept full responsibility for any messup because of wrong or non-existing objects at that path, and for unaddressable object paths.“.

And concerning the “dot-prefix“ syntax, I don't believe that the ability to be able to decode what the prefix refers to is that much consolation. For both systems, if you use them where you ought to better have used an absolute cross-reference, you'll probably be bogged down by trying to decode just what in the world that crossref is referencing. If they are both used where they make sense, then the “scope“ version probably needs less lexical parsing than the “dot-prefix“ version. I imagine that you could get used to both of them, given enough exposure. That said, from my own experience with (not) using Python's relative import syntax, I'm pretty sure that I would need to do calculations and double-checks to make sure my identifiers are correct in the “dot-prefix“ version. I really can't imagine anyone needing that for the “scoped” version but not needing it for the “dot-prefix” version.

Again, I realize that I am asking for a certain implementation that only works for some kinds of relative cross-references, that only works correctly[^4] for some kinds of relative crossrefs, and that there is a lot of potential both for ambiguity in what the user actually meant and for the reference texts to be non-sensical to readers consulting the raw docstring without having access to the rest of the source code (e.g. help(...)). If this only ever makes it to a non-default, configurable option, or perhaps an extension/a plugin, I'd be happy with that too.

Also, btw, did I perhaps ever mention that I'm completely fine with this being available only as an extension/a plugin that users need to explicitly opt in to…?

[^1]: Git the DVCS vs. git the slang term for other people. Python the language vs. python the animal. Keys in database systems vs. keys in cryptography. Et cetera. [^2]: I'm told this is part of the UNIX philosophy, and supposedly the main reason why commands like rm do not by default prompt you if you really want to delete this file you just said you want to delete. [^3]: While it is OK to include safeguards against potentially destructive behavior (such as the aforementioned deleting of files), it turns into belittling the user if it is hard or impossible to bypass this safeguard. [^4]: …as in, only sometimes resolves the reference the way the user intended it to be resolved.

pawamoy commented 1 week ago

If I understand correctly, you're fine with this being available only as an extension... :grinning:

Good then, this is likely what will happen :slightly_smiling_face:

treating the user as an adult competent

Just a quick maintainer perspective: there can be a lot of less-advanced users that will trip over anything that can be tripped over. That generally means more user support falling down onto maintainers (helping them, turning down feature requests, closing invalid bugs, writing non-confusing documentation, putting more efforts into warning messages and error handling). That's something to take into account.

pawamoy commented 1 week ago

Also, thanks again for your suggestion. Using the object's scope is something I didn't think about before and it's a nice idea :smile:

the-13th-letter commented 1 week ago

If I understand correctly, you're fine with this being available only as an extension... 😀

Good then, this is likely what will happen 🙂

Thanks :)

treating the user as an adult competent

Just a quick maintainer perspective: there can be a lot of less-advanced users that will trip over anything that can be tripped over. That generally means more user support falling down onto maintainers (helping them, turning down feature requests, closing invalid bugs, writing non-confusing documentation, putting more efforts into warning messages and error handling). That's something to take into account.

Surely. And maybe my view on this will change once I have done some (serious) software maintenance work. But as of now, I am shaped much more by my repeated experiences at $WORK with software that never lets you outgrow the “you're probably new to this, let me hide any dangerous settings that would only confuse you” phase, because it is programmed that way or because the administrative policy demands it that way. Which may or may not show in what I feature-request, and how.