mkdocstrings / autorefs

Automatically link across pages in MkDocs.
https://mkdocstrings.github.io/autorefs/
ISC License
51 stars 8 forks source link

Allow fallback to return multiple identifiers to try #11

Closed pawamoy closed 2 years ago

pawamoy commented 3 years ago

Currently, when autorefs doesn't know the URL for an identifier, it falls back to collecting the object through every handler (and getting an anchor through its renderer) until an anchor is found.

# get_item_url method in autorefs plugin
try:
    url = self._url_map[identifier]
except KeyError:
    if identifier in self._abs_url_map:
        return self._abs_url_map[identifier]

    if fallback:
        new_identifier = fallback(identifier)
        if new_identifier:
            return self.get_item_url(new_identifier, from_url)

    raise

if from_url is not None:
    return relative_url(from_url, url)
return url
# fallback method in mkdocstrings Handlers
for handler in self._handlers.values():
    try:
        anchor = handler.renderer.get_anchor(handler.collector.collect(identifier, {}))
    except CollectionError:
        continue
    if anchor is not None:
        return anchor
return None

It works well when canonical objects are rendered (the ones with canonical paths, i.e. the identifier corresponding to where they are defined, not imported/aliased), but not the other way around.

Example:

The other way around:

So, what we need here is one of two things, or both:

  1. the fallback method returns multiple identifiers that autorefs can test
# get_item_url method in autorefs plugin
...
    if fallback:
        new_identifiers = fallback(identifier)
        for new_identifier in new_identifiers:
            with contextlib.suppress(KeyError):
                return self.get_item_url(new_identifier, from_url)
        raise
...
# fallback method in mkdocstrings Handlers
for handler in self._handlers.values():
    try:
        anchors = handler.renderer.get_anchors(handler.collector.collect(identifier, {}))
    except CollectionError:
        continue
    if anchors:
        return anchors
return []
  1. autorefs somehow updates its URL map with alias->canonical and canonical->alias as soon as possible. For example, when we render a canonical object, update the URL map with all the alias paths mapping to the canonical path (depends on the ability of the collector/collected data). When we render an alias, update the URL map with the canonical path mapping to the alias path. Of course if you render the same object, alias or canonical, multiple times, the last iteration wins as it will override mappings in the URL map. But we can consider it an invalid use-case, or a limited one.

The option 2 would also prevent redundant, costly fallbacks. Maybe this can already be improved by updating the URL map when finding a new valid anchor/identifier.

# get_item_url method in autorefs plugin
...
    if fallback:
        new_identifiers = fallback(identifier)
        for new_identifier in new_identifiers:
            with contextlib.suppress(KeyError):
                url = self.get_item_url(new_identifier, from_url)
                self._url_map[identifier] = url  # update the map to avoid doing all this again
                return url
        raise
...

Maybe we could call the renderers get_anchor[s] method early, in mkdocstrings.extension.AutodocProcessor.run and mkdocstrings.extension.AutodocProcessor._process_block, to register more anchors into autorefs.

# lets say _process_block also returns the object (data)
html, handler, data = self._process_block(identifier, block, heading_level)
...
for anchor in handler.renderer.get_anchors(data):
    self._autorefs.register_anchor(page, anchor)
oprypin commented 3 years ago

Oops I thought this was a pull request 😅 But I'll definitely check this out

pawamoy commented 3 years ago

Thanks! I'll experiment anyway, and send a PR if I get something elegant :slightly_smiling_face:

oprypin commented 3 years ago

we render Class with one of its alias path (there could be many), autorefs maps this alias path to its URL (it should not use the canonical path, because the alias is the public one, that's what the user wants)

Hmm I've certainly never planned for this case, I thought everything should always have unambiguous canonical path. That is, perhaps I straight up disagree with "it should not use the canonical path". If the handler wants to do shenanigans with "canonical aliases", perhaps it should somehow consistently report one chosen path.

the corresponding handler collects pkg.base.Class, and return its canonical path: pkg.base.Class

Well no, the returned canonical path should be pkg.Class, that's what the user wants.


That's all I have to say for now and probably won't actually be able to look into this in detail. I don't know if this is useful, but maybe it'll lead you to some interesting thought.

pawamoy commented 3 years ago

Sure, your input is always welcomed :slightly_smiling_face:

That is, perhaps I straight up disagree with "it should not use the canonical path"

I thought about it, and yes, perhaps it could use both, which corresponds to the proposed solutions, but not the current situation.

Well no, the returned canonical path should be pkg.Class, that's what the user wants.

True as well, but considering an object can have multiple aliases, the handler has no way of knowing which one of the aliases the object is registered with.

Of course I've very biased toward Python, which certainly does not mirror other languages concepts. But I guess populating the URL map with more data would not be harmful anyway.

oprypin commented 3 years ago

True as well, but considering an object can have multiple aliases, the handler has no way of knowing which one of the aliases the object is registered with.

I'm not sure if this is fully true.

The handler could perhaps memorize the first identifier it was collected through and use that henceforth.

But that would probably be considered more hacky than what you're suggesting..

pawamoy commented 3 years ago

Yeah there are multiple options here!