Allow fallback to return multiple identifiers to try

pawamoy commented 3 years ago

Currently, when autorefs doesn't know the URL for an identifier, it falls back to collecting the object through every handler (and getting an anchor through its renderer) until an anchor is found.

# get_item_url method in autorefs plugin
try:
    url = self._url_map[identifier]
except KeyError:
    if identifier in self._abs_url_map:
        return self._abs_url_map[identifier]

    if fallback:
        new_identifier = fallback(identifier)
        if new_identifier:
            return self.get_item_url(new_identifier, from_url)

    raise

if from_url is not None:
    return relative_url(from_url, url)
return url

# fallback method in mkdocstrings Handlers
for handler in self._handlers.values():
    try:
        anchor = handler.renderer.get_anchor(handler.collector.collect(identifier, {}))
    except CollectionError:
        continue
    if anchor is not None:
        return anchor
return None

It works well when canonical objects are rendered (the ones with canonical paths, i.e. the identifier corresponding to where they are defined, not imported/aliased), but not the other way around.

Example:

pkg.base defines Class
pkg imports pkg.base.Class as Class
therefore pkg.Class is an alias to pkg.base.Class
we render Class with its canonical path, autorefs maps its canonical path to its URL
now when autorefs fixes a [Class][pkg.Class] crossref, it falls back as mentioned above
the corresponding handler collects pkg.Class, and returns its canonical path: pkg.base.Class
great, autorefs knows this identifier, it can fix the crossref

The other way around:

(same) pkg.base defines Class
(same) pkg import pkg.base.Class as Class
(same) therefore pkg.Class is an alias to pkg.base.Class
we render Class with one of its alias path (there could be many), autorefs maps this alias path to its URL (it should not use the canonical path, because the alias is the public one, that's what the user wants)
now when autorefs fixes [Class][pkg.base.Class] (which happens, for example, because of Python annotations that are resolved using their actual scope, which is where they are defined, not imported), it falls back as mentioned above
the corresponding handler collects pkg.base.Class, and return its canonical path: pkg.base.Class
oops, that's the same path as before, autorefs is sad

So, what we need here is one of two things, or both:

the fallback method returns multiple identifiers that autorefs can test

# get_item_url method in autorefs plugin
...
    if fallback:
        new_identifiers = fallback(identifier)
        for new_identifier in new_identifiers:
            with contextlib.suppress(KeyError):
                return self.get_item_url(new_identifier, from_url)
        raise
...

# fallback method in mkdocstrings Handlers
for handler in self._handlers.values():
    try:
        anchors = handler.renderer.get_anchors(handler.collector.collect(identifier, {}))
    except CollectionError:
        continue
    if anchors:
        return anchors
return []

autorefs somehow updates its URL map with alias->canonical and canonical->alias as soon as possible. For example, when we render a canonical object, update the URL map with all the alias paths mapping to the canonical path (depends on the ability of the collector/collected data). When we render an alias, update the URL map with the canonical path mapping to the alias path. Of course if you render the same object, alias or canonical, multiple times, the last iteration wins as it will override mappings in the URL map. But we can consider it an invalid use-case, or a limited one.

The option 2 would also prevent redundant, costly fallbacks. Maybe this can already be improved by updating the URL map when finding a new valid anchor/identifier.

# get_item_url method in autorefs plugin
...
    if fallback:
        new_identifiers = fallback(identifier)
        for new_identifier in new_identifiers:
            with contextlib.suppress(KeyError):
                url = self.get_item_url(new_identifier, from_url)
                self._url_map[identifier] = url  # update the map to avoid doing all this again
                return url
        raise
...

Maybe we could call the renderers get_anchor[s] method early, in mkdocstrings.extension.AutodocProcessor.run and mkdocstrings.extension.AutodocProcessor._process_block, to register more anchors into autorefs.

# lets say _process_block also returns the object (data)
html, handler, data = self._process_block(identifier, block, heading_level)
...
for anchor in handler.renderer.get_anchors(data):
    self._autorefs.register_anchor(page, anchor)

oprypin commented 3 years ago

Oops I thought this was a pull request 😅 But I'll definitely check this out

pawamoy commented 3 years ago

Thanks! I'll experiment anyway, and send a PR if I get something elegant :slightly_smiling_face:

oprypin commented 3 years ago

we render Class with one of its alias path (there could be many), autorefs maps this alias path to its URL (it should not use the canonical path, because the alias is the public one, that's what the user wants)

Hmm I've certainly never planned for this case, I thought everything should always have unambiguous canonical path. That is, perhaps I straight up disagree with "it should not use the canonical path". If the handler wants to do shenanigans with "canonical aliases", perhaps it should somehow consistently report one chosen path.

the corresponding handler collects pkg.base.Class, and return its canonical path: pkg.base.Class

Well no, the returned canonical path should be pkg.Class, that's what the user wants.

That's all I have to say for now and probably won't actually be able to look into this in detail. I don't know if this is useful, but maybe it'll lead you to some interesting thought.

pawamoy commented 3 years ago

Sure, your input is always welcomed :slightly_smiling_face:

That is, perhaps I straight up disagree with "it should not use the canonical path"

I thought about it, and yes, perhaps it could use both, which corresponds to the proposed solutions, but not the current situation.

Well no, the returned canonical path should be pkg.Class, that's what the user wants.

True as well, but considering an object can have multiple aliases, the handler has no way of knowing which one of the aliases the object is registered with.

Of course I've very biased toward Python, which certainly does not mirror other languages concepts. But I guess populating the URL map with more data would not be harmful anyway.

oprypin commented 3 years ago

True as well, but considering an object can have multiple aliases, the handler has no way of knowing which one of the aliases the object is registered with.

I'm not sure if this is fully true.

The handler could perhaps memorize the first identifier it was collected through and use that henceforth.

But that would probably be considered more hacky than what you're suggesting..

pawamoy commented 3 years ago

Yeah there are multiple options here!

mkdocstrings / autorefs

Allow fallback to return multiple identifiers to try #11