latosha-maltba commented 3 years ago

I'd like to be able to define custom link targets for domains.

I'm doing literate programming in the spirit of Donald E. Knuth, e.g.

MyClass
=======

MyClass does foo and has an attribute ``bar``

.. code-block:: python

   class MyClass:
      bar = "foo"

[...] some text later [...]

The behaviour of :py:class:`MyClass` is defined by the value of :py:attr:`MyClass.bar`.

Since bar is not defined via the directive .. py:method:: MyClass.bar, the above :py:attr: does not produce a link since the target cannot be found. I could add the directive right before the code-block but that would produce a "MyClass.bar" string in the output which a oddly misplaced as the definition is right in the code-block.

Thus, I'd like to have a new directive or new option to existing directives, which produces a target (and an index entry) but not any output, e.g.


.. py:class:: MyClass
   :tagetandindexonly:

MyClass
=======

MyClass does foo and has an attribute ``bar``

.. py:attr-target:: MyClass.foo
.. code-block:: python

   class MyClass:
      bar = "foo"

[...] some text later [...]

The behaviour of :py:class:`MyClass` is defined by the value of :py:attr:`MyClass.bar`.

Here :py:class:... and :py:attr:... would find and link to the corresponding targets.

The requested behaviour is sort of the opposite of what :noindex: currently provides, instead of producing output without target, produce a target but no output.

Some thoughts

This feature needs support per domain as the implementation of the reference/targets is completely handed to the derived domain (Python)class.
Consequently, custom domains are likely to not support this feature out of the box but they might join once this hits a Sphinx release

Implementation idea 1

The class ObjectDescription already has a add_target_and_index() which is essentially all what we need.
Subclasses of ObjectDescription might have overridden run(), these might need additional adjustments
Rewrite run() to support an option :targetandindexonly: (a bad name but will do for this report) and only generate a "target" node. Call add_target_and_index() with that target node.
This would only need a change in the class ObjectDescription and some minor changes in those classes overriding run().

Open questions:

Was add_target_and_index() ever part of an public API? Cannot find anything in the Domain API [1]. More precisely, was there a guarantee that the node passed is a desc_signature?
Unclear whether the target is applied to a following heading (see Class example above)

Implementation idea 2

Duplicate each .. py:whatever:: directive to a .. py:whatever-target: directive which takes a signature as argument and produces the corresponding target and index.
This directive should have the same semantics as .. _reference-label:
More flexibility as we do not need to relay on add_target_and_index(), e.g. works with headers as target.
Would need more implementation in the individual domains as they must implement those new directives

Implementation idea 3

Maybe go a step further and provide a generic .. domaintarget:: directive with options which would produce for any domain and any signature a suitable target.

.. domaintarget::
   :domain: py
   :subdomain: attr
   :signature: bar
   :signature: bar2  (event multiple signature might be allowed)

whatever is hier is the target

This directive should have the same semantics as a .. _targetname: label.
This needs the domains to expose references and targets such that a generic .. domaintarget:: directive can be implemented, e.g.
- domain.subdomains() list of subdomains ("attr", "class", "foo", ...)
- domain.get_target("attr", "class.attrname") where the first argument is the "subdomain" get target reference
- domain.add_target("attr", "class.attrname") creates a target node
- domain.add_index("attr", "class.attrname") creates an index node
Avoids duplicating code as all the reference/target logic would be centralised in the domain and the domain classes can use this as well.
Needs refactoring of implementation of existing domain classes.

How to implement

I've never hacked on Sphinx or docutils but if someone is willing to mentor me, I would try to implement this feature.

[1] https://www.sphinx-doc.org/en/master/extdev/domainapi.html

jakobandersen commented 3 years ago

I think something like this would be a nice tool. Your nicely detailed report gave my an idea for how this perhaps could be implemented as a posttransform in a very generic way without having to touch the domains, except for adding the doEverythingAsNormalExceptRenderingButStillInsertAnchors-option. Modifying add_target_and_index seems brittle to me as domains may do wildly different things. E.g., the C++ and Pythons domains adds multiple anchors to each declaration in order to support old perma-links, but in different ways. However, the core part is that anchors are expressed by appending them to the docutils node attribute "ids" (e.g., see cpp or py), so I think the following could work:

Make domain directives recognize the doEverythingAsNormalExceptRenderingButStillInsertAnchors-option, and mark the resulting addnodes.desc node somehow.
Add a posttransform (that runs relatively early) that finds each of the marked nodes. For each of them, collect all "ids" in that node and recursively for each child node. Then replace the marked node with a docutils node that renders to nothing visible, except set its "ids" to the collected list.
Rename doEverythingAsNormalExceptRenderingButStillInsertAnchors to something more reasonable :-).

In your example you could then do:

MyClass
=======

MyClass does foo and has an attribute :py:attr:`bar`:

.. py:class:: MyClass
   :doEverythingAsNormalExceptRenderingButStillInsertAnchors:

   Nothing of of this text gets rendered due to the option.

   .. py:class:: NestedClass

      .. py::method:: doStuff()

   .. py::attribute:: bar

      Still nothing of this is rendered, as long as we are in MyClass.

.. code-block:: python

   class MyClass:
      bar = "foo"

[...] some text later [...]

The behaviour of :py:class:`MyClass` is defined by the value of :py:attr:`MyClass.bar`.

In that case you declare everything as normal, and all references MyClass, MyClass.NestedClass, MyClass.NestedClass, MyClass.NestedClass.doStuff, and MyClass.bar resolve to the same position where .. py:class:: MyClass would have been.

tk0miya commented 3 years ago

In python domain's case, you can use PythonDomain.note_object() to make a standard node having node_id referreable as a python object. https://www.sphinx-doc.org/en/master/extdev/domainapi.html#sphinx.domains.python.PythonDomain.note_object

For example, this :py:func: reference will work fine with the following code:

.. _label:

.. code-block::

   def myfunc():
       return True

:py:func:`myfunc`

def on_doctree_read(app, doctree):
    # register the code-block labeled as "label" as a python function named "myfunc".
    domain = app.env.get_domain('py')
    domain.note_object('myfunc', 'func', 'label')

def setup(app):
    app.connect('doctree-read', on_doctree_read)

In this short example, I used "doctree-read" event to implement quickly. But this is just an example. I think this helps you to implement idea 2.

tk0miya commented 3 years ago

Oops. I have to answer to your questions.

Was add_target_and_index() ever part of an public API? Cannot find anything in the Domain API [1]. More precisely, was there a guarantee that the node passed is a desc_signature?

It should be a public API. So it's a mistake not to be documented. I'll add it soon.

Unclear whether the target is applied to a following heading (see Class example above)

No. The goal of the directives that inherits ObjectDescription is generating a description of the target object as the name suggests (That is a desc node that jakobandersen described). So the generated target and index should be applied to the description.

latosha-maltba commented 3 years ago

@jakobandersen your idea looks similar to implementation idea 1 except that directly extracting the ids in run() you would do it in a post transform. Are there any advantages of a postponing the work to a post transform than doing it directly in ObjectDescription.run()?

@tk0miya PythonDomain.note_object() looks like the add_target of implementation idea 3 but for the Python domain only. Would there be interest to promote it to Domain, e.g. with signature Domain.note_object(objtype: str, sig: str, node_id: str, [maybe alias, ...]) which would internally dispatch that call to a new function ObjectDescription.note_object(self, name: T, sig: str, node_id: str) with name being whatever the handle_signature() method returns. This would be similar to how add_target_and_index() is treated.

Based on those function we could implement a rather universal directive similar to those in idea 3.

jakobandersen commented 3 years ago

@jakobandersen your idea looks similar to implementation idea 1 except that directly extracting the ids in run() you would do it in a post transform. Are there any advantages of a postponing the work to a post transform than doing it directly in ObjectDescription.run()?

Indeed, it is close to idea 1. You need to be very careful with run. You need to perform tasks after it has run so the nested declarations will have been processed already and you "just" need to perform the id-scraping from the children and this node, and then return a new invisible node. However, this means that you need to hook into every single object description type of every single domain. And those I would consider implementation details of each domain. How I understand the API of ObjectDescription is that the domain that uses it can assume full control, not that others are messing with it at the same time, so you basically need to implement this on a per-domain basis. If you implement it as a post-transform, then you are sure that no other code runs at the same time, and you are relatively sure when it runs in relation to everything else. In particular, you are sure parsing is done.

tk0miya commented 3 years ago

Would there be interest to promote it to Domain

It's difficult. Domains can support many kinds of data at same time. So it's difficult to register it through single method. For example, python domain supports "objects" and "modules". So it provides note_object() and note_module(). In addition, the python modules have some additional attributes (ex. synopsis, platform and so on).

jakobandersen commented 3 years ago

As proof of concept I have created #9671 with the post-transform idea.

latosha-maltba commented 3 years ago

@jakobandersen your idea looks similar to implementation idea 1 except that directly extracting the ids in run() you would do it in a post transform. Are there any advantages of a postponing the work to a post transform than doing it directly in ObjectDescription.run()?

Indeed, it is close to idea 1. You need to be very careful with run. You need to perform tasks after it has run so the nested declarations will have been processed already and you "just" need to perform the id-scraping from the children and this node, and then return a new invisible node. However, this means that you need to hook into every single object description type of every single domain. And those I would consider implementation details of each domain. How I understand the API of ObjectDescription is that the domain that uses it can assume full control, not that others are messing with it at the same time, so you basically need to implement this on a per-domain basis. If you implement it as a post-transform, then you are sure that no other code runs at the same time, and you are relatively sure when it runs in relation to everything else. In particular, you are sure parsing is done.

Now I understand your concerns better. Also thank you for your proof of concept with the post transform. Before reporting this issue I had another hack (subclassing all directives and removing all children instead of replacing the node). Yours is much smarter.

I toyed with it a little but noticed a tiny flaw: the targets are placed as span before the target (that's not the bad part) but are also considered to belong logically to the "previous part" (targets logically belong to the "next part"). I guess that is because it previously was a description-object and not a target.

Here is an example:


Introduction
============

some text

.. py:func:: myFunc
   :hidden:

myFunc is an awesome guy
========================

more text

The generated target is in the (HTML-element) section belonging to "Introduction" instead of being in the section of "myFunc is an awesome guy".

I took your draft and ported it into the run() method, you can see the result in #9675. Since you already did all the hard work, that was pretty easy. So far it worked out and the generated nodes belong to the "next part" as they (in my opinion) should. I also tested nested classes and so far everyting seems to work.

(Remark: changing the node type from inline to target or raising the Transform priority to 10000 does not help on my system with the PostTransform approach on my system.)

Note to self: maybe rename option :hidden: to :targetonly: (but note this still generates an index entry, supply :noindex: if that is not wanted)

latosha-maltba commented 3 years ago

Would there be interest to promote it to Domain

It's difficult. Domains can support many kinds of data at same time. So it's difficult to register it through single method. For example, python domain supports "objects" and "modules". So it provides note_object() and note_module(). In addition, the python modules have some additional attributes (ex. synopsis, platform and so on).

The method takes an string as type, so "module"/"mod" would be valid for that. The additional arguments to the directive can be passed as keyword arguments. Currently I'm thinking of

.. domaintarget::
   :domain: py
   :subdomain: attr
   :signature: bar
   :signature: bar2  (event multiple signature might be allowed)
   :otherarg: blub
   :nextarg:

being translated to the two calls

# domain.note_object(subdomain/type: str, signature: str, node_id/target: str, *kwargs)
py_domain.note_object("attr", "bar", <node_id>, otherarg=blub, nextarg=True)
py_domain.note_object("attr", "bar2", <node_id>, otherarg=blub, nextarg=True)

The domain knows that a module (or whatever it is) has keyword arguments and can process them accordingly (or throw an error if they mismatch or are missing). While this being more universal solution than adding a :hidden: tag (it allows to the user to add arbitrary references to any domain), it might be more clunky from a users perspective. Compare:

.. py:class:: MyClass
   :hidden:

.. domaintarget::
   :domain: py
   :subdomain: class
   :signature: MyClass

This can be made more ergonomic if domain and subdomain or combined into arguments and signatures are on their own line (assumes that a space is forbidden in the domain name which AFAIK it is):

.. domaintarget:: py class
   MyClass

or even

.. domaintarget:: py class MyClass

sphinx-doc / sphinx

Allow creating custom targets in a domain #9662

Some thoughts

Implementation idea 1

Implementation idea 2

Implementation idea 3

How to implement