Open chrisjsewell opened 7 months ago
Ah yes, the pending_xref
node. The issue with the attributes is that there were different people implementing different things at different places and I don't think we ever had a doc telling the exact specs of that node.
here the "check" treats refdomain as if it is optional, but then the msg creation treats it as if it is required π
It becomes "required" in the sense that node.get('refdomain', 'std')
returns 'std' or node['refdomain']
. But since you excluded the 'std' case, you get node['refdomain']
so it's fine.
here refsectname is treated as required, but if you actually search for refsectname in the code base, it is definitely not set in a lot of places:
The standard domain is always present, so the corresponding transformations and post-transformations as well. In particular, when you call env.resolve_references(largetree, ...)
, you also call the resolve_xref
which may either remove your pending_xref
nodes or build dangling references nodes that are still instances of pending_xref
nodes. You can see that there are not a lot of places where you explicitly create pending_xref
nodes after resolving (the resolve_xref
calling _resolving_ref
in the standard domain is an example) so I think it's the only place which, when after resolving everything, you still have the possibility of having such nodes (with the expected attribute).
Now, we should have added a better comment or be sure that the only remaining pending_xref
nodes are created by the standard domain for ref
.
As for your original question: yes and no. Yes for common attributes that are known to exist, but that's not always the case (like, refdomain
might not exist in some extensions) and you might not know at construction time the final information that is needed (for instance Domain.process_field_xref
might add additional information to the node when called).
I'm not familiar with this part of the code base, so my comments will be general rather than specific.
Sphinx uses 'stringly-typed' interfaces in a lot of places by using a dict-like interface for many objects. As you observed, often without default arguments or error handling. That doesn't imply it's a bug, more likely the author knows some invariant that mypy doesn't. The problem comes if you refactor and unknowingly violate the invariant... If we can refactor more code to be interrogable by mypy then refactoring can become more "fearless"
Where we're absolutely certain that a property must always exist on a class or instance of a class then I think it makes sense to create a subclass and set the property in the init method (or potentially as class variables where appropriate) and then access these properties directly rather than through the dict-like interface.
like,
refdomain
might not exist in some extensions
Well, that is one of the things I'm trying to work out for myst-parser π But I think this has to exist, since it is expected in a number of places:
sphinx/builders/latex/transforms.py:
563: if node['refdomain'] == 'math' and node['reftype'] in ('eq', 'numref'):
sphinx/domains/math.py:
32: node['refdomain'] = 'math'
sphinx/ext/intersphinx.py:
751: (node['refdomain'], typ, node['reftarget']))
sphinx/transforms/i18n.py:
308: case = node["refdomain"], node["reftype"]
313: node["refdomain"],
sphinx/transforms/post_transforms/__init__.py:
87: domain = self.env.domains[node['refdomain']]
210: (node['refdomain'], typ, target))
tests/test_extensions/test_ext_intersphinx.py:
32: node['refdomain'] = domain
If we can refactor more code to be interrogable by mypy then refactoring can become more "fearless"
exactly, amen to that π
I'd be happy to move out to property-based interface but I think the choice is mostly historical and docutils motivated. Alternatively, I could suggest having a mypy plugin for that instead because it could be integrated in the current framework without refactoring anything (or barely).
But I think this has to exist, since it is expected in a number of places:
It is expected but I'm not sure that it is necessarily expected at the beginning of the transformation chain. Transformations and post-transformations have different priorities and I'm pretty sure there are some that dynamically add information because their authors know some invariants (that we do not) to process it in the future.
Currently,
pending_xref
includes no information about what attributes it is expected to have: https://github.com/sphinx-doc/sphinx/blob/bfce4f54b0fc010b04f0d17ad8b9d47d8c279e1f/sphinx/addnodes.py#L477-L485There are definitely some attributes that are expected, i.e. they are accessed with
node['name']
and wouldKeyError
except if the attribute is not present, then there are ones that are maybe optional, i.e. they are accessed withnode.get('name')
I haven't checked if there are actually any open issue, but there definitely seems like there are discrepancies in the code base, here are some examples:
https://github.com/sphinx-doc/sphinx/blob/bfce4f54b0fc010b04f0d17ad8b9d47d8c279e1f/sphinx/transforms/post_transforms/__init__.py#L208-L210
here the "check" treats
refdomain
as if it is optional, but then themsg
creation treats it as if it is required πhttps://github.com/sphinx-doc/sphinx/blob/bfce4f54b0fc010b04f0d17ad8b9d47d8c279e1f/sphinx/builders/latex/__init__.py#L373-L375
here
refsectname
is treated as required, but if you actually search forrefsectname
in the code base, it is definitely not set in a lot of places:Am I missing something here? How is some of this stuff not excepting?
@danieleades do you have any thoughts on how to type
Node
subclasses, to have some kind of enforcing of available attributes. Ideally it would be enforced for instantiating the node, and for accessing attributes in an existing node, but I know this would be extremely difficult, if not impossible πAlso cc @picnixz may have an opinion?