Style guide suggestion: Avoid duplicate links

JelleZijlstra commented 2 years ago

https://github.com/python/cpython/pull/92199#discussion_r863506149:

It is not related to this PR, but if some item is mentioned multiple times in one paragraph, I prefer to keep the link in only one of them (usually at the first occurrence). :class:`!OpenerDirector` or ``OpenerDirector``. Too many duplicated links do not make the text more readable.

I agree with @serhiy-storchaka; this would be good to mention in the docs style guide.

AA-Turner commented 2 years ago

+1.

I'd put this in the same conceptual group as introducing initialisms & acronyms once and then using the shorter form.

A

hugovk commented 2 years ago

What is the scope for avoiding repeated links?

For long pages, it could be useful to use links in different sections.

The nature of webpages is you don't read the whole page, but skip to what is relevant, and may even arrive part way down via an anchor link.

AlexWaygood commented 2 years ago

I'd agree with @JelleZijlstra that, as a general rule, you should link to the same thing only once per paragraph. If the paragraph is long enough that you start wondering if you need to add another link, it's probably time to break up the paragraph.

I think it's fine to link to the same thing as soon as you start a new paragraph, in most cases.
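As a rough illustration of the "once per paragraph" rule, here is a minimal sketch of a checker that reports targets linked more than once in the same paragraph of reST source. It is only a sketch under assumptions: paragraphs are taken to be blank-line-separated blocks, the regular expression covers simple inline roles only, and the names `ROLE_RE`, `link_target` and `duplicate_links` are hypothetical helpers, not part of Sphinx or any existing tool.

```python
import re
from collections import Counter

# Matches simple inline roles such as :class:`OpenerDirector` or
# :py:meth:`~Cursor.execute`. Illustrative only, not a full reST parser.
ROLE_RE = re.compile(r":(?P<role>[\w:-]+):`(?P<content>[^`]+)`")


def link_target(content):
    """Return the target a role would link to, or None if it is unlinked."""
    if content.startswith("!"):
        return None  # a leading "!" suppresses the link
    # Explicit-title form: "title <target>"
    explicit = re.fullmatch(r".*<(?P<target>[^<>]+)>", content, flags=re.S)
    if explicit:
        content = explicit.group("target")
    return content.lstrip("~")  # "~" only shortens the displayed text


def duplicate_links(rst_text):
    """Yield (paragraph_number, target, count) for targets linked repeatedly."""
    # Assumption: a "paragraph" is a block separated by blank lines.
    paragraphs = re.split(r"\n\s*\n", rst_text.strip())
    for number, paragraph in enumerate(paragraphs, start=1):
        counts = Counter()
        for match in ROLE_RE.finditer(paragraph):
            target = link_target(match.group("content"))
            if target is not None:
                counts[target] += 1
        for target, count in counts.items():
            if count > 1:
                yield number, target, count
```

A new paragraph resets the count, so linking the same target again there is not reported, which matches the rule of thumb above.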

AA-Turner commented 2 years ago

> What is the scope for avoiding repeated links?

Per paragraph I'd say, as with Alex / Serhiy / Jelle. There are edge cases (enumerations, single sentence paragraphs followed by a full paragraph), but any style guide is incomplete.

A

CAM-Gerlach commented 2 years ago

Like for AmE/BrE spelling, Wikipedia's MoS specifies not repeating wikilinks in close proximity. IIRC in that case it is per-section, but once per paragraph-ish seems fine, particularly for reference docs, with room for context-appropriate discretion (very short paragraphs, etc.).

encukou commented 2 years ago

IMO, paragraphs are a good rule of thumb for reference docs. It would be more precise to say you should add a link if it's likely that the reader doesn't remember seeing the previous link – it was too far back, or they skipped the previous paragraph, or jumped to the current paragraph via a hyperlink. But with good paragraph structure, perhaps that's the same thing.

CAM-Gerlach commented 2 years ago

I've thought a fair bit more about this and put it into practice in various places; I typically do so per lowest-level section (or equivalent unit, e.g. class, function or method in API docs), since the reader may have jumped directly to the section via a link, or may otherwise be consulting only that specific section. In particular, readers of reference docs may only be reading at a fine level of granularity and each unit (function, method, etc.) should stand on its own; readers of how-to guides are likely reading only the section covering their specific task, and the same goes for an individual explanation section.

So long as one makes proper use of Sphinx's very powerful linking and cross-referencing capabilities (even between different Sphinx sites with intersphinx), it is very cheap to link/reference things, so the main constraint becomes not oversaturating the reader with duplicate links where they would be expected to read both occurrences (e.g. same paragraph/section).

gvanrossum commented 2 years ago

It’s not just duplicates. Sometimes just the sheer number of links in a paragraph can be distracting. (Making the links stand out less does not really answer this, it’s no use if there are links but you can’t see them.)

ezio-melotti commented 2 years ago

This recently came up in https://github.com/python/cpython/pull/94636#discussion_r933885060 too.

Having a link per paragraph sounds good to me, with some wiggle room (e.g. sometimes it might be enough to have the link once in a 2-paragraphs-long function documentation).
Links to the module/class/method/function itself are generally not very useful, so those could be omitted in all cases.

However, if additional occurrences are not linked, they will render differently, potentially being more distracting. For example:

with ``...`` and it will look like:

The type system of the sqlite3 module is extensible in two ways: [...] you can let the sqlite3 module convert SQLite types to Python types via converters.
with no markup it will look like:

The type system of the sqlite3 module is extensible in two ways: [...] you can let the sqlite3 module convert SQLite types to Python types via converters.

CAM-Gerlach commented 2 years ago

> This recently came up in https://github.com/python/cpython/pull/94636#discussion_r933885060 too.

Just to clarify for others, more or less these guidelines (linking only on first usage within a unit, and not linking something we're inside of already) are what @erlend-aasland (and by extension, I) have been following on the sqlite3 docs Diataxis-inspired revamp, and they seem to be working well.

If we're codifying this in the style guide, I would suggest making the general rule "one link per least readable unit", or whatever we want to call the minimum unit a reader would be expected to read given the Diataxis type. Generally speaking, this means the lowest-level addressable block, e.g. the lowest-level section in a How-To Guide, or per-function parameter in an API Reference.

This guideline is more flexible, generally applicable and tailored to the Diataxis type of the documentation while helping guide authors on how to judge this—plus, a lot of places don't have actual "paragraphs" at all. Furthermore, a more rigid application of the fixed "paragraph rule" could lead to overly high link density in some places (successive dependent paragraphs in the same lowest-level How-To section), while missing links in some places they are crucial (each parameter in an API Reference).

> However, if additional occurrences are not linked, they will render differently, potentially being more distracting. For example:

I assume you're referring specifically to :py:-domain roles that are linked to their respective objects, given that it is fairly well accepted to hyperlink items in normal prose text only once within a certain unit, and that prose links, unlike roles, are both cognitively and mechanically more expensive for authors.

I see two possibilities, both of them with benefits and tradeoffs:

Ref on first usage within a unit:

+ Reduces link density, and allows spending link budget where it's more valuable than just repeating the same link
+ Makes it clearer to readers which terms within a unit are newly linked, and which they've already seen
+ Only formatting change (under our current theme) is the link, which follows convention for prose text
- May look inconsistent to some readers
- May modestly increase cognitive load for the author to determine and remember when to use which

Ref on every usage:

+ Individual usages look consistent with each other
+ Easier for authors to remember and follow
- Not consistent with practice in prose text
- May lead to excessive link density for readers, or come at the expense of much more useful links

This isn't a super-strong preference, but while conceptually I like the idea of being consistent on every usage, given the practical tradeoffs for readers I favor linking first usage, mostly to reduce link density for readers and ensure link budget is spent on more reader-useful links, and since the formatting difference is just of the link (which provides meaningful information to readers) and consistent with how prose links are treated.

erlend-aasland commented 2 years ago

For me, this boils down to: is this reference going to make life easier/better for the user[^1]?

IMO, I do not see how duplicate links and a high link density can possibly improve life for the user; it seems to me an anti-pattern. We should not add refs just because we can (i.e. there is something to link to => I need to link to it). We should only add refs if they improve the docs for the end user. IMO, reducing the link density and making sure that there are only unique refs/links within a paragraph is a great improvement for the end user. Consistency? Yes, making sure that there are only unique links within a paragraph is a schema; following a schema leads to consistency.

IMO, in the documentation for method x, we should not link to the documentation for method x[^2]; ditto for functions, classes, and even modules[^3].

[^1]: The same question can be applied to everything we do; is this feature going to make life better for the user? Is breaking this API going to make life better for the user? Etc.

[^2]: Again, is the link to method x when you are reading about method x going to improve the life of the end user? I don't think so. Actually, I think it might even be confusing or even irritating.

[^3]: How does linking to the sqlite3 module from within the sqlite3 module docs help the user? My answer: it doesn't; it only leads to the user clicking the link, losing the flow of what they were reading, realising that the link only linked to the top of the very page they were reading, then navigating back to whatever they were reading. Was that forward/backward dance needed for the user? No, it was totally unneeded. Did it improve the docs experience? No. Did it further enlighten the user? No. Did it cause frustration? Possibly. Did it cause irritation? Possibly.

CAM-Gerlach commented 2 years ago

Just to note, my argument for making the guidance "first use in atomic unit" follows the same basis as yours, and just takes it further—it is focused specifically on ensuring readers see one and only one link to a given target within each "atomic unit" of the docs (e.g. section, admonition, parameter description, etc), which:

* better serves readers by aligning with the scope in which the reader is highly likely to have seen the link, or at the very least expect to find it (e.g. inside the lowest-level section they are presumed to have read in a guide, inside an admonition that they may be reading on its own, or an individual function parameter rather than an unrelated parameter they may not have read)
* reduces link density to the minimum required by not repeating links any more than necessary (since paragraphs inside lowest-level guide sections generally presume having read the previous paragraphs anyway, there's no need to repeat a link in each one)
* more consistently encompasses the different syntax and semantics used throughout the docs and in different Diataxis categories, beyond just paragraphs, allowing for a more generally applicable rule

ezio-melotti commented 2 years ago

A possible counterargument is that the markup we are adding (e.g. :func:`...`) is there to add semantics to the element -- the fact that it also generates a link might be seen as secondary. By removing the roles and using ``...``, we lose semantic information, even though practically speaking I'm not sure it really matters to anyone.

I wonder if, instead, it would be possible to use roles consistently (thus reducing the cognitive load of the writer, and making updates easier) and have Sphinx automatically add links only to the first occurrence within an atomic unit. This will also save us from rewriting all repeated references in all the documents.

erlend-aasland commented 2 years ago

> I wonder if, instead, it would be possible to use roles consistently (thus reducing the cognitive load of the writer, and making updates easier) and have Sphinx automatically add links only to the first occurrence within an atomic unit. This will also save us from rewriting all repeated references in all the documents.

If that's possible, that would be wonderful!

CAM-Gerlach commented 2 years ago

Actually, from reading the devguide section on inline markup for python/devguide#1000, I've come up with a better idea. Per said section on cross referencing roles:

> If you prefix the content with !, no reference/hyperlink will be created.

Therefore, instead of doing ``...``, we can simply do :role:`!...` and it won't link the text but will otherwise assign it the same doctree node type, and therefore the same formatting. (To note, it seems it won't check them for correctness with -n, or elide any intersphinx prefix, as it doesn't attempt to actually resolve the reference at all. That is perhaps not ideal for catching typos, but conversely it means that it's easy to add references that we know won't resolve just for formatting consistency, like we did in the What's New, without generating noisy Sphinx warnings.)

This has a superset of the advantages of the ``...`` on non-first-usage approach, while also ensuring syntactic, formatting and semantic consistency with linked usage and only requiring a one-character change to implement on existing text, minimizing any impact of inconsistency on both authors and readers. Therefore, this seems to be obviously preferable to either previously proposed alternative.

Also, the same approach can be used for references to an object (module, class, function, etc) inside itself, with the same benefits and essentially no downsides relative to either.
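To make the "one-character change" concrete, the following is a minimal sketch of how repeated roles within one unit of reST text could be rewritten to the `!` form. It is only an illustration under assumptions: the unit is passed in as a plain string, only simple `:role:`target`` forms are handled (roles starting with `~` or using an explicit title are left alone), and `ROLE_RE` and `unlink_repeats` are hypothetical names.

```python
import re

# Matches simple inline roles such as :meth:`Cursor.execute`.
# Illustrative only, not a full reST parser.
ROLE_RE = re.compile(r":(?P<role>[\w:-]+):`(?P<content>[^`]+)`")


def unlink_repeats(unit):
    """Prefix repeated role targets within one unit of reST text with "!"."""
    seen = set()

    def replace(match):
        role, content = match.group("role"), match.group("content")
        # Leave already-unlinked, "~"-shortened and explicit-title roles alone.
        if content.startswith(("!", "~")) or content.endswith(">"):
            return match.group(0)
        key = (role, content)
        if key in seen:
            return f":{role}:`!{content}`"  # repeat: keep the role, drop the link
        seen.add(key)
        return match.group(0)  # first usage stays linked

    return ROLE_RE.sub(replace, unit)
```

Deciding what counts as a unit, and whether the first occurrence is really the one to link, is left to the author; the function only applies the mechanical part.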

Source:

* If you prefix the content with ``!``, no reference/hyperlink will be created.

  For example, :class:`sqlite3.Connection`
  vs. :class:`!sqlite3.Connection`.

Result:

> I wonder if, instead, it would be possible to use roles consistently (thus reducing the cognitive load of the writer, and making updates easier) and have Sphinx automatically add links only to the first occurrence within an atomic unit. This will also save us from rewriting all repeated references in all the documents.

It quite likely is possible; we could e.g. write a Transform that iterates through xref doctree nodes and either sets them to non-resolving or subs in the appropriate subclass of xref nodes to not resolve. However, it would be quite non-trivial, mostly due to having to programmatically define at the doctree level exactly what constitutes an "atomic unit"—which again, would be theoretically possible (since there should be enough semantic information to fairly reliably conclude this) but it would require handling a fair number of edge and corner cases due to the number of different node types and nesting, and if it gets one wrong there's no real way for an author to use their judgement and override this.
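To illustrate the kind of traversal this would involve, here is a sketch on a toy tree. It deliberately does not use Sphinx or docutils: `Node`, `UNIT_KINDS` and `mark_first_links` are hypothetical names, and the set of node kinds treated as atomic units is an assumption (choosing that set correctly is exactly the hard part described above).

```python
from dataclasses import dataclass, field

# Node kinds treated as "atomic units" in this toy model (an assumption).
UNIT_KINDS = {"section", "admonition", "parameter"}


@dataclass
class Node:
    kind: str                 # e.g. "section", "paragraph", "xref"
    target: str = ""          # only meaningful for "xref" nodes
    linked: bool = True       # only meaningful for "xref" nodes
    children: list = field(default_factory=list)


def mark_first_links(node, seen=None):
    """Keep only the first xref to each target within every atomic unit."""
    if seen is None or node.kind in UNIT_KINDS:
        seen = set()  # entering a new unit resets what the reader has seen
    if node.kind == "xref":
        node.linked = node.target not in seen
        seen.add(node.target)
    for child in node.children:
        mark_first_links(child, seen)
```

Even in this toy form there is no place for an author to override the decision, for example to link the second occurrence instead of the first.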

Also, particularly if substantial time is not spent on optimization, it could be quite expensive since it requires iterating through a large number of low-level nested nodes in every single document, perhaps multiple times; given that doc build times (given the large size and even larger build matrix) are already a serious concern, even a few percent slowdown could have significant implications.

So, while it possibly could be explored, IMO given it only requires adding one extra character (and the far more time-intensive part is properly linking things to begin with, which this could of course not solve) it might be better to just use manual discretion here, at least for now.

serhiy-storchaka commented 2 years ago

At least once I intentionally made the second occurrence a link, not the first one. Also, sometimes the unit is larger than a paragraph, so only a human can make a decision.

CAM-Gerlach commented 2 years ago

> At least once I intentionally made the second occurrence a link, not the first one.

Indeed, very recently I did the same for some carefully thought out reasons.

> Also, sometimes the unit is larger than a paragraph, so only a human can make a decision.

There is theoretically enough semantic information in the doctree to determine the appropriate unit size (hardcoding it to paragraph nodes would be a nonstarter, as it wouldn't work in a large number of common cases). But as you mention, there are still enough special cases that this wouldn't handle, and it wouldn't provide an easy means to override without even more complexity, so I don't think that aspect is worth it given how easy it is to do manually with the approach suggested above.

ezio-melotti commented 2 years ago

Using :role:`!...` seems a nice compromise, so I think we should just recommend that. :role:`~...` could also be documented nearby as "Avoid repeated module/class names".

merwok commented 2 years ago

Can you combine the two modifiers?

ezio-melotti commented 2 years ago

* :meth:`Cursor.execute`
* :meth:`~Cursor.execute`
* :meth:`!Cursor.execute`
* :meth:`~!Cursor.execute`
* :meth:`!~Cursor.execute`

CAM-Gerlach commented 2 years ago

> :role:`~...` could also be documented nearby as "Avoid repeated module/class names".

@ezio-melotti It's already the very next bullet point :)

> Can you combine the two modifiers?

@merwok You could also just do :meth:`!execute`, which is simpler and less to type, but :meth:`~!Cursor.execute` is preferred from a semantic perspective, clearer in the source text, more consistent across usages and will still work if you remove the !.

merwok commented 2 years ago

I know, that’s why I asked! Great that the combo works, thanks Ezio for checking 🙂

CAM-Gerlach commented 2 years ago

Sorry, just wanted to clarify :)

CAM-Gerlach commented 2 years ago

Actually, :meth:`~!Cursor.execute` only partially works, and accidentally at that, as it makes Sphinx try to resolve !Cursor.execute, which fails, so only the execute part is displayed due to ~ and it is unlinked due to the resolution failure. However, unlike with a true !, the name is still looked up, and it still generates a -n broken reference warning, so presumably :meth:`!execute` is actually preferable.

This could be changed in Sphinx by a few different methods: overriding create_non_xref_node in PyXrefMixin to run parse_reftarget first and use that as the title text, in order to support !~ (probably the best approach I see); modifying parse_reftarget and doing a separate check for ! in PyXrefMixin to support ~!; and possibly others.
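The interaction between the two prefixes can be summarised with a toy model. This is not Sphinx's implementation; it is a sketch that follows the behaviour described in this thread, and `model_py_role` is a hypothetical name. That `!~` leaves the tilde in the displayed text is an assumption based on `!` showing the remaining content as written, and should be checked against a real build.

```python
def model_py_role(content):
    """Return (display_text, lookup_target) for the content of a Python role.

    lookup_target is None when no lookup is attempted at all.
    """
    if content.startswith("!"):
        # A true "!": nothing is resolved; the rest is shown as written.
        return content[1:], None
    target = content.lstrip("~")
    display = target
    if content.startswith("~"):
        # "~" shortens the displayed text to the last dotted component.
        display = target.rsplit(".", 1)[-1]
    # The target is still looked up, so a bogus name such as
    # "!Cursor.execute" would be reported as a broken reference under -n.
    return display, target
```

Under this model `:meth:`~!Cursor.execute`` displays `execute` but still triggers a lookup of `!Cursor.execute`, whereas `:meth:`!execute`` displays the same text with no lookup at all.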

However, none of them are trivial to implement and test, and would only come with a yet-unreleased Sphinx version, so I'm not sure if it is worth it for a case that could just be handled by manually omitting the undesired parts of the dotted name.

@AA-Turner , thoughts?

hugovk commented 9 months ago

Tooling aside, I see a consensus here; would someone like to write it up for the style guide at https://devguide.python.org/documentation/style-guide/?

nedbat commented 7 months ago

I've made a pull request for the devguide: https://github.com/python/devguide/pull/1294

nedbat commented 7 months ago

In https://github.com/python/cpython/pull/117005#discussion_r1532502844, @AlexWaygood suggests using

:meth:`!__init__`

instead of

:meth:`~!object.__init__`

because it renders the same, but is less clutter in the .rst file. Semantically, it's a link-suppressed reference to a non-existent method, but the readers' experience will be the same. Thoughts?

nedbat commented 7 months ago

> Thoughts?

Never mind: I am now seeing that this has been discussed above, and CI fails with ~!, so I will switch to the shorter form.
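For anyone doing the same cleanup, a small sketch of a script that finds roles using `~!` and proposes the shorter form may help. It is only a sketch: the regular expression covers simple inline roles, and `TILDE_BANG_RE`, `shorter_form` and `find_tilde_bang` are hypothetical names rather than existing tooling.

```python
import re

# A role whose content starts with "~!", e.g. :meth:`~!object.__init__`.
TILDE_BANG_RE = re.compile(r":(?P<role>[\w:-]+):`~!(?P<dotted>[^`]+)`")


def shorter_form(match):
    """Build the "!"-only role that keeps just the last dotted component."""
    name = match.group("dotted").rsplit(".", 1)[-1]
    return f":{match.group('role')}:`!{name}`"


def find_tilde_bang(rst_text):
    """Return (line_number, found, suggestion) for each "~!" role."""
    hits = []
    for line_number, line in enumerate(rst_text.splitlines(), start=1):
        for match in TILDE_BANG_RE.finditer(line):
            hits.append((line_number, match.group(0), shorter_form(match)))
    return hits
```

The suggestion drops the class or module prefix from the source, so whether the shorter name is still clear to someone reading the .rst file is a judgement call for the author.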

hugovk commented 4 months ago

I think we can close this now that https://github.com/python/devguide/pull/1294 is merged; please re-open or comment if there's more to do.

python / docs-community

Style guide suggestion: Avoid duplicate links #52