Open barneygale opened 1 week ago
Try not to break all the external links going into the pages. We don't want to invalidate all the references from blogs, tweets, stackoverflow answers, etc.
With regard to search engine results, I don't think we can or should engage in SEO. There is no promise that rearrangements will lead to being a top hit for a search.
> Try not to break all the external links going into the pages. We don't want to invalidate all the references from blogs, tweets, stackoverflow answers, etc.
Presumably this is impossible, right @picnixz?
My suggestion for refactoring these large pages while mitigating the damage to existing deep links was to:
data_model/ in this case)
The damage to existing deep links that can't (or won't) be changed is still a good reason to tread carefully, but never being able to split pages as they grow over time isn't a great situation either.
For more background on why we should preserve link integrity as much as we can, the World Wide Web Consortium has a decent page here on why "Cool URIs Don't Change": https://www.w3.org/Provider/Style/URI
> Presumably this is impossible, right @picnixz?
Mmmh. It could be possible actually, but this would require a custom Sphinx extension and custom redirection at the nginx / apache level where old URLs would redirect to new ones (the Sphinx extension will be used to extract the mapping). It's also a bit of a hacky solution, but I don't have a better alternative (a pure Sphinx solution may not be possible because we don't want a dead link if an article cites something like https://docs.python.org/3/reference/datamodel.html#numbers-number; auto-generated docs using :class:`numbers.Number` would be fine since the intersphinx inventory would be updated, but raw links won't).
If you want to improve SEO, isn't there a way to indicate in an HTML document that this or that text is more important than something else (e.g., with some aria label or whatever HTML feature we may have)?
More generally, if you want to split the HTML, it's more of a server-side issue rather than a Sphinx issue (where the server would redirect to the appropriate page). So some redirect rules will need to be rewritten (and I don't know how much it could slow down the entire docs website).
Alyssa's suggestion on having a page serving as a hub is possible but it will be a bit ugly (because we still need to make all possible anchors available on that page so that users can re-click on them to have the expanded content).
> Alyssa's suggestion on having a page serving as a hub is possible but it will be a bit ugly (because we still need to make all possible anchors available on that page so that users can re-click on them to have the expanded content).
Could the Sphinx extension glue together several pages to form datamodel.html? It would resemble the existing page (perhaps with a small amount of jankiness), but it would be an "orphan" page with no incoming links from the rest of the Python docs. At the top we could add a banner:
The Python data model documentation has been split into several chapters. This page combines those chapters into a single document; it exists solely to keep existing links working.
The original Py2-as-default -> Py3-as-default switch in https://peps.python.org/pep-0430/ was certainly all server-side redirect config. And yeah, I agree the orphaned navigation page isn't a good solution; it's just a better option than leaving people with either a 404 or an unanchored link to the start of a page with less inline content.
Unfortunately, web server rewrite rules can't help us here, as the anchor tag part is never sent to the server - it's handled by the browser after downloading the page. HTTP redirects don't help either, as they also operate at the page level.
It should be possible to do something clever with client side JavaScript: https://stackoverflow.com/questions/1305211/javascript-to-redirect-from-anchor-to-a-separate-page (and that could potentially be extended further to handle smaller cases like the deep links I recently broke by moving the Py_Main C API docs to a different page in #78387).
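To make the point about fragments concrete, here is a rough Python sketch (not the proposed JavaScript itself) of what the server sees versus what an in-page script could do with the anchor. The MOVED_ANCHORS table and the target file names are made up for illustration; nothing in this thread says where any anchor would actually end up.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from old anchors on datamodel.html to the pages that
# might host them after a split. The target file names are invented.
MOVED_ANCHORS = {
    "object.__hash__": "special-methods.html",
    "set-types": "type-hierarchy.html",
}


def request_target(url: str) -> str:
    """Return the path (plus query) that is sent to the server.

    The fragment is deliberately left out: browsers keep it to themselves.
    """
    parts = urlsplit(url)
    return parts.path + (f"?{parts.query}" if parts.query else "")


def client_side_redirect(url: str, moved: dict[str, str]) -> str:
    """Mimic what an in-page script could do once it can read the fragment."""
    parts = urlsplit(url)
    new_page = moved.get(parts.fragment)
    if new_page is None:
        return url  # unknown anchor: stay on the current page
    # Swap only the last path segment and keep the anchor intact.
    base = parts.path.rsplit("/", 1)[0]
    return urlunsplit(parts._replace(path=f"{base}/{new_page}"))


old = "https://docs.python.org/3/reference/datamodel.html#object.__hash__"
print(request_target(old))
print(client_side_redirect(old, MOVED_ANCHORS))
```

The first function shows why rewrite rules can only act per page; the second is the lookup a client-side script would have to perform, given some anchor-to-page table shipped with the old page.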
> Could the Sphinx extension glue together several pages to form datamodel.html
If you're worried about the length of datamodel.rst, then you can do it natively using .. include:: directives.
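As a rough illustration of the include-based approach, the following Python helper generates the reST source for such a combined page. The chapter file names, the use of a note directive for the banner, and the helper itself are assumptions for the sake of the sketch; only :orphan: and .. include:: are standard Sphinx/docutils features.

```python
def render_combined_page(title: str, banner: str, parts: list[str]) -> str:
    """Build reST source for an orphan page that includes each chapter file."""
    lines = [
        ":orphan:",  # file-wide metadata: page is intentionally outside any toctree
        "",
        title,
        "=" * len(title),
        "",
        ".. note::",  # assumption: the banner is rendered as a note
        "",
        f"   {banner}",
        "",
    ]
    for part in parts:
        # One include directive per chapter, in reading order.
        lines += [f".. include:: {part}", ""]
    return "\n".join(lines)


# Hypothetical chapter files; the real split has not been decided.
print(render_combined_page(
    "Data model",
    "This page combines several chapters; it exists solely to keep existing links working.",
    ["datamodel/objects.rst", "datamodel/special-methods.rst"],
))
```

One caveat I have not checked: if the included files are also built as standalone pages, their labels would be defined twice, which may produce duplicate-label warnings.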
Ah yes, I'd forgotten about redirection using JS. I was confused because I was actually thinking about server-side rendering. Now, using JS can be integrated in Sphinx directly (IIRC).
> If you want to improve SEO, isn't there a way to indicate in an HTML document that this or that text is more important than something else (e.g., with some aria label or whatever HTML feature we may have)?
We've no way of knowing which of the 18k words (or 25k in https://github.com/python/cpython/issues/126052) is the important text that any given visitor is interested in. That's why more granular pages will help.
(We may want to break out a separate pre-requisite issue for this, but continuing here for now)
Summarising a potential solution for moving link targets between pages, or making other changes (like updating section headings), without breaking deep links to those anchors:
https://docs.python.org/dev/ as the reference docs for main, and compare each new build to those. It might be sufficient to use the existing intersphinx inventory as the basis for comparison)
This is still @picnixz's "custom Sphinx extension" idea, just with a better idea of what that extension would need to offer to enable docs refactoring without worrying about breaking existing deep links. If this existed, my orphaned navigation hub idea wouldn't be needed.
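The comparison of a new build against a reference inventory could look roughly like the sketch below. It assumes each inventory has already been reduced to a plain dict keyed by a "domain:role:name" string with the expanded URI as the value; that key format, the function name, and the sample entries are illustrative assumptions, not an existing tool.

```python
def compare_inventories(
    reference: dict[str, str], candidate: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Classify reference targets as moved or removed in the candidate build.

    Returns (moved, removed): `moved` maps each old URI to its new URI,
    `removed` lists old URIs whose object no longer exists at all.
    """
    moved: dict[str, str] = {}
    removed: list[str] = []
    for key, old_uri in reference.items():
        new_uri = candidate.get(key)
        if new_uri is None:
            removed.append(old_uri)
        elif new_uri != old_uri:
            moved[old_uri] = new_uri
    return moved, sorted(removed)


# Invented sample data: one target moves to a hypothetical new page,
# one disappears, one stays put.
reference = {
    "py:method:object.__hash__": "reference/datamodel.html#object.__hash__",
    "std:label:set-types": "reference/datamodel.html#set-types",
    "std:label:types": "reference/datamodel.html#types",
}
candidate = {
    "py:method:object.__hash__": "reference/special-methods.html#object.__hash__",
    "std:label:types": "reference/datamodel.html#types",
}
moved, removed = compare_inventories(reference, candidate)
print(moved)
print(removed)
```

The `moved` mapping is the kind of data a redirect mechanism would need, while `removed` is what a build check could flag for a human to look at.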
I like the idea of using the intersphinx data. Here's a script that uses sphobjinv to print links that have died in the 3.14 docs:
from sphobjinv.inventory import Inventory

def load(url):
    # Download an objects.inv file and collect every fully expanded URI in it.
    inv = Inventory(url=url)
    return {obj.uri_expanded for obj in inv.objects}

old_urls = load('https://docs.python.org/3.13/objects.inv')
new_urls = load('https://docs.python.org/3.14/objects.inv')

# URIs present in 3.13 but missing from 3.14 are links that have died.
dead_urls = old_urls - new_urls
for url in sorted(dead_urls):
    print(url)
A very basic solution might be to redirect users to search.html, and supply the URL fragment as the search query. This would work OK for terms and python references, but not heading permalinks.
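A minimal sketch of that fallback, assuming each version's docs live under "/<version>/" and that the search page reads its query from a q parameter (which is how Sphinx's built-in search page behaves, as far as I know). The function name is made up.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def search_fallback(dead_url: str) -> str:
    """Turn a dead deep link into a search URL that queries for its fragment."""
    parts = urlsplit(dead_url)
    # Assumption: the first path segment is the docs version, e.g. "/3.14/".
    version_root = "/" + parts.path.strip("/").split("/", 1)[0] + "/"
    query = urlencode({"q": parts.fragment})
    return urlunsplit(
        (parts.scheme, parts.netloc, version_root + "search.html", query, "")
    )


print(search_fallback(
    "https://docs.python.org/3.14/library/json.html#cmdoption-json.tool-indent"
))
```

As noted, the quality of the results depends on the fragment being something the search index knows about; a heading permalink would just be searched for as literal text.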
This one is weird: library/json.html#cmdoption-json.tool-indent. Nothing seems to have changed in the rst between 3.13 and 3.14 and this could be a Sphinx issue. I think we had an issue for that somewhere but I forgot. I'll need to investigate.
NVM, the program was changed.
@nedbat Does the docs WG want to take a position with regard to docs stability versus refactoring into smaller chunks in hopes that SEO will be improved?
I think this should be motivated not just by SEO, but also by improving the usability of the docs. It's a very large file that covers a lot of ground, and the way it's organized isn't necessarily the best. That may be bad for SEO, but it's also not ideal for human readers.
Currently the file has not just a discussion of Python's general "data model", the way data is represented, but also detailed documentation about some precise types, such as code objects. That documentation might fit better at https://docs.python.org/3/library/types.html#types.CodeType, so the data model page can focus more on behavior of the core language. Similarly, the data model page has discussion of numbers.Number and similar classes, which feels a bit out of place, as those are library ABCs, not core parts of the language. On the other hand, memoryview, a builtin, isn't mentioned as part of the "standard type hierarchy". Some of the file also duplicates the stdtypes page: compare https://docs.python.org/3/reference/datamodel.html#set-types and https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset.
I also agree we should avoid breaking links. If we want to be very strict in this, we could build some tooling that records e.g. all anchor targets in an old version of the docs and asserts they continue to work.
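Such tooling could be sketched along these lines, using only the standard library. It assumes anchors are plain id attributes in the built HTML and that the pages have already been read into dicts of file name to HTML text; the helper names and sample pages are made up.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the value of every id attribute seen in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.anchors.add(value)


def collect_anchors(html: str) -> set[str]:
    collector = AnchorCollector()
    collector.feed(html)
    return collector.anchors


def missing_anchors(old_pages: dict[str, str], new_pages: dict[str, str]) -> set[str]:
    """Return "page#anchor" targets present in the old build but not the new."""

    def targets(pages: dict[str, str]) -> set[str]:
        return {
            f"{page}#{anchor}"
            for page, html in pages.items()
            for anchor in collect_anchors(html)
        }

    return targets(old_pages) - targets(new_pages)


# Tiny invented pages standing in for two builds of the docs.
old_build = {"datamodel.html": '<section id="set-types"><dt id="object.__hash__"></dt></section>'}
new_build = {"datamodel.html": '<section id="set-types"></section>'}
print(missing_anchors(old_build, new_build))
```

A CI check could then simply assert that the returned set is empty, or that every entry in it is covered by some redirect.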
A few general considerations on splitting up a long page in the Language Reference (which this is). I'm speaking from my perspective and not for the entire @python/editorial-board. I would urge us to be more conservative with the Language Reference docs than the Library docs since it is the definition of the Python language.
> Currently the file has not just a discussion of Python's general "data model", the way data is represented, but also detailed documentation about some precise types, such as code objects. That documentation might fit better at https://docs.python.org/3/library/types.html#types.CodeType, so the data model page can focus more on behavior of the core language.
@JelleZijlstra's example is in line with my thinking when it comes to Language Reference changes vs. Library Doc changes.
User experience and discoverability are more important than SEO.
To be clear, UX and discoverability are the entire reason I care about SEO here!
> To be clear, UX and discoverability are the entire reason I care about SEO here!
I understand your intent. To restate, if improvements to SEO negatively impact UX and discoverability, we should pass until the negatives are mitigated. As an aside, the exclamation point wasn't necessary in the earlier response.
Sorry!
I think the page is too long, and splitting it up would improve both UX and SEO. It sounds like there is probably a way to reasonably preserve old links, though that still needs some investigation. It's a big job that should be done with care.
As there seems to be consensus that a technical improvement around preserving deep links is needed before we embark on any major layout changes, I filed that request as a docsbuild-scripts issue: https://github.com/python/docs-community/issues/134 (even if using the technical solution ends up being a CPython change, creating that solution seemed more like a docs build question to me).
Another good first step is making a concrete proposal about how the page would be split up. I know from my own work on the devguide that it's easy to look through an existing document and be certain that it could be reshaped into something better. When you actually sit down to do the reshaping, difficulties arise, decisions have to be made, and so on. Does someone want to write a doc somewhere that shows how a split page would be structured?
My first impression is: split them by classes first. They are good on their own IMO. And each class can be regrouped by topic (e.g. strings, numerics, collections, etc). I can sketch a rough idea if you want (maybe by the end of the afternoon).
I'm not sure about the Data Model page, but @nedbat's question prompted me to add a draft split for the builtin types page in https://github.com/python/cpython/issues/126052#issuecomment-2447175975 (giving str its own page would also mean we could finally move the details of the format string syntax out of the string module docs).
Perhaps the most conservative first iteration after getting the linking resolved would be to split the doc where there are natural breaks: 3.1, 3.2, 3.3 and 3.4. This will keep familiarity initially, and it does not preclude us from further splitting classes and 3.2 in future iterations.
Documentation
The Data model document is very long, and as a result it basically never shows up in search engine results, because 90% of the page is considered irrelevant for any query like "python __hash__".
I suggest we split it up by top-level topic, e.g. we add a dedicated page for "Special method names".
See also #126052