Open npielawski opened 2 hours ago
Hey there, thanks for this!
Regarding the link for fetch / the section “Execution on a partial dataset”, see https://github.com/pola-rs/polars/pull/18033. My suggestion is that you share with the OP of that PR what you intended to do with head + collect.
As for the broken links, please do submit the corrected links.
When you talk about “stale tags”, I assume you are talking about entries in the YAML file that are not referenced in code blocks, is that it?
Yes exactly
Ok, I see. To be honest with you, I am not 100% sure if those tags are relevant elsewhere, so if those links are all just working fine, I'd recommend we keep them for now.
Fixing the broken & used links seems more useful in the short term and since you were kind enough to share the scripts you used to check the links we can always go through the tags again later.
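For reference, the stale-tag check discussed above (a YAML entry that no code block references) could be sketched roughly as follows. This is not the script shared in the issue. It is a minimal illustration that assumes tags appear as quoted strings on lines containing code_block, and the function names are hypothetical.

```python
import re
from typing import Iterable

# Matches any single- or double-quoted string on a line.
QUOTED = re.compile(r"""['"]([^'"]+)['"]""")


def referenced_names(lines: Iterable[str]) -> set[str]:
    """Collect every quoted string found on lines that mention code_block."""
    names: set[str] = set()
    for line in lines:
        if "code_block" in line:
            names.update(QUOTED.findall(line))
    return names


def find_stale_tags(tags: Iterable[str], lines: Iterable[str]) -> list[str]:
    """Return the tags that never appear on a code_block line (assumed criterion)."""
    used = referenced_names(lines)
    return sorted(tag for tag in tags if tag not in used)
```

The lines could come from reading every Markdown file under docs/, and the tags from the keys of the parsed YAML file. Any tag used outside docs/ or outside a code_block line would show up as a false positive, which is why a manual pass over the results is still needed.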
Description
I noticed a dead link in the user guide, so I made a script to probe all links in docs/source/_build/API_REFERENCE_LINKS.yml. It turns out that there are 22 dead links (HTTP response != 200) in the user guide. Many links are stale and need to be updated, and there are a few typos, too. The issue concerns both Python and Rust links.
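The probing step described above can be sketched as follows. This is only an illustration, not the script from the issue: it assumes the YAML file has already been parsed into a nested dict (for example with PyYAML's yaml.safe_load), that every string value starting with http is a link, and the helper names are hypothetical.

```python
from typing import Callable, Iterator
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def iter_links(node, path=()) -> Iterator[tuple[str, str]]:
    """Yield (dotted.key.path, url) for every URL string in a nested mapping."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_links(value, path + (str(key),))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield ".".join(path), node


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status of a GET request, or 0 on a network failure."""
    try:
        with urlopen(Request(url, method="GET"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 for a dead documentation page
    except OSError:
        return 0  # DNS failure, refused connection, timeout, ...


def find_dead_links(
    links: dict, probe: Callable[[str], int] = http_status
) -> list[tuple[str, str, int]]:
    """Return (tag, url, status) for every link whose status is not 200."""
    dead = []
    for tag, url in iter_links(links):
        status = probe(url)
        if status != 200:
            dead.append((tag, url, status))
    return dead
```

The probe is passed in as a parameter so the traversal can be checked without network access. Note that urlopen follows redirects, so a link that redirects to a valid page counts as alive here.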
As an example of a broken link, Expressions / Aggregation has a broken link if you click on API Categorical for the Python code example.
I made another script to look for stale tags that are not referenced in the API, and there are 17 such instances. This assumes that the links are only used in docs/ and that the line contains code_block. I am double-checking the positives manually to make sure there are no false positives.
Finally, the fetch link (which gives a 404) doesn't have an API documentation page anymore, likely due to the function being deprecated. It would be best to rewrite the section in docs/source/user-guide/lazy/execution.md L52-79 and use head+collect instead (since this is what is recommended in the source code).
If this issue is accepted, I can submit a PR and update the links (already did the work), and I can start writing a new Execution on a partial dataset section as well. I am wondering if the stale tags should be removed at all (the links are all returning HTTP 200), and I am not 100% certain I won't break something by removing them.
Here is the list of links:
Here is the list of unused tags:
The code to find all broken links (run in docs/source/_build):
The code to find stale tags (run in docs/source/_build, using ripgrep):
Link
https://docs.pola.rs/user-guide/expressions/aggregation/