holoviz / panel

Panel: The powerful data exploration & web app framework for Python
https://panel.holoviz.org
BSD 3-Clause "New" or "Revised" License

Website redirects: Add Sphinx extension Rediraffe to enable doc writers to add redirects, so old doc will redirect to new doc #6418

Open Coderambling opened 7 months ago

Coderambling commented 7 months ago

Problem: Currently no redirects can be added when documentation files/URLs change. Created this PR at the request of @maximlt.

Describe the solution you'd like

See Discord discussion here

@maximlt suggested using the Sphinx extension Rediraffe for this.

This will enable doc writers to use the extension to add redirect information for a new doc, so that the old doc redirects to the new one.

Rediraffe can also generate auto-redirects for renamed files in the repo, although NOT for deleted files.

The auto-redirect builder rediraffewritediff would have to be run regularly to detect renamed files and add them to the redirect file. Should it therefore be run as part of the site build?
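For reference, a minimal sketch of the `conf.py` settings the auto-redirect builder would need, based on my reading of the sphinx-rediraffe README (worth double-checking against its current docs): the redirects have to live in a file rather than a dict so the builder can append renames to it, and `rediraffe_branch` names the git ref to diff against. The file name `redirects.txt`, the branch `main`, the 90% threshold and the `doc/` source directory are all assumptions.

```python
# conf.py (fragment): settings for sphinx-rediraffe's rediraffewritediff builder.
extensions = [
    "sphinxext.rediraffe",
]

# Redirects stored in a text file, so the writediff builder can append newly
# detected renames to it (assumed file name).
rediraffe_redirects = "redirects.txt"

# Git ref the builder diffs the current checkout against (assumed: "main").
rediraffe_branch = "main"

# Minimum similarity, in percent, for a rename to be recorded automatically
# (assumed value).
rediraffe_auto_redirect_perc = 90

# The builder runs separately from the normal HTML build, e.g. (assumed paths):
#   sphinx-build -b rediraffewritediff doc/ builds/rediraffe
```

Because the builder compares against a git ref, it may fit better in a CI job on pull requests than in every site build; that is a suggestion, not an established workflow here.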

This will help reduce broken pages.

This does not address showing a Not Found page if a user lands on a non-existent page, for example because of a typo.

Describe alternatives you've considered

Have not considered alternatives.

Additional context

See Discord discussion here

Raised this issue in Discord, created PR at request of @maximlt

To be addressed

Showing a Not Found page with a search bar. Creating redirects when files are deleted.

maximlt commented 7 months ago

Thanks for opening this issue (a Pull Request, aka PR, is a suggestion to change the content of the repository, look there to see the recent PRs) @Coderambling.

The autoredirect rediraffewritediff will have to be run regularly to look for renamed files and add them to the redirect file. So should it be run during build of the site?

I've never used that feature of sphinx-rediraffe, so I'm not sure how well it works or what the best way is to integrate it into our workflow and CI.

I'd suggest starting simple:

  1. Adding sphinx-rediraffe as a dependency of the doc build in setup.py and as an extension in conf.py
  2. Using it to fix one of the identified broken links, to redirect towards a relevant page.

Once this is in place, it'll be much easier for doc writers to copy that pattern and add their own redirects when need be.
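For illustration, step 1 might look roughly like this, assuming the PyPI package `sphinxext-rediraffe` is added to the doc dependencies in setup.py and that the dict form of `rediraffe_redirects` is used. The target path below is hypothetical; it should point at whichever page now covers the old topic.

```python
# conf.py (fragment): minimal sphinx-rediraffe setup.
extensions = [
    # ... existing extensions ...
    "sphinxext.rediraffe",
]

# Map each removed or renamed source document to its replacement.
# Paths are relative to the Sphinx source directory; the target here is a
# hypothetical example, not the actual new location of this content.
rediraffe_redirects = {
    "user_guide/Pipelines.md": "how_to/pipeline/index.md",
}
```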

Coderambling commented 7 months ago

Hi @MarcSkovMadsen. Would it be better to submit things like this as a Feature Request next time, instead of as a PR?

@maximlt suggested using sphinx-rediraffe as a solution. I have never used it myself.

I agree with your approach; it makes sense. Obviously this should also be tested to make sure it doesn't break all the links or cause similar issues, but I guess that is part of reviewing a PR that contains code?

I do not have sufficient knowledge of Sphinx, the document system or the build system to propose actual code changes that will achieve the above.

So to implement this, it needs to be assigned to someone with the appropriate coding skills.

If there is a standard doc template with guidelines for writers, I would suggest adding something like:

- If this doc replaces an existing doc, please add a redirect by placing the current URL here and the new URL here.
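If redirects end up in a text file, that guideline could be backed by a tiny helper. A sketch with a hypothetical function name, assuming the file format sphinx-rediraffe reads: one redirect per line, two quoted source paths separated by whitespace.

```python
from pathlib import Path


def add_redirect(redirects_file: Path, old_doc: str, new_doc: str) -> None:
    """Append one "old" "new" pair to a rediraffe-style redirects file.

    Hypothetical helper; paths are relative to the Sphinx source directory.
    """
    if old_doc == new_doc:
        raise ValueError("a page cannot redirect to itself")
    existing = ""
    if redirects_file.exists():
        existing = redirects_file.read_text(encoding="utf-8")
        # Make sure the new entry starts on its own line.
        if existing and not existing.endswith("\n"):
            existing += "\n"
    redirects_file.write_text(existing + f'"{old_doc}" "{new_doc}"\n', encoding="utf-8")
```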

Coderambling commented 6 months ago

Noticed a couple of links (on external sites) that no longer work, like https://panel.holoviz.org/user_guide/APIs.html.

It looks like https://panel.holoviz.org/user_guide/ no longer exists (it does not return a page), so there could be a bunch of broken links out there that start with /user_guide?

Proposed solution: add a 301 that redirects /user_guide and /user_guide/* to the current main user guide page, or at least to the Panel homepage. Can Rediraffe accommodate such a redirect, or is a different solution needed for this type of redirect?
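As far as I know, sphinx-rediraffe maps individual documents rather than URL patterns, and it emits small HTML pages that redirect in the browser rather than true HTTP 301 responses. A /user_guide/* catch-all would therefore have to be expanded into one entry per old page, or configured on the web server instead. A sketch of the expansion, using a hypothetical helper; the old page names come from the broken links found later in this thread, and `index.md` as the fallback target is an assumption.

```python
def expand_section_redirects(old_pages, old_prefix, target):
    """Map every old page under `old_prefix` to one landing page.

    Hypothetical helper producing a dict usable as `rediraffe_redirects`.
    """
    prefix = old_prefix.rstrip("/") + "/"
    return {f"{prefix}{page}": target for page in old_pages}


old_user_guide_pages = [
    "Pipelines.md",
    "Templates.md",
    "Customization.md",
    "Server_Configuration.md",
]
rediraffe_redirects = expand_section_redirects(old_user_guide_pages, "user_guide", "index.md")
```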

Coderambling commented 6 months ago

The website has about 50-60 broken links right now, not too bad.

Should I re-check after the next update of the website when all the 1.4 docs have been added?

maximlt commented 6 months ago

Could you share a list of them if you have it handy? I'm also interested in how you found them.

Coderambling commented 6 months ago

Yes, of course! I'll get to it over the weekend, OK?

maximlt commented 6 months ago

Yep no rush!

Coderambling commented 6 months ago

Hey @maximlt. I have attached a spreadsheet listing the 66 broken links, with an explanation. About 16 of them will be fixed by a PR @philippjfr has made. I can re-run the analysis at any time, for example after the site is updated when 1.4 is published.

I'd be happy to elaborate, as there is broader context to this than just broken links. For example, the current site also does not have a sitemap.xml file, and it really should have one.

Attachment: Panel link issues. Results of crawling panel.holoviz.org..xlsx

philippjfr commented 6 months ago

Very helpful, thanks!

maximlt commented 6 months ago

Ahh I thought you meant broken links of the website, i.e. pages that were moved/removed without setting up a redirect. That's not what your table contains, still, it's useful, thanks.

Coderambling commented 6 months ago

Sure! I could set up a weekly or monthly automated run that logs the results in a Google Sheet, and make it shareable if that helps. Let me know.

Coderambling commented 6 months ago

Hey @maximlt. Actually, the table does also contain those internal ones if you filter it! See the explanation below.


So what you want is a list of the broken links of the website, i.e. pages that were moved/removed without setting up a redirect.

This can be done with a simple filter in the spreadsheet, to show only the subset of internal removed pages without a redirect.

Column D lists all the pages that have broken links in their content, and column B lists the URLs of those broken links. Column B therefore also contains the URLs of removed internal panel.holoviz.org pages.

Simply filter column B to show only the links that contain the text "panel.holoviz.org". This shows the subset of missing internal panel.holoviz.org pages.
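The same filter could be scripted, for example as part of the recurring automated run suggested earlier. A sketch, assuming the spreadsheet is exported to CSV with column B headed `broken_url` and column D headed `found_on` (both header names and the function name are hypothetical). Matching on the host name instead of a plain substring avoids false hits when a domain only appears in a query string, and the same function works for awesome-panel.org or any other domain.

```python
import csv
import io
from urllib.parse import urlsplit


def broken_links_on_domain(csv_text, domain="panel.holoviz.org"):
    """Return (broken_url, found_on) pairs whose broken URL is hosted on `domain`."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        host = urlsplit(row["broken_url"]).netloc.lower()
        # Accept the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            matches.append((row["broken_url"], row["found_on"]))
    return matches
```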

You will see that this results in a list of 5 missing panel.holoviz.org pages (see below), with error code 404. If you click them, you will see that they indeed don't exist on the site.

Filtered list:

1. https://panel.holoviz.org/user_guide/Pipelines.html
2. https://panel.holoviz.org/user_guide/Templates.html
3. https://panel.holoviz.org/_static/images/sazure_deployment.png
4. https://panel.holoviz.org/user_guide/Customization.html
5. https://panel.holoviz.org/user_guide/Server_Configuration.html

Number 3 in the list is caused by a simple typo: change "sazure" in the link to "azure" and the PNG image appears.

The others probably need to be fixed with a 301 redirect, which is the easier fix. Alternatively, all references to those links need to be removed from the internal pages listed in column D (more work).

There should also be a catch-all that redirects every missing page to a page that says something like "Sorry, but here is our internal search bar". This can usually be done at the web server level (let me know if you want to know how).
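How this is configured depends on how the docs are hosted, which this thread doesn't say; for a Sphinx site, the sphinx-notfound-page extension is another option. Purely to illustrate the behavior, here is a sketch using Python's standard-library `http.server`: any missing path gets a friendly page that points to the site search, while still returning status 404 so the page isn't indexed as real content. The `/search.html` link (Sphinx's default search page) and the page text are assumptions.

```python
import functools
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Placeholder content for the catch-all page.
NOT_FOUND_PAGE = (
    "<html><body><h1>Sorry, this page does not exist</h1>"
    '<p>Try the <a href="/search.html">search page</a> instead.</p>'
    "</body></html>"
)


class NotFoundHandler(SimpleHTTPRequestHandler):
    """Static file handler that answers every 404 with a friendly page."""

    def send_error(self, code, message=None, explain=None):
        if code != HTTPStatus.NOT_FOUND:
            super().send_error(code, message, explain)
            return
        body = NOT_FOUND_PAGE.encode("utf-8")
        self.send_response(code)  # keep the 404 status code
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":  # HEAD responses carry no body
            self.wfile.write(body)


def make_server(directory, port=8000):
    """Serve the built HTML in `directory` on localhost."""
    handler = functools.partial(NotFoundHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```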

So this analysis shows that only 5 of the broken links are caused by missing internal pages, i.e. by issues on the website itself.

The other 61 broken links all lead to external domains, so they are due to external website URLs changing (or sometimes perhaps to typos when those URLs were entered in the content of a panel.holoviz.org page).

The same principle applies if you filter for awesome-panel.org, which shows only those missing pages that are on @MarcSkovMadsen's site AND are linked from the panel.holoviz.org website.

Etc. etc. for other external domains.

This list does not include "isolated" internal pages: they do not cause broken links, because no hyperlink anywhere on the website points to them. Such pages are only reachable via an internal or external search engine or similar, which is less common.

Separately, regarding pages that do have a 301 redirect: a link A on a page, that returns a 301, will redirect the user to link B.

That is fine, but ideally that link A should at some point be replaced in the page source by link B, otherwise more and more redirects are built up on the site. It's a secondary consideration, but still. A script / filter can catch and make a table of those cases as well.
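Once a crawl has produced two mappings (page → links found on it, and old URL → redirect target), the table described above reduces to a lookup. A sketch with hypothetical helper names; it follows redirect chains so that A → B → C is reported as "replace A with C", and it stops on loops instead of spinning forever.

```python
def resolve(url, redirects, max_hops=10):
    """Follow a redirect table (old URL -> new URL) to its final target."""
    seen = set()
    while url in redirects:
        if url in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
        url = redirects[url]
    return url


def links_to_update(page_links, redirects):
    """List (page, old_link, final_link) for every link that goes through a redirect.

    page_links: mapping of page URL -> list of hyperlinks found on that page.
    redirects: mapping of old URL -> new URL, e.g. taken from a crawl report.
    """
    report = []
    for page, links in page_links.items():
        for link in links:
            if link in redirects:
                report.append((page, link, resolve(link, redirects)))
    return report
```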

Coderambling commented 6 months ago

I think I remember seeing a Github PR / Issue sometime ago, that mentioned migrating away from Google Analytics to another tool for the Panel (or Param?) website. Can you point me to that Github issue and / or tell me what the new tooling is? That tooling might possibly help with the above.

Coderambling commented 6 months ago

Link to a fixed issue regarding some external broken links: https://github.com/holoviz/panel/issues/6463

Coderambling commented 6 months ago

Had a quick look at missing links for param.holoviz.org:

696 URLs in total, and only 2 broken links in total (internal + external).

404 Not Found | https://param.holoviz.org/assets/param_help.png | mentioned in page: https://param.holoviz.org/user_guide/Parameters.html

404 Not Found | https://param.holoviz.org/reference.html | mentioned in page: https://param.holoviz.org/user_guide/Parameter_Types.html

Coderambling commented 6 months ago

Missing links for holoviews.org:

4 in total, all pointing to missing external URLs:

404 Not Found | https://github.com/pyviz/aholoviews/pull/3435 | mentioned in https://holoviews.org/releases.html

404 Not Found | http://scitools.org.uk/iris/docs/v1.9.2/index.html | mentioned in https://holoviews.org/releases.html

404 Not Found | https://holoviews.org/(https://github.com/pyviz/holoviews/pull/3364 | mentioned in https://holoviews.org/releases.html

404 Not Found | https://holoviews.org/(https://github.com/pyviz/holoviews/pull/3367 | mentioned in https://holoviews.org/releases.html

So basically the page https://holoviews.org/releases.html contains 4 broken links.

Coderambling commented 6 months ago

Holoviz.org has 24 broken links in total, 23 external, 1 internal: https://holoviz.org/tutorial/13_Deploying_Bokeh_Apps.html

Example of an external broken link:

404 https://raw.githubusercontent.com/holoviz/holoviz/main/tutorial/05_Interactive_Pipelines.ipynb mentioned in:

https://holoviz.org/tutorial/Interactive_Pipelines.html

If you go to the Pipelines tutorial page, there is a section on the right-hand side that says: "Right click to download and run locally." Clicking this link fails, because it leads to https://raw.githubusercontent.com/holoviz/holoviz/main/tutorial/05_Interactive_Pipelines.ipynb, and this URL doesn't exist.

So users can't download the .ipynb from that page. The same problem affects about 10 of the other tutorial/exercise pages on holoviz.org.

Coderambling commented 4 months ago

Had a closer look: the /examples part is missing from the URL:

https://raw.githubusercontent.com/holoviz/holoviz/main/tutorial/05_Interactive_Pipelines.ipynb

should be:

https://raw.githubusercontent.com/holoviz/holoviz/main/examples/tutorial/05_Interactive_Pipelines.ipynb

See the text below from the page: the link at the very bottom is correct; the one immediately above it is not.

https://holoviz.org/tutorial/Interactive_Pipelines.html

Conclusion

...

What if you want to collect these pieces and put them together into a standalone app or dashboard? If so, then the next tutorial will show you how to do so!

This web page was generated from a Jupyter notebook and not all interactivity will work on this website. Right click to download and run locally for full Python-backed interactivity.

Right click to download this notebook from GitHub.
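Since the fix is the same for every affected tutorial page, the broken links could be corrected mechanically. A sketch with a hypothetical helper name, assuming all affected links share the /holoviz/holoviz/main/tutorial/ prefix shown above; it leaves already-correct and unrelated URLs untouched, so running it twice is safe. Where these links are generated in the holoviz.org build is not covered here.

```python
from urllib.parse import urlsplit, urlunsplit

RAW_PREFIX = "/holoviz/holoviz/main/"


def fix_notebook_url(url):
    """Insert the missing 'examples/' segment into a raw GitHub tutorial link."""
    parts = urlsplit(url)
    if parts.netloc != "raw.githubusercontent.com":
        return url
    # Only rewrite links that point directly at main/tutorial/...
    if not parts.path.startswith(RAW_PREFIX + "tutorial/"):
        return url
    new_path = RAW_PREFIX + "examples/" + parts.path[len(RAW_PREFIX):]
    return urlunsplit(parts._replace(path=new_path))
```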

Coderambling commented 4 months ago

Made a PR to add redirects in conf.py: https://github.com/holoviz/panel/pull/6418