processing / p5.js-website

New p5.js website!
http://p5js.org
MIT License
19 stars 89 forks source link

Programmatically found broken internal links #527

Open ff6347 opened 2 months ago

ff6347 commented 2 months ago

Most appropriate sections of the p5.js website?

Other (specify if possible)

What is your operating system?

Mac OS

Web browser and version

not related to browsers

Actual Behavior

Broken links generate a 404

Expected Behavior

Links should serve the linked content

Steps to reproduce

Since I saw some reports of broken links in the website I tried a programmatic approach using broken-link-checker.

How to:

Edit: To make it easier to generate an overview I created a script that outputs GitHub flavored markdown. https://github.com/ff6347/find-broken-links

npm install broken-link-checker
npx blc -roe https://p5js.org > report-blc-prod-live.txt

# The flags do the following
# --exclude-external, -e  Will not check external links.
# --ordered, -o           Maintain the order of links as they appear in their HTML document.
# --recursive, -r         Recursively scan ("crawl") the HTML document(s).

The output needs some cleaning using regular expressions afterwards.

Remove all reports that are okay:

^├───OK───.*?$\n

Then remove all reports of urls that have no broken links:

^Getting links from:.*?$\nFinished! \d{1,1000} links found. \d{1,1000} excluded. 0 broken.

Fix some reports where there is no new line between the report and the next.

look for:

broken\.\n(\w)

Replace with:

broken.\n\n$1

These three should do most of the grunt work. There are some false positive reports about images that I removed manually at the end.


Here is what I've found:

Low Hanging Fruits

There are some links in footer and reference that create 301 redirects which can be avoided through small tweaks.

These are the lines that need a trailing slash:

Generally these are neglect-able, but a HEAD request against them returns a 301 HTTP response. This could be fixed since it is a low hanging fruit.

301 redirects

There are many links in the reference generated form the JSDoc comments in the source of p5.js that are lacking the trailing slash and generate a 301 HTTP redirect. These need some manual labor in the source code. This is a bit tedious work. It should not be a problem keeping this as is since the site-host + Astro take care of the redirects.

Broken Links

There are some broken links also generated from the JSDoc that need fixing and I found broken links in the tutorials. They are hard-coded into the .mdx files and need to be fixed one by one.

Next Steps

IMO the next steps should be


Below you can find the report

report created: Fri, 13 Sep 2024 11:06:07 GMT

Would you like to work on the issue?

yes

Qianqianye commented 2 months ago

Thanks @ff6347, this looks great! We have multiple open issues about broken links that we are gradually fixing by batches, since the reasons for the broken links vary.

To provide some context:

Here’s my suggestion: Let's use this issue as the main issue to consolidate all the broken link issues. We can update the issue with the latest report when some links are fixed, for examples, some broken links in the report generated on 2024-09-08 are already fixed. For anyone interested in contributing, feel free to start by fixing broken links in the Contribute or Tutorial pages in batches, then we can move to Reference page and others. Please go ahead and open PRs as you work through them. Thank you!

ff6347 commented 2 months ago

hi @Qianqianye thanks for the clarification where to look and start. 👌🏾

I updated the issue description to have check boxes rather then plaintext to track the progress. I also created a script to regenerate the report. https://github.com/ff6347/find-broken-links

I'll ping you as soon as I have the first PR.

thegodworm commented 1 month ago

Hi! I'm new to contributing and I wanted to let you know that I'm interested in fixing the broken links. I will open PR gradually.

thegodworm commented 1 month ago

So for the contributor docs, I don't get it. reading the doc ; p5js-websites pulls from p5.js then uses astro to build the static website right? " https://p5js.org/contribute/contributor_guidelines.md#software-design-principles" doesn't work but

https://p5js.org/contribute/contributor_guidelines/#software-design-principles works, it seems it's all due to .md not being present, thus refering to the correct page. I don't know how to correct this ( p5.js or p5jswebsite repo?) or the conversion script is missconfigured?

ff6347 commented 1 month ago

@thegodworm Astro transforms all Markdown .md files to a folder with the name of that file (minus the extension) and an index.html in it.

thegodworm commented 1 month ago

after inspections, the links are broken because they are not built correctly (via script?)

<a href="./pathToContent.md#anchor"> point To X</a> -> error 404.

works when corrected like this:

<a href="../pathToContent#anchor">** point To X</a>

adding ../ relative path correctly and removing the md.

I would ask what would be the best course of actions to take?

writing a script automatically addressing the issues? manually?

or modifying how the script for astro handle site building?

davepagurek commented 1 month ago

@thegodworm the content is intentionally specifying a .md file so that they work when viewed on GitHub, so I think doing it as part of a script makes sense here. There's already some initial parsing and converting done here that can be modified: https://github.com/processing/p5.js-website/blob/a872651a35b08201d84392e3e65e5e8ed85b8d4c/src/scripts/builders/contribute.ts#L90-L95

thegodworm commented 1 month ago

@davepagurek yes , I saw that, I will see then what I can do :) . I prefer to ask first because I want to be sure as it's my first collaboration 😃

davepagurek commented 1 month ago

No problem! Happy to have you helping out 🙂

ffd8 commented 1 week ago

Just a quick chime in while checking my teaching docs to see if/what broke and found a pretty big one: https://p5js.org/learn/ » https://p5js.org/tutorials/ ?

One idea would be to implement legacy dirs that forward, or better yet a custom 404 page that parses the URL window.location.href and compares to a list of new homes for those links? It could include a form input (if a true 404 without redirect) that creates an issue with expected URL and new URL if they research the solution.

ff6347 commented 6 days ago

I was thinking the same thing. The site needs a custom 404 page.

I'm not 100 % sure where this is deployed. If would be on cloudflare or netlify there is the possibility to configure redirect.

But I think it's GitHub pages with Cloudflare in front of it.

As far as I know, there is not the possibility to do this on GitHub pages. So a 404 page with some common links and a search might be necessary.