Closed OnkarRuikar closed 3 months ago
Note that if you group them by broken fragments (rather than by page with a broken link), we can look for one and solve all occurrences, and the list can be resolved more quickly.
Note that if you group them by broken fragments (rather than by page with a broken link), we can look for one and solve all occurrences, and the list can be resolved more quickly.
But then we'll get multiple commits on each file :)\ If that is not an issue then I've added sorted list of unique urls to OP.
Note that if you group them by broken fragments (rather than by page with a broken link), we can look for one and solve all occurrences, and the list can be resolved more quickly.
As long as we don't do one commit per flaw… I think this is ok: there is about 20 times the same broken link to HTTP/Cache#varying… If all are fixed in the same PR, it is quick to review, and I don't mind that the other flaws in these files are solved separately.
@OnkarRuikar: I don't think this issue can be kept up to date. Can you share the script used to generate the gists and every time we try to fix something we can run the script?
every time we try to fix something we can run the script?
I don't think this issue can be kept up to date.
We don't have to. The current script prevents new occurrences from happening. The gist still hold true, e.g. content/files/en-us/learn/tools_and_testing/client-side_javascript_frameworks/ember_interactivity_events_state/index.md Learn/JavaScript/Building_blocks/Events#addeventlistener_and_removeeventlistener
is still broken.
What about anchors that happen to be fixed, though? For example, the very first item, "content/files/en-us/glossary/script-supporting_element/index.md Web/HTML/Kinds_of_HTML_content#script-supporting_elements", was fixed in #28000.
Anyway I'll drop my script used in https://github.com/orgs/mdn/discussions/279:
import fs from "node:fs/promises";
import path from "node:path";
import { load } from "cheerio";
async function* getFiles(dir) {
const dirents = await fs.readdir(dir, { withFileTypes: true });
for (const dirent of dirents) {
const res = path.join(dir, dirent.name);
if (dirent.isDirectory()) {
yield* getFiles(res);
} else if (res.endsWith("plain.html")) {
yield Promise.all([res, fs.readFile(res, "utf8")]);
}
}
}
const files = getFiles("build");
/** @type {Record<string, string[]>} */
const pathToIds = { __proto__: null };
/** @type {Record<string, string[]>} */
const pathToLinks = { __proto__: null };
for await (const [filename, content] of files) {
const $ = load(content);
pathToLinks[filename] = $("a")
.map((i, el) => $(el).attr("href"))
.get()
.map(
(url) =>
new URL(
url.startsWith("#")
? `${filename.replace(/build\/|\/plain.html/g, "")}${url}`
: url,
"https://developer.mozilla.org"
)
)
.filter(
(url) =>
url.hostname === "developer.mozilla.org" &&
url.hash &&
url.hash !== "#languages-switcher-button"
);
pathToIds[filename] = $("[id]")
.map((i, el) => $(el).attr("id"))
.get();
}
for (const [filename, urls] of Object.entries(pathToLinks)) {
const failures = [];
for (const url of urls) {
const referencedPath = path.join(
"build",
url.pathname
.replaceAll("*", "_star_")
.replaceAll("::", "_doublecolon_")
.replaceAll(":", "_colon_")
.replaceAll("?", "_question_")
.toLowerCase(),
"plain.html"
);
const ids = pathToIds[referencedPath];
if (ids) {
const target = ids.find((id) => id === decodeURI(url.hash.slice(1)));
if (!target) {
failures.push(
`Anchor not found: ${url
.toString()
.replace("https://developer.mozilla.org", "")}`
);
}
} else if (
!["/", "/en-US/docs/MDN/Contribute/Feedback", "/docs/MDN/About"].includes(
url.pathname
)
) {
failures.push(
`Page not found: ${url
.toString()
.replace("https://developer.mozilla.org", "")}`
);
}
}
if (failures.length) {
const sourceFile = path.resolve(
filename
.replace("build/en-us/docs", "files/en-us")
.replace("plain.html", "index.md")
);
console.log(`In ${sourceFile}:
${failures.map((e) => ` - ${e}`).join("\n")}
`);
}
}
Anyway I'll drop my script used in github.com/orgs/mdn/discussions/279:
The script doesn't seem to handle definition list anchors:
In /home/onkar/GitHub/mdn/yari/client/files/en-us/learn/common_questions/design_and_accessibility/what_is_accessibility/plain.md:
- Page not found: /client/en-us/docs/learn/common_questions/design_and_accessibility/what_is_accessibility/plain.html#hearing_impairment
- Page not found: /client/en-us/docs/learn/common_questions/design_and_accessibility/what_is_accessibility/plain.html#visual_impairment
- Page not found: /client/en-us/docs/learn/common_questions/design_and_accessibility/what_is_accessibility/plain.html#pausing_capacity
- Page not found: /client/en-us/docs/learn/common_questions/design_and_accessibility/what_is_accessibility/plain.html#keyboard_capacity
I've updated the broken fragment data in OP. As per my script there seems only 617 broken URL fragments left. The script runs on custom platform, which i developed years ago. It'll take some effort to make is stand alone.
@bsmth We've already plugged the hole so there aren't going to be new broken frags. For existing ~617 ones we need to create a project and a online spreadsheet to track them while fixing them.
The script doesn't seem to handle definition list anchors:
I don't see those problems in my output. Make sure you have built using the latest content.
Also, the pre-commit hook isn't able to catch anchor errors generated from macros.
The script doesn't seem to handle definition list anchors:
I don't see those problems in my output. Make sure you have built using the latest content.
I built it in yari. Doing it in content works.
Also, the pre-commit hook isn't able to catch anchor errors generated from macros.
Yes, for macros the entire content needs to be built. Witch won't be favourable before every commit or in GitHub workflows.
Closing in favor of https://github.com/mdn/mdn/issues/561
Closing in favor of mdn/mdn#561
good idea, thanks!
There are many broken URL fragments in
mdn/content
.\ Almost all the time when a header(## header
) gets modified, contributors don't update related fragments all over the repo.Following is the list of broken fragments data: https://gist.github.com/OnkarRuikar/fa14827682a37f04f23c49189bfa0701