remarkjs / remark-validate-links

plugin to check that markdown links and images reference existing files and headings
https://remark.js.org
MIT License
110 stars 26 forks source link

Does not catch bad links that use absolute paths to repo root #75

Closed eatonphil closed 1 week ago

eatonphil commented 1 year ago

Initial checklist

Affected packages and versions

12.1.0

Link to runnable example

No response

Steps to reproduce

I have a repo: https://github.com/eatonphil/docs-link-check-test. If you run ./remark-check.sh it will catch one bad link but not another bad link.

$ cat README.md
[link](/src/test.md)

This should fail
[link](/src/test.md#doesnt-exist)

But only this one actually fails
[link](./src/test.md#doesnt-exist)

Expected behavior

GitHub allows you to use absolute paths starting with / to mean the root directory of the repo. remark-validate-links should handle looking at the repo root to resolve these files.

Actual behavior

Root-absolute paths are not validated.

Runtime

No response

Package manager

No response

OS

No response

Build and bundle tools

No response

eatonphil commented 1 year ago

This looks like https://github.com/remarkjs/remark-validate-links/issues/57 but I think #57 is talking about the ability to rewrite certain root paths.

I'm not talking about that. I just want to be able to resolve files relative to the root of the same repo. No rewrites.

wooorm commented 1 year ago

Hey!

I don’t think this was supported before. But your repo shows that it does work. I was under the impressions that they used to be related to github.com, like normal.

See this handling: https://github.com/remarkjs/remark-validate-links/blob/415557c15bbf6889e598b50fc885e6c22c9b654f/lib/find/find-references.js#L220.

A PR to solve it would likely around that line.

And it might need some more testing, to see if /wooorm/markdown-rs/... still points to some other repo on GH, or to a file in the current repo. And perhaps some testing on Gitlab / Bitbucket?

Interested in working on this?

eatonphil commented 1 year ago

I've got it working with this diff, but you're right there are some edge cases. I don't really care to test on gitlab/bitbucket since I'm not as familiar with them. So how do you feel about this?

git diff
diff --git a/lib/find/find-references.js b/lib/find/find-references.js
index 953dec3..f930f4d 100644
--- a/lib/find/find-references.js
+++ b/lib/find/find-references.js
@@ -217,14 +217,21 @@ export async function findReferences(ctx) {
  */
 // eslint-disable-next-line complexity
 function urlToPath(value, config, type) {
-  // Absolute paths: `/wooorm/test/blob/main/directory/example.md`.
+  // Absolute paths: `/wooorm/test/blob/main/directory/example.md` or `/directory/example.md`.
   if (value.charAt(0) === slash) {
     if (!config.urlConfig.hostname) {
       return
     }

-    // Create a URL.
-    value = https + slashes + config.urlConfig.hostname + value
+    // A root absolute path without the repo, "blob", and branch name: `/directory/example.md`.
+    // TODO: Figure out how this applies, if at all, to GitLab and Bitbucket.
+    if (!config.urlConfig.prefix.startsWith(value) && config.urlConfig.hostname === 'github.com') {
+      const valueRelativeToRoot = value.slice(1)
+      value = path.resolve(config.root, valueRelativeToRoot)
+    } else {
+      // Create a URL.
+      value = https + slashes + config.urlConfig.hostname + value
+    }
   }

   /** @type {URL|undefined} */

If that's ok I'll open a PR.

wooorm commented 1 year ago

I recommend adding an option to urlConfig, similar to lines: boolean, which is then checked for here. Something along the lines of: resolveAbsolutePathsInRepo: boolean? That way, folks can configure whether their hosted git supports this or not.

Furthermore, you are now creating a path inside value, in your if branch. But the value we are making is a (serialized) URL. path.resolve and config.root are all about paths. (note: see my last paragraph of this comment later touches on this again)

For reference, I checked the following in a private repo (wooorm/private) on GitHub:

readme.md

[alpha](/remarkjs/remark-validate-links/readme.md)
[bravo](/remarkjs/remark-validate-links/blob/readme.md)
[charlie](/remarkjs/remark-validate-links/blob/main/readme.md)
[delta](/remarkjs/remark-validate-links)
[echo](/remarkjs)
[foxtrot](/x/y/z.md)

x/y/z.md (empty)

Github generates the following HTML for the readme:

<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links/readme.md">alpha</a>
<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links/blob/readme.md">bravo</a>
<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links/blob/main/readme.md">charlie</a>
<a href="/wooorm/private/blob/main/remarkjs/remark-validate-links">delta</a>
<a href="/wooorm/private/blob/main/remarkjs">echo</a>
<a href="/wooorm/private/blob/main/x/y/z.md">foxtrot</a>

That is to say, all absolute paths are prefixed with /wooorm/private/blob/main. so !config.urlConfig.prefix.startsWith(value) and the current handling are not needed. The information needed to construct /wooorm/private/blob/main, other than the branch, is in urlConfig.prefix. Missing a branch is fine, we don’t use it yet: https://github.com/remarkjs/remark-validate-links/blob/415557c15bbf6889e598b50fc885e6c22c9b654f/lib/find/find-references.js#L257.

So, I personally would do something like the following pseudocode:

  // Absolute paths: `/path/to/file.md`.
  if (value.charAt(0) === slash) {
    if (!config.urlConfig.hostname) {
      return
    }

    // Create a URL.
    const pathname = config.urlConfig.resolveAbsolutePathsInRepo && config.urlConfig.prefix
      ? config.urlConfig.prefix + 'unknown' + value
      : value.slice(1)
    value = https + slashes + config.urlConfig.hostname + pathname
  }

…where unknown is a temporary value for a branch name, which is dropped later!

Finally, the function as I look at it now is a bit of a mix between path and URL handling. That’s ambiguous and probably points to some bugs. It’s probably better to investigate all that and look at it some more some other time though!

iainelder commented 1 year ago

Just discovered that this is missing! I thought it was working until I renamed the file that was referenced by relative-to-repo-root syntax. Looking forward to a fix.