Closed karlhorky closed 2 months ago
Hey @karlhorky! 👋
remark-cli
by default only matches markdown extensions.
https://github.com/remarkjs/remark/blob/97e6f149976e486b5008df073d27f0507d0e58a4/packages/remark-cli/cli.js#L32
the list of default extensions is https://github.com/sindresorhus/markdown-extensions/blob/797fc3dda4ecc9aab668bfdd745f570593f1581d/index.js#L4-L11
If you want to include MDX add the --ext mdx
option to your matcher commands.
Hope this helps!
This doesn't happen on ubuntu-latest
I suspect this is globstar related. Try setting
shopt -s globstar
On your mac shell.
This didn't happen in versions before remark-lint-no-dead-urls@2.0.0
This shouldn't be possible.
This plugin isn't path aware.
I strongly suspect you updated remark-cli
at the same time.
In addition to address potential follow ups:
markdown-extensions
?" I'd recommend against that, MDX is not markdown.Hi! This was closed. Team: If this was fixed, please add phase/solved
. Otherwise, please add one of the no/*
labels.
This didn't happen in versions before remark-lint-no-dead-urls@2.0.0
This shouldn't be possible. This plugin isn't path aware. I strongly suspect you updated
remark-cli
at the same time.
Nope, the PR only updated from remark-lint-no-dead-urls@1.1.0
to remark-lint-no-dead-urls@2.0.0
Been working for literally years before the bump to remark-lint-no-dead-urls@2.0.0
Can create a PR to my reproduction repo if you need hard proof.
If you want to include MDX add the
--ext mdx
option to your matcher commands.
Looks like --ext mdx
is what is missing, thanks!
https://github.com/karlhorky/repro-remark-lint-no-dead-urls-folder-patterns/pull/1
So it looks like, on macOS (without any further configuration or installing anything):
.
, ./src/pages/
and "src/pages/**/*.mdx"
resolve all files correctly (incl. nested files) ✅src/pages/**/*.mdx
(without quotes) does not resolve files correctly (misses nested files) ❌So on macOS, the quotes are important for remark-cli
Nope, the PR only updated from remark-lint-no-dead-urls@1.1.0 to remark-lint-no-dead-urls@2.0.0
@karlhorky your screenshot shows 8 files being updated, it's doing a lot more than just updating a dependency.
So on macOS, the quotes are important for remark-cli
When the glob isn't quoted it is handled by your shell. You are using a bash shell without globstar enabled, if you want to use unquoted globs. Enabled globstar
shopt -s globstar
your screenshot shows 8 files being updated, it's doing a lot more than just updating a dependency.
Those are other files that I changed myself, also not related to updating any dependencies. It was me trying different options to fix the problem of files being missed. But I tested it first without those other unrelated changes.
I still strongly suspect that remark-cli
/unified-engine
/glob
were updated through that PR.
Given that the search you did shows remark-cli
is in the diff somewhere off screen from what you did a screenshot of, it is still the most likely case.
It was in the previous package.json
screenshot, that was the 1 occurrence (the line was unchanged)
But maybe something else caused a different remark-lint
version to be used, I can't be 100% certain of that.
Not in the diff of the PR though.
I also tried out options.extensions
to see if I can avoid having --ext mdx
in the CLI command, but it looks like that's not possible in a Remark config (maybe I have the wrong types for the Remark config):
remark-lint-no-dead-urls.mjs
/** @type {Omit<import('unified-engine').Options, 'processor'>} */
const config = {
extensions: ['mdx'],
plugins: [
'remark-mdx',
[
Looks like --ext mdx is what is missing, thanks!
I'm glad it works. I'm a bit surprised it ever worked without that option.
I also tried out options.extensions to see if I can avoid having --ext mdx in the CLI command, but it looks like that's not possible in a Remark config (maybe I have the wrong types):
options.extensions
is part of creating a CLI https://github.com/remarkjs/remark/blob/97e6f149976e486b5008df073d27f0507d0e58a4/packages/remark-cli/cli.js#L30-L43
It is not part of a configuration file.
The type for a config is Preset
https://github.com/unifiedjs/unified-engine/blob/main/readme.md#preset
Looks like --ext mdx is what is missing, thanks!
I'm glad it works. I'm a bit surprised it ever worked without that option.
Yeah, maybe what the old version of remark-cli
did was read the pattern for .<extension>
and add that to the extensions
option...
The type for a config is
Preset
unifiedjs/unified-engine@main
/readme.md#preset
Thanks! I'll take a look at using this type instead. Does it also have the frail
boolean property? It's not documented at that section... But maybe it's somewhere else, eg. under import('unified').Settings
Edit: hm... import('unified').Settings
is a very permissive type:
/**
* Configuration passed to a Plugin or Processor
*/
interface Settings {
[key: string]: unknown
}
Would be great to have some type checking for typos like fraiil: true
🤔 Also, seems that it's not happy with top-level frail
property
/** @type {import('unified-engine').Preset} */
const config = {
settings: {
frail: true,
fraiil: true, // 💥 No error on typo
},
frail: true, // 💥 Object literal may only specify known properties, and 'frail' does not exist in type 'PresetSupportingSpecifiers'.
Edit 2: I think that's why I went with the original type of Omit<import('unified-engine').Options, 'processor'>
- because it has the frail
property and other things too:
So I think I'll stick with Omit<import('unified-engine').Options, 'processor'>
for now, after I confirm that frail: true
at top level is working...
Edit 3:
It seems that frail: true
is not a thing for a Remark config file, doesn't trigger the same behavior as the --frail
config flag. So I'll switch back to the import('unified-engine').Preset
type.
hm... import('unified').Settings is a very permissive type:
Settings are per-pipeline, and depend a lot on the parser and compiler used. There also can be multiple distinct pipelines in a project, so attaching the type to a globally shared setting interface (similar to how mdast and hast can be extended with new types), would be less viable than the node type scenario. I suspect trying to infer these will run into similar issues as https://github.com/unifiedjs/unified/issues/221
Weird, so in my reproduction repo (ubuntu-latest
), it's finding the files in all cases except src/pages/**/*.mdx
without quotes
But then in our private monorepo on GitHub Actions (many more links), it succeeds silently without any output, although I know that there are broken links contained in some of the files:
$ remark "./src/pages/**/*.mdx" --ext mdx --frail --rc-path remark-lint-no-dead-urls.mjs
Done in 57.25s.
The broken links show up on my local macOS development environment though 🤔
$ remark "./src/pages/**/*.mdx" --ext mdx --frail --rc-path remark-lint-no-dead-urls.mjs
src/pages/x/n/d/thank-u-next/index.mdx
486:3-486:116 warning Link to https://chrome.google.com/webstore/detail/wildfire/djhgeeodemlfdpmcccdekfalbhllcoim/ is dead no-dead-urls remark-lint
⚠ 24 warnings
error Command failed with exit code 1.
I've tried patterns with ./src/pages/
, ./src/pages
, all with the --ext mdx
... seems like there's few differences left that I can think of (except argument order in the command, which I'm assuming doesn't matter).
I also thought it could be a path length issue, but that didn't cause the repro to stop working:
Would be cool to be able to show the full list of URLs tested, even if they are passing - I tried remark --verbose
for this, but it didn't change anything about the output.
Maybe that's a feature request for remark-lint-no-dead-urls
though...
it's finding the files in all cases except src/pages/*/.mdx without quotes
Even on ubuntu, this is controlled by your shell. The distro being used may not have globstar enabled. run:
shopt -s globstar
But then in our private monorepo on GitHub Actions (many more links), it succeeds silently without any output, although I know that there are broken links contained in some of the files:
Check how you are running the script, and what the current wording directory cwd
is.
It sounds like you have the script pointed to the wrong folder.
Would be cool to be able to show the full list of URLs tested
I'm pretty sure the issue is folder related, rather than URL. Adding this feature wouldn't really help you debug the issue you are having.
Interesting:
src/a/b/c/d/index.mdx
), it works in the monorepo, showing the error message (although it also puts out the file content to stdout
, maybe this is a standard behavior of remark-cli
with a single file?)Run time: 5m 20s
$ remark src/pages/tech-fundamentals-foundations-immersive/modules/thank-u-next/index.mdx --ext mdx --frail --rc-path remark-lint-no-dead-urls.mjs
***
title: "Thank U, Next"
sortOrder: 11
curriculumModuleCategory: '{"id":2,"title":"Lectures","symbol":"🧑🏫"}'
------------------------------------------------------------------------
# Thank U, Next
... more file content here ...
...
486:3-486:116 warning Unexpected dead URL `[https://chrome.google.com/webstore/detail/wildfire/djhgeeodemlfdpmcccdekfalbhllcoim/`](https://chrome.google.com/webstore/detail/wildfire/djhgeeodemlfdpmcccdekfalbhllcoim/%60), expected live URL no-dead-urls remark-lint
[cause]:
error Unexpected not ok response `404` (`Not Found`) on `[https://chromewebstore.google.com/detail/wildfire/djhgeeodemlfdpmcccdekfalbhllcoim/`](https://chromewebstore.google.com/detail/wildfire/djhgeeodemlfdpmcccdekfalbhllcoim/%60) dead dead-or-alive
⚠ 13 warnings
error Command failed with exit code 1.
index.mdx
(eg. src/a/b/c/d/
), although I think it's still linting the files, because it still takes a long time, just doesn't show the output and doesn't exit with code 1Run time: 10m 27s
$ remark src/pages/tech-fundamentals-foundations-immersive/modules/thank-u-next/ --ext mdx --frail --rc-path remark-lint-no-dead-urls.mjs
Done in 627.86s.
Not sure why it would take twice as long though, since there is only the index.mdx
file in that directory...
(although it also puts out the file content to stdout, maybe this is a standard behavior of remark-cli with a single file?)
It is, if you want to turn that off for single files use --no-stdout
https://github.com/unifiedjs/unified-args?tab=readme-ov-file#--stdout
Not sure why it would take twice as long though, since there is only the index.mdx file in that directory...
Usually when it takes significantly longer when using globs/patterns.
There is a hidden/ignored folder like node_modules
, .git
, or another tool that has a large number of files that would take time to scan.
Taking a long time does not mean it is looking at the folder you are expecting.
Update: I have it working with the folder path now (the one like src/a/b/c/d/
in 2 above)
I did this by removing a large amount of the content in the (long) MDX file.
So it appears that it's related to MDX content in the file - maybe something invalid or some link that is causing remark-cli
/ remark-lint
/ remark-lint-no-dead-urls
to fail silently when there's more than 1 file specified (but does not fail when only 1 is specified)
So it appears that it's related to MDX content in the file
What indicates it is the MDX content specifically, over the length of the file?
Another worth while question, how many URLs are in that document? And do any of them have rate limiting or DDoS protection applied?
The functional attempts to access all urls in the document simultaneously. https://github.com/remarkjs/remark-lint-no-dead-urls/blob/c1365e1becd238b82c88804888e8f0ebdcd72abe/lib/index.js#L148-L149 With a sufficiently large document you could be triggering your own rate limiting/DDoS protections.
when there's more than 1 file specified
Again, this plugin can literally only see a single file at a time https://github.com/remarkjs/remark-lint-no-dead-urls/blob/c1365e1becd238b82c88804888e8f0ebdcd72abe/lib/index.js#L57
What indicates it is the MDX content specifically, over the length of the file?
I guess it could also be file length (485 lines), but I was not expecting that. To reiterate, the entire file content is not a problem when specifying a path directly to the file eg. remark src/a/b/c/d/index.mdx --ext mdx ...
It appears that the problematic bits are code blocks in the file.
Another worth while question, how many URLs are in that document? And do any of them have rate limiting or DDoS protection applied?
There are 60 URLs in the file, and blocking is not a problem when I use a file path directly to the file.
Again, this plugin can literally only see a single file at a time
Oh maybe I was misunderstanding earlier - are you saying that the directory path pattern I'm using above remark ./src/a/b/c/d/ --ext mdx ...
won't work?
Again, this plugin can literally only see a single file at a time
Oh maybe I was misunderstanding earlier - are you saying that the directory path pattern I'm using above remark ./src/a/b/c/d/ --ext mdx ...
won't work?
It appears that the problematic bits are code blocks in the file.
Do the code blocks contain a url
attribute?
Could you share an anonymized/non-proprietary version of the file?
Oh maybe I was misunderstanding earlier - are you saying that the directory path pattern I'm using above remark ./src/a/b/c/d/ --ext mdx ... won't work?
I'm saying that unified-engine
handles passing files one at a time to the plugins.
The number of files shouldn't have an impact on how this plugin works.
To reiterate, the entire file content is not a problem when specifying a path directly to the file
@wooorm could that change the file meta data? There is one block that depends on the file meta data in this project. https://github.com/remarkjs/remark-lint-no-dead-urls/blob/c1365e1becd238b82c88804888e8f0ebdcd72abe/lib/index.js#L84
@karlhorky are you testing any relative links which would depend on this?
Interesting, it seems the behavior is "flaky" - when running the command below, sometimes remark-lint-no-dead-urls
exits with no output and a 0
exit code, and then other times, it shows the broken links output.
remark src/pages/tech-fundamentals-foundations-immersive/modules/thank-u-next/index.mdx --ext mdx --frail --rc-path remark-lint-no-dead-urls.mjs
The one Chrome extension link https://chrome.google.com/webstore/detail/wildfire/djhgeeodemlfdpmcccdekfalbhllcoim/
is definitely broken always though, so not sure why it would ever report no output with a 0
exit code.
Here is an example of some of the HTML that we have in the code blocks (unusual HTML 3.2 code, teaching about legacy HTML):
16. Right click on the new `legacy` folder and create a new file called `index.html` with the following content:
```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>TodoMVC</title>
</head>
<body>
<script>
function rerender() {
frames[1].location.reload();
}
function click(id) {
for (var i = 0; i < state.numTodos; i++) {
if (state.todos[i].id == id) {
state.todos[i].complete = !state.todos[i].complete;
rerender();
return;
}
}
}
function addTodo(text) {
var nextId = state.nextId;
state.nextId++;
state.todos[state.numTodos] = makeTodo(nextId, text, false);
state.numTodos++;
rerender();
}
function clearCompleted() {
var newTodos = new Object();
var newNum = 0;
for (var i = 0; i < state.numTodos; i++) {
if (!state.todos[i].complete) {
newTodos[newNum] = state.todos[i];
newNum++;
}
}
state.todos = newTodos;
state.numTodos = newNum;
rerender();
}
function filter(newFilter) {
state.filter = newFilter;
rerender();
}
function makeTodo(id, label, complete) {
var todo = new Object();
todo.id = id;
todo.label = label;
todo.complete = complete;
return todo;
}
var state = new Object();
state.filter = 'all';
state.todos = new Object();
state.numTodos = 0;
state.nextId = 1;
</script>
<FRAMESET rows="75,*" border="0" frameborder="no" framespacing="0">
<FRAME src="header.html" scrolling="no">
<FRAME name=content src="todos.html" scrolling="no">
</FRAMESET></body>
</html>
Parts of the code are also highlighted red in the HTML highlighting in VS Code, because of the invalid / legacy code:
![Screenshot 2024-09-29 at 00 42 08](https://github.com/user-attachments/assets/432f61c2-614a-45e8-8a3d-b74e053ba08c)
Maybe I should try running it again 5 or 10 times with the direct path to the file, with unchanged file content - maybe `remark-lint-no-dead-urls@2.0.0` just has some generally flaky behavior, and it has nothing to do with the path containing the `/index.mdx` on the end or just a `/` ... 🤔
Edit: Since I saw that there were `undici` paths in the warning / error messages from `remark-lint-no-dead-urls`, I checked into that, and I found that I had a slightly outdated version in my tree (`undici@6.19.7` instead of `undici@6.19.8`). I re-generated my lockfile and am retesting now... first test run is showing failures now. So if this leads somewhere, it opens a new possibility - maybe it is a bug that's already fixed in a transitive dependency...
Edit 2: No, upgrading to `undici@6.19.8` didn't seem to resolve the problem. `remark-lint-no-dead-urls@2.0.0` is still intermittently, silently failing with no stdout (also across different test runs when making no content changes or path changes). I think I'll hold off on upgrading for now, not sure there's much else I can do to debug without diving into the code of `remark-lint-no-dead-urls@2.0.0` vs `remark-lint-no-dead-urls@1.1.0`
Ok, that was actually easier than I thought to reproduce - just run the lint command multiple times in the GitHub Actions workflow and it immediately shows the intermittent false negatives / flaky behavior:
Initial checklist
Affected packages and versions
remark-cli@12.0.1, remark-lint@10.0.0, remark-lint-no-dead-urls@2.0.0
Link to runnable example
https://github.com/karlhorky/repro-remark-lint-no-dead-urls-folder-patterns
Steps to reproduce
macos-latest
(GitHub Actions workflow file) and observe the missed files in patternsremark . ...
,remark ./src/pages/ ...
,remark src/pages/**/*.mdx ...
etc on macOS (only paths to singular files seem to be working on macOS)This doesn't happen on
ubuntu-latest
This didn't happen in versions before
remark-lint-no-dead-urls@2.0.0
Runtime: Node.js latest LTS (20.17.0)
Expected behavior
Patterns are resolved to multiple files
Actual behavior
Patterns are resolved to a single file or zero files
Alternatives Considered
Maybe this is a general problem in
remark-cli
upstream? (eg. maybe this issue should be transferred toremarkjs/remark
issues instead?)Runtime
Other (please specify in steps to reproduce)
Package manager
pnpm
OS
macOS
Build and bundle tools
No response