manuzhang / mkdocs-htmlproofer-plugin

A MkDocs plugin that validates URL in rendered html files
MIT License
41 stars 16 forks source link

Relative paths to application from user guide marked invalid #58

Closed johnthagen closed 1 year ago

johnthagen commented 1 year ago

Describe the bug

We have some convenience links from our user guide to our full application. Our user guide is served along side our application, so we know what these relative URLs are when we build the application.

In mkdocs-htmlproofer-plugin 0.10.3 these were skipped just fine, but I suspect since:

These throw an error:

WARNING  -  Unable to locate source file for: ../../../api
ERROR    -  Error building page 'usage/rest.md':
ERROR    -  invalid url - ../../../api [404] [usage/rest.md]

To Reproduce

Add the following Markdown to the project:

Browse the [`/api`](../../../api)

Environment (please complete the following information):

argcomplete                2.1.2
beautifulsoup4             4.12.2
certifi                    2022.12.7
charset-normalizer         3.1.0
click                      8.1.3
colorama                   0.4.6
colorlog                   6.7.0
distlib                    0.3.6
filelock                   3.11.0
ghp-import                 2.1.0
idna                       3.4
Jinja2                     3.1.2
lxml                       4.9.2
Markdown                   3.3.7
MarkupSafe                 2.1.2
mergedeep                  1.3.4
mkdocs                     1.4.2
mkdocs-htmlproofer-plugin  0.12.0
mkdocs-macros-plugin       0.7.0
mkdocs-material            9.1.6
mkdocs-material-extensions 1.1.1
mkdocs-video               1.5.0
nox                        2022.11.21
nox-poetry                 1.0.2
packaging                  23.0
pip                        22.3.1
platformdirs               3.2.0
Pygments                   2.15.0
pymdown-extensions         9.11
python-dateutil            2.8.2
PyYAML                     6.0
pyyaml_env_tag             0.1
regex                      2023.3.23
requests                   2.28.2
setuptools                 65.6.3
six                        1.16.0
soupsieve                  2.4
termcolor                  2.2.0
tomlkit                    0.11.7
urllib3                    1.26.15
virtualenv                 20.21.0
watchdog                   3.0.0
wheel                      0.38.4

Contents of mkdocs.yml:

...
plugins:
  - search
  - macros
  - offline:
      enabled: !ENV [MKDOCS_MATERIAL_OFFLINE, false]
  - mkdocs-video:
      is_video: True
      video_autoplay: True
      video_loop: True
      video_muted: True
      video_controls: False
      css_style:
        width: "100%"
  - htmlproofer:
      validate_external_urls: true
      raise_error: True
use_directory_urls: false

CC @kolibri91

johnthagen commented 1 year ago

Perhaps we need a configuration option to opt out of #49? Or maybe someway to whitelist specific errors? Not sure.

kolibri91 commented 1 year ago

Sounds for me like you used a undocumented feature: the ignorance of relative links without anchor.

What would you expect? How should the plugin verify a relative link outside of its own root?

Please describe what solution you imagine? From implemention point of view, it would be easy to ignore them but this would open again the doors for mistakes.

Finally, it would end up in a list of urls, the plugin isn't able to check. This list must then be checked manually and it will lower the worth of this plugin.

@johnthagen I would consider your issue rather as a feature-request as a bug.

kolibri91 commented 1 year ago

Looks like the discussion was already outdated on my mobile. You formulated somehow your preferred solution.

Perhaps we need a configuration option to opt out of #49? Or maybe someway to whitelist specific errors? Not sure.

johnthagen commented 1 year ago

@kolibri91 I think probably this is a feature request to allow the 0.10.3 behavior to be an option? E.g. in 0.10.3 a user could opt into checking external URLs and anchors within the project but still allow them to have relative links into their hosted application.

I'm not sure how common this is, perhaps very rare, but it is something someone could do prior to #49.

johnthagen commented 1 year ago

I'm open to ideas here, but for users who host their MkDocs project along side another application, I could see allowing something like this without having to ditch all of the other checking this plugin offers useful.

kolibri91 commented 1 year ago

Yes, I agree with you. I see the benefit as well. To be honest, I saw already a similar issue within my company.

I'm currently thinking about an implicit rule in the code. The plugin checks two kinds of internal links. The ones with suffix .html and the others. The ones with html must be generated by a markdown page. The others must just exists (e.g. images). But for both kind, only the contents root directory is checked. I think we need a third class of relative links: out-of-source; those must also just exists.

But this will only partially restore the behavior we had prior 0.10.3. It will still raise errors if the files does not exists. I hope this is accepted as it in my opinion the purpose of this plugin.

Regarding your suggestion for a white-list (political correct: allow/ignore/exclude-list):

@manuzhang Does there not exist have feature like this where for a specific error a list of ignored urls can be provided. We might should re-think how the uses of the plugin would like to specify their "allows/ignores/excludes". I think the existing feature is something like a "ignore/allow-error-list" whereas what you propose would more be an "exclude-from-check-list".

johnthagen commented 1 year ago

Maybe we could support a regex that could be applied as a filter for errors, so that in my case you could do something like:

plugins:
  - htmlproofer:
      validate_external_urls: true
      raise_error: True
      exclude_urls:
        - ../../../api
kolibri91 commented 1 year ago

I see two possible PRs which can co-exist:

IMO, we should gave both to the users. But the exclude-pattern should be the last option to users to get their things working and not the way to do it.

johnthagen commented 1 year ago

IMO, we should gave both to the users. But the exclude-pattern should be the last option to users to get their things working and not the way to do it.

👍 Agreed.

kolibri91 commented 1 year ago

@johnthagen Could you might check if #59 solves your issue? I added an example similar to your reported issue to the integration test.

Currently, it's still a draft. I would need some time to refactor the code as the if's are too nested now.

johnthagen commented 1 year ago

@kolibri91 Thanks for the PR.

I installed the PR with:

python -m pip install git+https://github.com/manuzhang/mkdocs-htmlproofer-plugin.git@refs/pull/59/merge

but still got the same error listed in the issue. 🤔

Environment

argcomplete                2.1.2
beautifulsoup4             4.12.2
certifi                    2022.12.7
charset-normalizer         3.1.0
click                      8.1.3
colorama                   0.4.6
colorlog                   6.7.0
distlib                    0.3.6
filelock                   3.11.0
ghp-import                 2.1.0
idna                       3.4
Jinja2                     3.1.2
lxml                       4.9.2
Markdown                   3.3.7
MarkupSafe                 2.1.2
mergedeep                  1.3.4
mkdocs                     1.4.2
mkdocs-htmlproofer-plugin  0.13.0.dev0
mkdocs-macros-plugin       0.7.0
mkdocs-material            9.1.6
mkdocs-material-extensions 1.1.1
mkdocs-video               1.5.0
nox                        2022.11.21
nox-poetry                 1.0.2
packaging                  23.0
pip                        23.0.1
platformdirs               3.2.0
Pygments                   2.15.0
pymdown-extensions         9.11
python-dateutil            2.8.2
PyYAML                     6.0
pyyaml_env_tag             0.1
regex                      2023.3.23
requests                   2.28.2
setuptools                 67.6.1
six                        1.16.0
soupsieve                  2.4
termcolor                  2.2.0
tomlkit                    0.11.7
urllib3                    1.26.15
virtualenv                 20.21.0
watchdog                   3.0.0
wheel                      0.40.0
kolibri91 commented 1 year ago

Ok, that's at least a starting point. Might, I misunderstood your hierarchy. Could you please give me a more detailed example?

From your explanation, I would expect something like this but it does not fit ../../../api. There is one level too less.

project-root/
├─ api/
├─ my-docs/
│  ├─ usage/
│  │  ├─ rest.md
│  ├─ mkdocs.yml

Further, if I check my recently created example in the integration test, it does not work on the webpage. So, the link ../../../README.md is resolved to http://127.0.0.1:8000/README.md and can not be displayed in the browser.

johnthagen commented 1 year ago

@kolibri91 In my case, the link really has nothing to do with the actual project structure or any files at all.

The link is relative to where the user guide is hosted on NGINX.

So if the build Mkdocs page I'm linking from is hosted at:

https://mysite.com/docs/section/intro.html

The relative link ../../../api resolves to:

https://mysite.com/api

So in my use case this has nothing to do with the local file structure at all, and it pointing to another URL hosted next to the static user guide by NGNIX. In my use case, I'd still like mkdocs-htmlproofer-plugin to validate all of the anchors and absolute external URLs, but allow this link to be excluded from the new relative path checking.

kolibri91 commented 1 year ago

Ok, this means the folder api does only exist on your server hosting the generated page?

johnthagen commented 1 year ago

Sort of, there is no folder per se, it's just a URL that could point to any relative URL. The URL will be served by NGINX and it could contain a React frontend, OpenAPI docs, etc.

There is no way that the MkDocs project could know about the validity of this path, which is why I think a simple way to exclude paths like this would likely be the best way to go, since now all relative paths are being checked.

kolibri91 commented 1 year ago

For this particular use-case: Yes

kolibri91 commented 1 year ago

I see two possible PRs which can co-exist:

  • Check out-of-source files (improvement) #59
  • Add exclude patterns to config (new feature)

IMO, we should gave both to the users. But the exclude-pattern should be the last option to users to get their things working and not the way to do it.

Edited

kolibri91 commented 1 year ago

@johnthagen I might have good news for you. There is no need for a new feature. Just an improvement of the error-handling is needed and you will get back the behavior.

If I got your issue correct now, #61 should do the trick for you.

johnthagen commented 1 year ago

First off, before I test out #61, I somehow forgot we do have a way to exclude errors, so I turned that on:

  - htmlproofer:
      validate_external_urls: !ENV [HTMLPROOFER_VALIDATE_EXTERNAL_URLS, false]
      raise_error: True
      raise_error_excludes:
        404: ["../../../api"]

This changes the error to a warning, but still fails with

strict: true

Output:

WARNING  -  Unable to locate source file for: ../../../api

Aborted with 1 warnings in strict mode!
johnthagen commented 1 year ago

@kolibri91 With #61 the exclude works as expected! 🚀