lycheeverse / lychee

⚡ Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!
https://lychee.cli.rs
Apache License 2.0

Unable to check url inside <style> tag and css file #1485

Open awang-01 opened 3 months ago

awang-01 commented 3 months ago

There are two broken links on this page: https://testing826.wpenginepowered.com/. If you inspect it, both image URLs are missing the "g" at the end.

lychee -vv https://testing826.wpenginepowered.com/
✔ [200] https://testing826.wpenginepowered.com/assets/css/styles.css

🔍 1 Total (in 0s) ✅ 1 OK 🚫 0 Errors

It did not catch those two broken URLs.

mre commented 3 months ago

Here's a minimal example that we can use as a test:

<html>
   <head>
      <style>
         div {
             background-image: url("./lychee.png");
         }
      </style>
   </head>
</html>

The problem here (and in your example) is the relative path, ./lychee.png. (The extension doesn't matter at this point, because lychee doesn't detect that link in the first place.)

We use linkify to find links in that kind of text, and it doesn't discover a link here. You can copy-paste the snippet here to verify: https://robinst.github.io/linkify/

"file://lychee.png" would be detected and so would "http://example.com/lychee.png".

If we want to support these links in lychee, we'd have to do some smarter parsing. We could use Servo's CSS parser, but it would require some work.
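To give a rough idea of what "smarter parsing" could mean, here is a minimal, std-only sketch that pulls the argument out of every `url(...)` token in a CSS string. It is not lychee's code and not a real CSS parser: the function name `extract_css_urls` is hypothetical, and the scan deliberately ignores comments, escapes, uppercase `URL(`, and URLs that contain a closing parenthesis. Resolving the extracted relative path against the page's base URL would be a separate step and is not shown.

```rust
/// Hypothetical helper (not part of lychee): collect the argument of every
/// `url(...)` token in a CSS string.
///
/// This is a heuristic scan, not a CSS parser. It does not handle comments,
/// escapes, uppercase `URL(`, or URLs containing `)`.
fn extract_css_urls(css: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut rest = css;

    while let Some(start) = rest.find("url(") {
        // Everything after the opening `url(`.
        let after = &rest[start + 4..];

        // Stop scanning if the token is never closed.
        let end = match after.find(')') {
            Some(end) => end,
            None => break,
        };

        // Trim whitespace, then strip one pair of matching quotes if present.
        let raw = after[..end].trim();
        let unquoted = raw
            .strip_prefix('"')
            .and_then(|s| s.strip_suffix('"'))
            .or_else(|| raw.strip_prefix('\'').and_then(|s| s.strip_suffix('\'')))
            .unwrap_or(raw);

        if !unquoted.is_empty() {
            urls.push(unquoted.to_string());
        }

        // Continue after the closing parenthesis.
        rest = &after[end + 1..];
    }

    urls
}
```

Run against the contents of the <style> block in the test case above, this sketch would return the single entry ./lychee.png, which is exactly the relative path that linkify skips. A proper implementation would more likely build on a tokenizer such as Servo's CSS parser, as mentioned above.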

If someone is willing to look into that, I'd be open to accepting pull requests, but I personally don't want to focus on it.

mre commented 3 months ago

This is more of a feature request than a bug, so I changed the issue title and added the enhancement label.