lycheeverse / lychee

⚡ Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!
https://lychee.cli.rs
Apache License 2.0
2.2k stars 133 forks source link

Fail to parse Google Analytics tracking URL #230

Closed seisman closed 3 years ago

seisman commented 3 years ago

Our site (https://www.pygmt.org/dev/) uses Google Analytics tracking codes like:


<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]\|\|function(){
  (i[r].q=i[r].q\|\|[]).push(arguments)},i[r].l=1*new
  Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
  ga('create', 'UA-38125837-7', 'auto', {'storage': 'none'});
  ga('set', 'anonymizeIp', true);
  ga('send', 'pageview');
  </script>

Running lychee gives the following error:

lychee https://www.pygmt.org/dev

✗ https://www.google-analytics.com/analytics.js','ga (HTTP status client error (404 Not Found) for url (https://www.google-analytics.com/analytics.js','ga))
mre commented 3 years ago

That's an upstream problem in linkify. See https://github.com/robinst/linkify/issues/20

mre commented 3 years ago

PR over at https://github.com/robinst/linkify/pull/21

mre commented 3 years ago

Upstream PR is merged now. 🎉

mre commented 3 years ago

This should be fixed on lychee master now.

mre commented 3 years ago

Given your input, I get the following output with lychee:

❯❯❯ lychee test.md --verbose 
✔ https://www.google-analytics.com/analytics.js [200 OK]

📝 Summary
---------------------
🔍 Total............1
✅ Successful.......1
⏳ Timeouts.........0
🔀 Redirected.......0
👻 Excluded.........0
🚫 Errors...........0

Will cut a new version soon.

mre commented 3 years ago

A new version is out now. Closing this.