raviqqe / muffet

Fast website link checker in Go
MIT License

Add support for Text Fragments #293

Open philrz opened 1 year ago

philrz commented 1 year ago

A colleague of mine recently created a page with a link that used a Text Fragment. It looks like this concept has been around long enough that common browsers support it, but Muffet unfortunately chokes on it. An example:

$ muffet --version
2.7.0

$ muffet --verbose --one-page-only --buffer-size 8192 https://www.brimdata.io/blog/wrangling-json-arrays-with-zed/
https://www.brimdata.io/blog/wrangling-json-arrays-with-zed/
    200 https://buttons.github.io/buttons.js
...
    999 https://linkedin.com/company/brimdata
    id #:~:text=The nested subquery,outer query's table. not found  https://zed.brimdata.io/docs/language/operators/over#:~:text=The%20nested%20subquery,outer%20query%27s%20table.

Sieboldianus commented 1 year ago

For jekyll-reveal sites, checking fragments this way is rather counterproductive, because the fragment is used as a native URL anchor, e.g.:

https://kartographie.geo.tu-dresden.de/ad/python_datascience_2022/#/35/0/0

Muffet reports lots of errors for these fragments even though they point to valid URLs.

raviqqe commented 1 year ago

For now, in #301, I changed Muffet to ignore text fragments so that they no longer cause unexpected errors.
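
A minimal sketch of the idea, not the actual change in #301: split the URL fragment at the `:~:` fragment directive delimiter and treat only the part before it as an element id to look up. The helper name `elementID` is hypothetical and exists only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// elementID returns the part of a URL fragment that names an element id,
// dropping any fragment directive such as ":~:text=...".
// Hypothetical helper for illustration; this is not Muffet's real code.
func elementID(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Everything from the first ":~:" onward is a fragment directive,
	// so it must not be compared against ids in the page.
	id, _, _ := strings.Cut(u.Fragment, ":~:")
	return id, nil
}

func main() {
	id, err := elementID("https://zed.brimdata.io/docs/language/operators/over#:~:text=lateral,Synopsis")
	if err != nil {
		panic(err)
	}
	// An empty id means there is no element to look up for this link.
	fmt.Printf("%q\n", id)
}
```

With this split, a link like `page#section:~:text=foo` would still have `section` checked as an id, while a link that carries only a text directive would skip the id check entirely.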

BTW, at least in Chrome, text fragments can match text across element boundaries. That seems to make implementing text fragment matching easier. 🤣

e.g.

https://zed.brimdata.io/docs/language/operators/over#:~:text=lateral,Synopsis
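
A rough sketch of what matching a `text=start,end` directive against a page could look like, assuming the page's visible text has already been extracted into one string with element boundaries turned into whitespace. The function names `normalize` and `matchTextDirective` are hypothetical. Lowercasing, whitespace collapsing, ignoring word boundaries, skipping `prefix-` and `-suffix` terms, and expecting the text directive to come first in the fragment directive are all simplifying assumptions; none of this is Muffet's implementation or exact browser behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalize lowercases s and collapses every whitespace run to one space,
// so that text split across elements or lines compares as one sequence.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// matchTextDirective reports whether pageText contains the start term of the
// first text directive in rawURL, followed later by the end term if present.
// Hypothetical helper; prefix and suffix terms are dropped, not verified.
func matchTextDirective(rawURL, pageText string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	// Work on the escaped fragment so an encoded comma (%2C) inside a term
	// is not mistaken for a term separator.
	_, directive, ok := strings.Cut(u.EscapedFragment(), ":~:text=")
	if !ok {
		return false, fmt.Errorf("no text directive in %q", rawURL)
	}
	directive, _, _ = strings.Cut(directive, "&") // keep the first directive only
	parts := strings.Split(directive, ",")
	if len(parts) > 1 && strings.HasSuffix(parts[0], "-") {
		parts = parts[1:] // drop "prefix-"
	}
	if len(parts) > 1 && strings.HasPrefix(parts[len(parts)-1], "-") {
		parts = parts[:len(parts)-1] // drop "-suffix"
	}
	if len(parts) > 2 {
		return false, fmt.Errorf("malformed text directive %q", directive)
	}
	text := normalize(pageText)
	pos := 0
	for _, p := range parts {
		term, err := url.PathUnescape(p)
		if err != nil {
			return false, err
		}
		term = normalize(term)
		if term == "" {
			return false, fmt.Errorf("empty term in text directive %q", directive)
		}
		// Each term must appear after the end of the previous one.
		i := strings.Index(text[pos:], term)
		if i < 0 {
			return false, nil
		}
		pos += i + len(term)
	}
	return true, nil
}

func main() {
	// Made-up stand-in for extracted page text; not the real page content.
	page := "Run a lateral query\n\nSynopsis\n"
	ok, err := matchTextDirective("https://zed.brimdata.io/docs/language/operators/over#:~:text=lateral,Synopsis", page)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok)
}
```

Searching flattened text is what makes cross-element matches cheap here: the start and end terms only need to appear in order, regardless of which elements they came from.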

raviqqe commented 5 months ago

For my notes, browsers supporting text fragments: https://caniuse.com/?search=text%20fragments%20