raviqqe / muffet

Fast website link checker in Go
MIT License
2.5k stars 97 forks

Option to limit urls to same domain #379

Open robd003 opened 5 months ago

robd003 commented 5 months ago

It would be really great to have a --same-domain option that would only follow links under the same domain as the URL provided to muffet.

I'm assuming most people just want to check their own website and don't necessarily want to veer off into the rabbit hole of https://twitter.com/{username} or any other external links.

raviqqe commented 5 months ago

Did you try --include option?

robd003 commented 5 months ago

I've tried using --include domain.com, but this also includes references to third-party sites like LinkedIn / Twitter if they have a share link that references the domain.

I also tried a more involved regex, --include 'https?://([a-zA-Z0-9-]+\.)?domain\.com', but it would be great to have an easy option instead of having to figure out the regex each time.
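
To illustrate the difference between the two patterns above, here is a minimal Go sketch using the standard regexp package. It assumes the include pattern is applied as an unanchored match against the full URL, which is what the share-link behaviour described above suggests. The leading `^` and the trailing `([/:?#]|$)` boundary are additions for illustration and are not part of the regex quoted above. `domain.com` and the sample URLs are placeholders.

```go
package main

import (
	"fmt"
	"regexp"
)

// Placeholder patterns: "domain.com" stands in for the site being checked.
var (
	// Unanchored: matches anywhere in the URL, including query strings.
	loose = regexp.MustCompile(`domain.com`)
	// Anchored at the scheme, with a boundary after the host so that
	// "domain.com.evil.example" is not accepted.
	anchored = regexp.MustCompile(`^https?://([a-zA-Z0-9-]+\.)?domain\.com([/:?#]|$)`)
)

func main() {
	urls := []string{
		"https://domain.com/about",
		"https://www.domain.com/",
		"https://twitter.com/share?url=https://domain.com/post",
	}
	for _, u := range urls {
		fmt.Printf("%-55s loose=%v anchored=%v\n",
			u, loose.MatchString(u), anchored.MatchString(u))
	}
}
```

The loose pattern accepts the share link because the domain appears in its query string. The anchored pattern only accepts URLs whose host is the domain itself or a single-label subdomain of it.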

jippi commented 4 months ago

I was coming here to ask basically the same thing :)

--include can totally do it for me, but it would be nice to just have an --only-same-domain or similar flag that would pick the host from the base URL you pass in to muffet, configure the regex for me, and then run with it, leaving --include mostly for third-party domains. (I had assumed muffet stayed on the same domain by default, until I realised that it scans external URLs by default.)