scivision / linkchecker-markdown

Python asyncio + aiohttp Markdown *.md URL link checker: 10,000 files/second
MIT License

Detecting only URLs with domains, using HEAD method #5

Closed. int-ua closed this 5 years ago.

int-ua commented 5 years ago

Fixes #4. Fixes an issue where a bare "https://" was detected as a URL.
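The goal of reporting only URLs that contain a domain can be illustrated with a regular expression. This is a minimal sketch, not the pattern used in the repository. The names `URL_WITH_DOMAIN` and `find_urls` are hypothetical, and the pattern deliberately ignores hosts such as `localhost`, bare IP addresses, and explicit ports.

```python
import re

# Hypothetical pattern: a URL must have a scheme followed by a dotted host name,
# so a bare "https://" with no host is never reported as a link.
URL_WITH_DOMAIN = re.compile(
    r"https?://"
    r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"  # one or more labels, then a letters-only TLD
    r'(?:[/?#][^\s)\]>"]*)?'             # optional path, query, or fragment
)


def find_urls(text: str) -> list[str]:
    """Return every URL in text that includes a domain name."""
    # All groups are non-capturing, so findall returns the full matches.
    return URL_WITH_DOMAIN.findall(text)


print(find_urls("Broken: https:// vs [site](https://example.com/docs)"))
# ['https://example.com/docs']
```

Requiring at least one dotted label after the scheme is what rejects a bare `https://`, and stopping at `)` keeps the closing parenthesis of a Markdown link out of the match.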

scivision commented 5 years ago

I forgot to add a note on why I hadn't used HEAD earlier. I remembered when I suddenly started finding lots of "bad" links that were actually "good", due to anti-crawling server behavior.

Using HEAD actually generates a lot of false positives.
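HEAD is cheaper than GET, but some servers answer the two differently. One possible middle ground, not something proposed in this thread, is to treat a failing HEAD as provisional and confirm it with GET before reporting the link as bad. The sketch below assumes a hypothetical `fetch(method, url)` coroutine that returns a status code, so the decision logic can be shown without a network; in a real checker it would wrap an aiohttp request.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical interface: fetch(method, url) returns the HTTP status code.
Fetch = Callable[[str, str], Awaitable[int]]


async def check_url(url: str, fetch: Fetch) -> tuple[int, str]:
    """Return (status, method used) for url, preferring HEAD but verifying failures with GET."""
    status = await fetch("HEAD", url)
    if status < 400:
        return status, "HEAD"
    # Some servers refuse or throttle HEAD (for example 403, 405, or 429) while
    # serving GET normally, so a failed HEAD is re-checked before being reported.
    status = await fetch("GET", url)
    return status, "GET"


async def anti_crawl_fetch(method: str, url: str) -> int:
    # Simulated server that rejects HEAD but answers GET with 200 OK.
    return 405 if method == "HEAD" else 200


print(asyncio.run(check_url("https://example.com", anti_crawl_fetch)))  # (200, 'GET')
```

This limits the extra GET request to links that already look broken, so healthy links still cost only one HEAD.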

scivision commented 5 years ago

I think this can be made a parameter for those who know they won't hit the anti-crawling issue that shows up when using requests.head instead of requests.get.
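Making the request method a parameter might look like the following sketch. The `--use-head` flag and the `build_parser`, `check`, and `check_all` helpers are hypothetical and not necessarily what the project adopted. `session` is assumed to be an `aiohttp.ClientSession`, whose `request(method, url, ...)` is used as an async context manager yielding a response with a `.status` attribute.

```python
import argparse
import asyncio


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Check URLs in Markdown files")
    parser.add_argument("path", help="directory of .md files to scan")
    # Hypothetical opt-in flag: GET stays the default because some servers
    # answer HEAD with errors as an anti-crawling measure.
    parser.add_argument("--use-head", action="store_true",
                        help="use HEAD instead of GET (faster, but may give false positives)")
    return parser


async def check(session, url: str, use_head: bool = False) -> int:
    """Request url with HEAD or GET and return the status code."""
    method = "HEAD" if use_head else "GET"
    # With aiohttp, session.request() follows redirects when allow_redirects=True
    # (session.head() on its own defaults to not following them).
    async with session.request(method, url, allow_redirects=True) as resp:
        return resp.status


async def check_all(session, urls: list[str], use_head: bool = False) -> list[int]:
    """Check all URLs concurrently, preserving input order."""
    return await asyncio.gather(*(check(session, u, use_head) for u in urls))
```

With aiohttp installed, this would be driven by something like `async with aiohttp.ClientSession() as session: await check_all(session, urls, args.use_head)`.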

scivision commented 5 years ago

This was done in 404e941caab1d266f.