vezaynk / Sitemap-Generator-Crawler

PHP script to recursively crawl websites and generate a sitemap. Zero dependencies.
https://www.bbss.dev
MIT License

Don't mark redirects as scanned before scanning them #69

Open pronobis opened 6 years ago

pronobis commented 6 years ago

Currently, pages (including redirects) are marked as scanned before they are actually scanned. If a page redirects from a link without a trailing / to the same link with one (e.g. www.pronobis.pro/publications/zheng2018aaai redirects to www.pronobis.pro/publications/zheng2018aaai/), the page is never scanned: the scanner treats both links as the same page, and the one without / has already been marked as scanned.

This simple change fixes it for me, although I'm not sure whether it has any unintended consequences.
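
To illustrate the problem described above, here is a minimal, self-contained sketch. It is not the diff from this pull request and does not use the crawler's real code: `normalize_url`, `crawl`, the `$markBeforeScan` flag, and the `$redirects` map (a stand-in for real HTTP responses) are all hypothetical names, and the trailing-slash normalization is an assumption based on the behaviour reported here.

```php
<?php
// Hypothetical stand-in for HTTP fetching: maps a URL to its redirect target.
// A real crawler would read the target from the HTTP response instead.
$redirects = [
    'www.pronobis.pro/publications/zheng2018aaai' => 'www.pronobis.pro/publications/zheng2018aaai/',
];

// Assumed normalization: URLs that differ only by a trailing slash share one key.
function normalize_url(string $url): string
{
    return rtrim($url, '/');
}

// Returns the list of URLs whose content was actually scanned.
function crawl(string $start, array $redirects, bool $markBeforeScan): array
{
    $scanned = []; // normalized URL => true
    $visited = []; // URLs whose content was scanned
    $queue = [$start];

    while ($queue) {
        $url = array_shift($queue);
        $key = normalize_url($url);
        if (isset($scanned[$key])) {
            continue; // treated as already scanned
        }
        if ($markBeforeScan) {
            $scanned[$key] = true; // behaviour described in the report
        }
        if (isset($redirects[$url])) {
            // A redirect has no content of its own: queue its target and move on.
            $queue[] = $redirects[$url];
            continue;
        }
        $scanned[$key] = true; // mark only once the page has really been scanned
        $visited[] = $url;
    }

    return $visited;
}

$start  = 'www.pronobis.pro/publications/zheng2018aaai';
$before = crawl($start, $redirects, true);
$after  = crawl($start, $redirects, false);

var_dump(count($before)); // int(0): the redirect target is skipped as "already scanned"
var_dump($after);         // contains only the URL with the trailing slash
```

One consequence worth checking in the real crawler: with marking deferred like this, a redirect loop (A redirects to B, B redirects back to A) would never terminate in this sketch, so a separate guard such as a redirect-hop limit would probably be needed.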