joshua-bn opened 1 year ago
Yeah, that's something to consider. I would opt for https://github.com/spatie/robots-txt instead as it's better maintained. What exactly do you want to achieve with the information?
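For reference, checking crawl rules with spatie/robots-txt looks roughly like this (a minimal sketch, assuming the package is installed via `composer require spatie/robots-txt`; the example robots.txt content is made up):

```php
<?php

require 'vendor/autoload.php';

use Spatie\Robots\RobotsTxt;

// create() takes a raw robots.txt body; readFrom() takes a URL instead.
$robotsTxt = RobotsTxt::create("User-agent: *\nDisallow: /admin");

// allows() checks whether a path may be crawled, optionally per user agent.
$robotsTxt->allows('/admin');       // disallowed by the rule above
$robotsTxt->allows('/blog/post');   // not matched by any Disallow rule
```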
Personally, I am looking for sitemaps declared in robots.txt but I think there's also value in checking for rules for crawling.
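For the sitemap case specifically, the `Sitemap:` directive can even be pulled out with plain PHP and no extra dependency (a minimal sketch; the function name is illustrative only):

```php
<?php

// Extract all Sitemap: declarations from a robots.txt body.
// The directive name is matched case-insensitively, as crawlers treat it.
function extractSitemaps(string $robotsTxtContent): array
{
    preg_match_all('/^\s*sitemap:\s*(\S+)/im', $robotsTxtContent, $matches);

    return $matches[1];
}

$content = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemap.xml";

print_r(extractSitemaps($content)); // ["https://example.com/sitemap.xml"]
```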
Fair enough, that's definitely another use case. I'll see how we can get both working.
Would be nice to have the ability to parse robots.txt like RSS feeds.
$web->robots
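Something like the following, say — purely hypothetical; none of these accessors exist in PHPScraper yet, and the property/method names are illustrative only:

```php
<?php

// Hypothetical usage sketch — nothing below is implemented.
$web = new \spekulatius\phpscraper;
$web->go('https://example.com');

$robots = $web->robots;          // hypothetical: parsed robots.txt
$sitemaps = $robots->sitemaps;   // hypothetical: sitemap URLs declared in robots.txt
$rules = $robots->rules('*');    // hypothetical: crawl rules for a given user agent
```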
https://github.com/bopoda/robots-txt-parser is one candidate library. Not sure if it's the right one to use here, but it seems to do the job.