spekulatius / PHPScraper

A universal web-util for PHP.
https://phpscraper.de
GNU General Public License v3.0

[Request] Add robots.txt parsing #177

Open joshua-bn opened 1 year ago

joshua-bn commented 1 year ago

It would be nice to have the ability to parse robots.txt, similar to RSS feeds, e.g. via `$web->robots`.

https://github.com/bopoda/robots-txt-parser is one such library. I'm not sure it's the one to use here, but it seems to do the job.

spekulatius commented 1 year ago

Yeah, that's something to consider. I would opt for https://github.com/spatie/robots-txt instead as it's better maintained. What exactly do you want to achieve with the information?
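As a rough illustration of the crawl-rule side of this request, here is a minimal, self-contained sketch of the kind of check such an integration could expose. The function name `isRobotsAllowed` is hypothetical (not part of PHPScraper or spatie/robots-txt); a real implementation would likely delegate to a library, and this sketch only handles `User-agent` / `Disallow` prefix rules, not the full robots.txt grammar.

```php
<?php

// Hypothetical helper (assumption, not PHPScraper API): decide whether a path
// is allowed for a user-agent, given raw robots.txt content. Handles only
// User-agent groups and Disallow prefix rules.
function isRobotsAllowed(string $robotsTxt, string $path, string $userAgent = '*'): bool
{
    $disallow = [];     // agent name => list of disallowed path prefixes
    $agents   = [];     // user-agents of the group currently being read
    $inRules  = false;  // true once we've seen a rule line for the group

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || !str_contains($line, ':')) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // A User-agent line after rule lines starts a new group.
            if ($inRules) {
                $agents  = [];
                $inRules = false;
            }
            $agents[] = strtolower($value);
        } elseif ($field === 'disallow') {
            $inRules = true;
            if ($value === '') {
                continue; // an empty Disallow allows everything
            }
            foreach ($agents as $agent) {
                $disallow[$agent][] = $value;
            }
        }
    }

    // Specific user-agent rules apply first, then the wildcard group.
    foreach (array_unique([strtolower($userAgent), '*']) as $agent) {
        foreach ($disallow[$agent] ?? [] as $prefix) {
            if (str_starts_with($path, $prefix)) {
                return false;
            }
        }
    }
    return true;
}
```

In practice the spatie package would replace this hand-rolled parsing; the sketch just shows what the feature boils down to.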

joshua-bn commented 1 year ago

Personally, I am looking for sitemaps declared in robots.txt but I think there's also value in checking for rules for crawling.
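The sitemap side is simpler: `Sitemap:` lines in robots.txt sit outside any user-agent group, so collecting them is a line scan. A minimal sketch (the function name `extractSitemaps` is hypothetical, not an existing PHPScraper method):

```php
<?php

// Hypothetical helper (assumption, not PHPScraper API): collect all
// "Sitemap: <url>" declarations from raw robots.txt content.
function extractSitemaps(string $robotsTxt): array
{
    $sitemaps = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // The field name is case-insensitive per the robots.txt convention.
        if (preg_match('/^\s*sitemap\s*:\s*(\S+)/i', $line, $m)) {
            $sitemaps[] = $m[1];
        }
    }
    return $sitemaps;
}
```

Feeding those URLs into PHPScraper's existing sitemap parsing would cover this use case end to end.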

spekulatius commented 1 year ago

Fair enough, that's definitely another use case. I'll see how we can get both working.
