spatie / robots-txt

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
https://spatie.be/en/opensource/php
MIT License
219 stars, 36 forks

Add some efficiencies to prevent unnecessary requests #42

Closed: tsjason closed this 5 months ago

tsjason commented 6 months ago

There are two changes in this PR that will help me use the package more efficiently.

1) Allow Robots to be given a RobotsTxt object directly. I already have the robots.txt contents in memory and don't want to write them to disk first. With this change, I can create a RobotsTxt object from the in-memory string and construct a Robots instance with it (see the first sketch after this list).

2) Calling mayIndex() and mayFollowOn() on the Robots class causes two requests to the remote server. This change pulls the file_get_contents() call up one level and prevents RobotsMeta and RobotsHeaders from each having to make their own request (see the second sketch below).
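
For the first change, here is a minimal sketch of the construction path being proposed. RobotsTxt::create() parsing a raw string is existing package behaviour; the optional RobotsTxt argument to Robots::create() is the addition from this PR, so the parameter name and position shown are assumptions until the merged signature is confirmed.

```php
<?php

use Spatie\Robots\Robots;
use Spatie\Robots\RobotsTxt;

// robots.txt contents already held in memory (fetched earlier, cached, etc.)
$contents = "User-agent: *\nDisallow: /admin\n";

// Parse the raw string into a RobotsTxt object: no disk write, no extra request.
$robotsTxt = RobotsTxt::create($contents);

// Assumed shape of the new API: hand the pre-built RobotsTxt to Robots directly.
// '*' is the user agent; the second parameter is this PR's addition.
$robots = Robots::create('*', $robotsTxt);

// The robots.txt portion of this check now uses the in-memory rules.
$robots->mayIndex('https://example.com/admin');
```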
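
For the second change, a sketch of the fetch-once pattern described above, assuming the package's existing RobotsMeta::create() (raw HTML) and RobotsHeaders::create() (array of header lines) factories; the PR's actual internal wiring may differ.

```php
<?php

use Spatie\Robots\RobotsHeaders;
use Spatie\Robots\RobotsMeta;

$url = 'https://example.com/some-page';

// A single request instead of two. With the HTTP wrapper, file_get_contents()
// also populates $http_response_header with the response headers in this scope.
$html = file_get_contents($url);

// Feed the one response to both parsers rather than letting each refetch the page.
$metaAllows = RobotsMeta::create($html)->mayIndex();
$headerAllows = RobotsHeaders::create($http_response_header ?? [])->mayIndex();

$mayIndex = $metaAllows && $headerAllows;
```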

freekmurze commented 5 months ago

Thank you! Very nice!