spatie / robots-txt

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
https://spatie.be/en/opensource/php
MIT License

file_get_contents($source) throws an InvalidArgumentException on Websites with expired Certificates #34

Closed osthafen closed 2 years ago

osthafen commented 2 years ago

Websites with expired certificates cause the robots-txt checker to abort with an exception:

 InvalidArgumentException 

  Could not read from source `https://www.finde-ins-team.de/`

  at vendor/spatie/robots-txt/src/RobotsMeta.php:17
     13▕     {
     14▕         $content = @file_get_contents($source);
     15▕ 
     16▕         if ($content === false) {
  ➜  17▕             throw new InvalidArgumentException("Could not read from source `{$source}`");
     18▕         }
     19▕ 
     20▕         return new self($content);
     21▕     }

PHP Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed

Maybe peer verification could be switched off (verify_peer and verify_peer_name) through a stream context, for example:

// Stream context options that disable TLS certificate verification.
$context = [
    'ssl' => [
        'verify_peer' => false,
        'verify_peer_name' => false,
    ],
];

$response = file_get_contents("...", false, stream_context_create($context));
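
As a rough sketch of how this could become an opt-in rather than a global change: the helper below is hypothetical (the name readSource and its $verifyPeer parameter are my assumptions, not part of spatie/robots-txt). It mirrors the check from RobotsMeta.php quoted above, keeps verification on by default, and only relaxes it when the caller explicitly asks for it.

```php
<?php

/**
 * Hypothetical helper, NOT part of spatie/robots-txt.
 * Reads a source and lets the caller opt out of TLS certificate verification.
 */
function readSource(string $source, bool $verifyPeer = true): string
{
    // The 'ssl' context options apply to https:// sources;
    // they have no effect when reading a local file.
    $context = stream_context_create([
        'ssl' => [
            'verify_peer' => $verifyPeer,
            'verify_peer_name' => $verifyPeer,
        ],
    ]);

    // Suppress the PHP warning; a failure is reported by the exception below.
    $content = @file_get_contents($source, false, $context);

    if ($content === false) {
        throw new InvalidArgumentException("Could not read from source `{$source}`");
    }

    return $content;
}
```

With something like this, a call such as readSource($url, false) would skip certificate verification for that one request, while readSource($url) would behave as it does today.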

Would that be an option?

Thank you!

spatie-bot commented 2 years ago

Dear contributor,

because this issue seems to be inactive for quite some time now, I've automatically closed it. If you feel this issue deserves some attention from my human colleagues feel free to reopen it.