mattiasgeniar closed this issue 4 years ago.
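My first attempt would be to loop over the string with `strpos`:

```php
$lines = [];
// Parenthesise the assignment; without it $newline receives the boolean result of !==.
while (($newline = strpos($html, PHP_EOL)) !== false) {
    $lines[] = substr($html, 0, $newline);
    // Skip past the newline itself, otherwise the loop never advances.
    $html = substr($html, $newline + strlen(PHP_EOL));
}
$lines[] = $html; // remainder after the last newline
```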
But I think that could be significantly slower.
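A minimal sketch of a faster variant, assuming the goal is only to iterate over lines: pass an offset to `strpos` instead of re-slicing `$html` on every iteration, so each step copies just the current line rather than the whole remainder. `linesFromString()` is a hypothetical name, and it splits on `"\n"` rather than `PHP_EOL` on the assumption that fetched HTML uses `\n` line endings regardless of the server's platform (on CRLF input a trailing `\r` stays on each line). The full string is still held in memory, so this addresses the speed concern, not the memory limit.

```php
<?php
// Sketch: walk the string with strpos() and an offset instead of
// re-slicing it each time. linesFromString() is a hypothetical helper.
function linesFromString(string $html, string $eol = "\n"): Generator
{
    $offset = 0;
    $length = strlen($html);
    while ($offset < $length) {
        $newline = strpos($html, $eol, $offset);
        if ($newline === false) {
            // Last line has no trailing newline.
            yield substr($html, $offset);
            return;
        }
        // Copy only the current line, not everything after it.
        yield substr($html, $offset, $newline - $offset);
        $offset = $newline + strlen($eol);
    }
}
```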
What about using stream resources? Write it to a temp file and then use stream functions (e.g., `fgets`).
Your bottleneck here is memory. Working with long strings in memory will eventually hit the memory limit, and raising PHP's memory limit only moves that ceiling; it won't let you handle a string of any size. Using stream resources will.
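A minimal sketch of the stream approach, assuming the HTML can be obtained as a stream in the first place (for example a file handle from `fopen()` on a downloaded page). Because the search works on a resource, the whole document never has to exist as one PHP string, and memory use is bounded by the longest line. `findRobotsLineInStream()` is a hypothetical name. When the content only exists as a string, `php://temp` is a convenient container: the PHP manual documents that it keeps data in memory up to a limit (2 MB by default) and then switches to a temporary file, although copying an existing string into it does not lower peak memory.

```php
<?php
// Sketch: scan a stream line by line with fgets(); only one line is held at a time.
// findRobotsLineInStream() is a hypothetical helper; the caller owns the stream.
function findRobotsLineInStream($stream): ?string
{
    while (($line = fgets($stream)) !== false) {
        if (strpos(strtolower(trim($line)), '<meta name="robots"') === 0) {
            return $line;
        }
    }
    return null;
}

// Usage: a handle from fopen('/path/to/page.html', 'r') (hypothetical path)
// works the same way; php://temp stands in for it here.
$stream = fopen('php://temp', 'r+');
fwrite($stream, "<html>\n<meta name=\"robots\" content=\"noindex\">\n</html>\n");
rewind($stream);
$robotsLine = findRobotsLineInStream($stream);
fclose($stream);
```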
Will increase memory size.
Would it be possible to rewrite the class using resources instead of reading the file as a string?
I would use a stream in a generator:
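```php
protected function findRobotsMetaTagLine(string $html): ?string
{
    // A closure avoids the "Cannot redeclare function" fatal error that a
    // nested named function triggers when this method is called twice.
    $readLinesFromStream = function (string $str): \Generator {
        // Note: php://memory holds a second full copy of $str.
        $stream = fopen('php://memory', 'r+');
        fwrite($stream, $str);
        rewind($stream);
        try {
            while (($line = fgets($stream)) !== false) {
                yield $line;
            }
        } finally {
            // Runs on exhaustion and when the generator is discarded early.
            fclose($stream);
        }
    };

    foreach ($readLinesFromStream($html) as $line) {
        if (strpos(strtolower(trim($line)), '<meta name="robots"') === 0) {
            return $line;
        }
    }

    return null;
}
```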
@willemwollebrants damn that's clever code, hadn't thought of that yet!
I decided to fix this in the implementation of RobotsMeta rather than in the class itself, so I'm closing this for now.
Dear contributor,
because this issue seems to have been inactive for quite some time now, I've automatically closed it. If you feel this issue deserves some attention from my human colleagues, feel free to reopen it.
An interesting thing happens when you read very large HTML documents and apply the `findRobotsMetaTagLine($html)` method: it runs out of memory. The problem occurs here:
But the real issue is the `$html` variable, which might be several hundred thousand lines long.

My initial reaction was: I'll just read that string in chunks. There's `fread` for files, but not for strings.
What's the safest way to read a string in chunks to avoid out of memory errors?
(I really want to avoid looping over the string with `$html[0]`, `$html[1]`, ...)
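A minimal sketch of reading a string in fixed-size chunks with `substr()` and a generator, the closest string analogue of an `fread()` loop. `chunksFromString()` and the 8192-byte default are illustrative assumptions. Unlike `str_split()`, which returns every chunk in one array (roughly doubling memory), the generator materialises only one chunk at a time.

```php
<?php
// Sketch: yield fixed-size slices of a string, like calling fread() in a loop.
// chunksFromString() and the 8192-byte default are hypothetical choices.
function chunksFromString(string $str, int $chunkSize = 8192): Generator
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }
    $length = strlen($str);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        // substr() copies only this slice; the final chunk may be shorter.
        yield substr($str, $offset, $chunkSize);
    }
}
```

Two caveats: the original string is still fully in memory, so this bounds the work per step rather than peak usage; and chunk boundaries ignore line and tag boundaries, so a `<meta name="robots">` tag can be split across two chunks, and a search has to carry the tail of the previous chunk forward.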