JanPetterMG closed this 3 years ago.
The robots.txt content is always converted to UTF-8, but the `mb_*` functions expect whatever encoding the user says the content is in.
As a result, valid UTF-8 robots.txt files are parsed as if they were in the wrong encoding, which in turn causes valid rules to be lost.
In other words, whether perfectly valid rules are parsed correctly is a game of Russian roulette.
Constructing a parser also overwrites the process-wide internal encoding:
    mb_internal_encoding("utf-8");
    new RobotsTxtParser('', "iso-8859-1");
    var_dump(mb_internal_encoding()); // string(10) "ISO-8859-1"
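A possible way to avoid both problems, offered as a sketch under assumptions rather than the library's actual fix: convert the content to UTF-8 once, then pass `'UTF-8'` explicitly to every `mb_*` call instead of calling `mb_internal_encoding()` with the user's encoding. The helper `toUtf8Lines()` is hypothetical and not part of RobotsTxtParser. The last lines show why the mismatch matters: the same UTF-8 bytes yield a different character count when treated as ISO-8859-1.

```php
<?php
// Hypothetical helper (not part of RobotsTxtParser): decode once to UTF-8
// and never touch the global mb_internal_encoding() setting.
function toUtf8Lines(string $content, string $sourceEncoding): array
{
    // Convert raw bytes from the user-declared encoding to UTF-8, once.
    $utf8 = mb_convert_encoding($content, 'UTF-8', $sourceEncoding);
    // Split into non-empty lines; the /u modifier treats input as UTF-8.
    return preg_split('/\r\n|\r|\n/u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
}

mb_internal_encoding('UTF-8');

// "\xC4" is "Ä" in ISO-8859-1; after conversion it becomes "\xC3\x84".
$lines = toUtf8Lines("User-agent: *\nDisallow: /\xC4\n", 'ISO-8859-1');

// Every mb_* call names the encoding of the data it actually receives.
$prefixLength = mb_strlen('Disallow: ', 'UTF-8');
$path = mb_substr($lines[1], $prefixLength, null, 'UTF-8');

var_dump(mb_strlen($path, 'UTF-8'));      // int(2): "/" and "Ä"
var_dump(mb_strlen($path, 'ISO-8859-1')); // int(3): the bug's view of the same bytes
var_dump(mb_internal_encoding());         // string(5) "UTF-8": global state untouched
```

Passing the encoding per call keeps the parser free of global side effects, so callers that rely on their own `mb_internal_encoding()` setting are not affected.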