Closed — youradds closed this issue 3 years ago.
Hi @youradds. I don't see your write_file()
function in there. Is the output already corrupted if you just print it to the terminal?
Hi @oalders, write_file()
is coming from File::Slurp::Unicode.
I managed to find a bit of a dirty workaround:
`curl -o "$CFG->{admin_root_path}/tmp/$_->{domain}.txt" -L -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9; PageThing http://pagething.com) Gecko/2008052906 Firefox/3.0" --compressed --silent --max-time 10 --location --connect-timeout 10 '$_->{domain}'`;
if (-e "$CFG->{admin_root_path}/tmp/$_->{domain}.txt") {
    $page = read_file("$CFG->{admin_root_path}/tmp/$_->{domain}.txt");
    unlink("$CFG->{admin_root_path}/tmp/$_->{domain}.txt");
}
So basically: save the response directly from curl (so the raw bytes are preserved on disk), then read the file back. That seems to render OK afterwards. My guess is that $page needed some kind of decoding to work properly when slurped in from the backtick command. The issue is that we don't know the encoding of the pages in advance, so it's tricky to decode/encode that variable.
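One way to sidestep the unknown-encoding problem is to decode the captured octets defensively before any further processing. Below is a minimal sketch using the core Encode module; `decode_page` is a hypothetical helper (not part of any module), and the UTF-8-then-Latin-1 fallback is an assumption, not what the original code did. For a real fix you would want to honour the Content-Type header or the page's `<meta charset>` (Encode::Guess can also help).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn raw octets (e.g. captured from a backtick/curl
# command) into a Perl character string without knowing the encoding upfront.
sub decode_page {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $bytes untouched so the fallback still sees the original octets.
    my $chars = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };

    # ISO-8859-1 maps every possible byte to a character, so this
    # fallback cannot fail (though it may guess wrong for other charsets).
    return defined $chars ? $chars : Encode::decode('ISO-8859-1', $bytes);
}
```

With something like this, `my $page = decode_page(`curl ...`)` would hand later code a proper character string instead of raw bytes, which is what slurping via backticks gives you.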
Cheers
Andy
Hi Andy,
I'm glad to see you found a solution. I'll close this issue for now. Let me know if you need to re-open it.
Best,
Olaf
Hi,
I seem to be having some issues with UTF-8 and HTML::Restrict. Here is a test case:
If I run that, the output is corrupted:
Yet if I comment out
$page = $hr->process($page);
it works (but obviously the HTML I don't want hasn't been removed). Is there any way around this?
Thanks
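For the record, corruption like this is usually a decode problem rather than an HTML::Restrict bug: process() operates on Perl character strings, so raw octets need to be decoded on the way in and re-encoded on the way out. A minimal sketch of that round trip (the sample string here is made up for illustration, and HTML::Restrict must be installed):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::Restrict;

# Raw UTF-8 octets, as returned by backticks or a file read without an
# :encoding layer ("caf\xC3\xA9" is the UTF-8 byte spelling of "café").
my $bytes = "<b>caf\xC3\xA9</b>";

# Decode to a character string BEFORE stripping tags.
my $chars = decode_utf8($bytes);

my $hr   = HTML::Restrict->new;   # no rules given: strips all tags
my $text = $hr->process($chars);  # "café" as a character string

# Re-encode on the way out so the terminal receives valid UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```

If $page arrives as bytes (printed fine on its own only because the terminal happened to interpret them as UTF-8), process() can mangle multi-byte sequences; decoding first avoids that.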