Closed by Dan0sz 2 years ago
I've just tested this on PHP 8.0 and it works fine (no conversion/encoding). Send me some sample code to test, if you want.
Sorry, didn't get a notification about your reply.
I'm not sure what kind of sample code you're looking for, but it's occurring on e.g. this page: https://testwp.uplab.gr/%CE%B5%CE%BD%CF%84%CE%BF%CE%BC%CE%BF%CE%B1%CF%80%CF%89%CE%B8%CE%B7%CF%84%CE%B9%CE%BA%CE%AC/esquito-lotion/
I suppose you could copy the HTML of this page and test again?
When inspecting the middle tab underneath the main product image, you'll notice the encoding:
Hi, haven't heard back from you. Could it have something to do with the fact that it's an a href element?
Yep, looks like a href behaves differently than the other attributes (a class, for example). Internally, my library uses the native DOMDocument library, and the latter is what makes this change.
Do you know why this might happen?
Hi,
Thanks for your help. I've resolved the issue by filtering the HTML before returning it. Basically, by using a regex that captures URLs and running urldecode on each of those URLs before returning it.
preg_match_all('/href=[\'"](?<url>.*?)[\'"]/', $html, $urls);
if (!isset($urls['url'])) {
    return $html;
}
// Initialise both lists so str_replace() still gets arrays when nothing needs decoding.
$search  = [];
$replace = [];
foreach ($urls['url'] as $url) {
    // Strict comparison: strpos() returns 0 when '%' is the first character, and 0 == false.
    if (strpos($url, '%') === false) {
        continue;
    }
    $search[]  = $url;
    $replace[] = urldecode($url);
}
return str_replace($search, $replace, $html);
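The same idea can also be written as a single pass with preg_replace_callback. This is only a sketch of the workaround described above, not part of the library: the function name decode_href_urls is hypothetical, and it assumes href values are quoted and that decoding them is safe for the page in question.

```php
<?php
/**
 * Hypothetical helper (the name is not from this thread): decodes
 * percent-encoded href values in an HTML string.
 */
function decode_href_urls(string $html): string
{
    $result = preg_replace_callback(
        // Capture the opening quote, then the URL up to the same closing quote.
        '/href=([\'"])(.*?)\1/',
        static function (array $m): string {
            // Leave URLs without any percent sign untouched.
            if (strpos($m[2], '%') === false) {
                return $m[0];
            }
            return 'href=' . $m[1] . urldecode($m[2]) . $m[1];
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

echo decode_href_urls('<a href="%C3%96kologie.html">Link</a>'), PHP_EOL;
```

Two caveats: urldecode also turns a plus sign into a space, so rawurldecode may be the safer choice for URLs with query strings, and decoding a sequence such as %22 would put a quote character inside the attribute value.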
I'll close this. :-)
Thank you for this solution!
In my case the link starts directly with an umlaut:
<a href="%C3%96kologie.html" title="Wechselbeziehungen zwischen Lebewesen und ihrer Umwelt">Ökologie - Wechselbeziehungen</a>
(strpos($url, '%') == false) is true.
I changed that to
if (strpos($url, '%') > -1) {
    $search[]  = $url;
    $replace[] = urldecode($url);
}
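To illustrate the difference for a URL that starts with a percent sign: strpos returns the integer 0 for a match at the first character and false for no match, so a loose == false test cannot tell the two apart, while a strict comparison can. The helper name contains_percent below is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper, named for illustration only.
function contains_percent(string $url): bool
{
    // strpos() returns an int position or false, so compare strictly.
    return strpos($url, '%') !== false;
}

$url = '%C3%96kologie.html'; // '%' sits at position 0

var_dump(strpos($url, '%') == false);   // bool(true): 0 is loosely equal to false
var_dump(strpos($url, '%') === false);  // bool(false): int 0 is not identical to false
var_dump(contains_percent($url));       // bool(true)
var_dump(contains_percent('plain.html')); // bool(false)
```

The > -1 variant relies on PHP's loose comparison rules for booleans and integers, whereas the strict comparison against false is the form the PHP manual recommends for strpos.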
When using this library on a site written in e.g. Greek, it seems to convert all Greek characters to Unicode (I think) in the source. Is there a way to prevent this from happening?
E.g.
#tab-σημαντικές-πληροφορίες
is converted to #tab-%CF%83%CE%B7%CE%BC%CE%B1%CE%BD%CF%84%CE%B9%CE%BA%CE%AD%CF%82-%CF%80%CE%BB%CE%B7%CF%81%CE%BF%CF%86%CE%BF%CF%81%CE%AF%CE%B5%CF%82
It might also be a PHP configuration issue? But I'm not sure.
Any assistance would be appreciated!
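For reference, the converted form in the example is the percent-encoded (URL-encoded) representation of the same UTF-8 bytes, not a different character set. A minimal sketch, using only the fragment quoted in this post and PHP's built-in urldecode and rawurlencode:

```php
<?php
// The encoded fragment from the example, without the leading '#'.
$encoded = 'tab-%CF%83%CE%B7%CE%BC%CE%B1%CE%BD%CF%84%CE%B9%CE%BA%CE%AD%CF%82'
         . '-%CF%80%CE%BB%CE%B7%CF%81%CE%BF%CF%86%CE%BF%CF%81%CE%AF%CE%B5%CF%82';

// Decoding restores the original Greek text as UTF-8 bytes.
$decoded = urldecode($encoded);
echo $decoded, PHP_EOL;

// Re-encoding the decoded bytes reproduces the encoded form exactly.
var_dump(rawurlencode($decoded) === $encoded); // bool(true)
```

Because the transformation is reversible, decoding the affected attribute values after processing restores the original text.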