Closed grandeljay closed 2 years ago
Hi. Canonical URL is not always a reliable value. I had too many issues from sites that returns the same canonical url for all pages, or goes to a different page. This is why it's not used.
I had too many issues from sites that returns the same canonical url for all pages, or goes to a different page.
Wouldn't that be the site owners problem? If they mark a URL as canonical shouldn't that be used? Even if it seems wrong to us?
Yes, if all developers made everything right and standard, it would be more easy but sadly it's not happening :)
Even Google can ignore the canonical
value on index a website for various reasons.
To include the canonical url or any other desired value, you can extend the Url detector.
Oh, neat. Thanks!
Is there an easy way to just return null
or something empty if the canonical URL is not set? I'm not understanding how this works :/
public function detect(): UriInterface
{
$oembed = $this->extractor->getOEmbed();
return $oembed->url('url') ?? new Url('');
}
The url cannot be null
. What you can do is check the canonical url:
public function detect(): UriInterface
{
$oembed = $this->extractor->getOEmbed();
return $oembed->url('url')
?: $oembed->url('web_page')
?: $this->getCanonical()
?: $this->extractor->getUri();
}
protected function getCanonical(): ?UriInterface
{
$document = $this->extractor->getDocument();
foreach ($document->select('.//link[@canonical]')->nodes() as $node) {
$href = $node->getAttribute('href');
if ($href) {
return $this->extractor->resolveUri($href);
}
}
}
(Not tested)
This is perfect, thank you so much!
Just in case anybody else is interested, here is my slightly revised version (with a return null
statement) I am using:
public function detect(): UriInterface
{
$oembed = $this->extractor->getOEmbed();
return $oembed->url('url')
?: $oembed->url('web_page')
?: $this->getCanonical()
?: $this->extractor->getUri();
}
protected function getCanonical(): ?UriInterface
{
$document = $this->extractor->getDocument();
foreach ($document->select('.//link[@canonical]')->nodes() as $node) {
$href = $node->getAttribute('href');
if ($href) {
return $this->extractor->resolveUri($href);
}
}
return null;
}
The full file is here
@oscarotero if you're willing to consider it, I'd open a separate issue for it, but I would love to have an indicator/property, etc to determine if the sites canonical tag was found/used or not.
Is that possible?
You can create a different detector instead of extending the current one. This allows to have a custom property, like $info->canonical
.
Please excuse me if I am using this wrong but the canonical URL seems to be wrong here:
https://oscarotero.com/embed/demo/index.php?url=https%3A%2F%2Fwww.etsy.com%2Fde%2Flisting%2F653731908%2Fvalentinstag-geschenk-fur-ihn-holzuhr%3Fclick_key%3D49&settings=
It's showing the URL I used instead of the canonical URL from the source or even an empty value, which I would prefer. Is it maybe a feature and defaulting to my URL instead of an empty value?