oscarotero / Embed

Get info from any web service or page
MIT License

Provider Name wrongly capitalised for many websites #462

Status: Open. diogogithub opened this issue 2 years ago.

diogogithub commented 2 years ago
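require 'vendor/autoload.php';
use Embed\Embed as LibEmbed; // assumption: LibEmbed is an alias for the library's Embed class

$url                       = 'https://eportugal.gov.pt/';
$embed                     = new LibEmbed();
$info                      = $embed->get($url);
$metadata                  = [];
$metadata['title']         = $info->title;
$metadata['description']   = $info->description;
$metadata['author_name']   = $info->authorName;
$metadata['author_url']    = (string) $info->authorUrl;
$metadata['provider_name'] = $info->providerName;
// Build "scheme://host" to use when no provider URL is detected
$parts                     = parse_url($url);
$root_url                  = "{$parts['scheme']}://{$parts['host']}";
$metadata['provider_url']  = (string) ($info->providerUrl != '' ? $info->providerUrl : $root_url);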

With $url = 'https://eportugal.gov.pt/', providerName is returned as "Eportugal" instead of "ePortugal", even though "Eportugal" does not appear anywhere on the website. By contrast, YouTube is correctly returned as "YouTube". Many other websites seem to follow the same pattern: only the first letter is capitalised and the rest is lower case.

oscarotero commented 2 years ago

When no provider name is found in the page's code, the detector falls back to the capitalised domain name. You can see the ProviderName detector code here: https://github.com/oscarotero/Embed/blob/master/src/Detectors/ProviderName.php#L15
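To make the fallback concrete, here is a rough PHP approximation of the behaviour described above. It is a hypothetical sketch, not Embed's actual code: the function name fallbackProviderName and the short list of two-part suffixes (gov.pt, co.uk, com.br) are assumptions made for illustration. It picks the site-name label of the host and upper-cases only its first character, which is enough to turn eportugal.gov.pt into "Eportugal".

```php
<?php
// Hypothetical approximation of a domain-based provider-name fallback.
// Not copied from Embed; the suffix list below is an illustrative assumption.

/**
 * @param string   $url      Page URL.
 * @param string[] $suffixes Two-part suffixes after which the site name sits.
 */
function fallbackProviderName(string $url, array $suffixes = ['gov.pt', 'co.uk', 'com.br']): string
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));

    // Labels from right to left: "eportugal.gov.pt" -> ["pt", "gov", "eportugal"].
    $labels = array_reverse(explode('.', $host));

    if (count($labels) === 1) {
        $name = $labels[0];
    } elseif (count($labels) >= 3 && in_array($labels[1] . '.' . $labels[0], $suffixes, true)) {
        // Two-part suffix such as "gov.pt": the site name is the third label from the right.
        $name = $labels[2];
    } else {
        // Ordinary suffix such as "com": the site name is the second label from the right.
        $name = $labels[1];
    }

    // Only the first character is upper-cased, so "eportugal" becomes "Eportugal".
    return ucfirst($name);
}

echo fallbackProviderName('https://eportugal.gov.pt/'), PHP_EOL; // Eportugal
echo fallbackProviderName('https://www.youtube.com/'), PHP_EOL;  // Youtube
```

The second call shows why the fallback cannot produce "YouTube": the mixed-case name presumably comes from metadata found in the page, so the fallback is never reached for that site.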

diogogithub commented 2 years ago

Oh, I see. Is there a setting to make the fallback return something like null, to represent the absence of a value? Otherwise it can be hard to distinguish between names actually found on the page and this fallback.

oscarotero commented 2 years ago

You can replace any detector, including this one, with another that behaves differently. See this example: https://github.com/oscarotero/Embed#detectors
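If replacing the detector is not an option, a caller-side heuristic can approximate the distinction asked about above. The sketch below is hypothetical and not part of Embed: looksLikeDomainFallback flags a provider name that exactly equals one of the host's labels with only its first letter capitalised, and the caller then stores null instead. It can misfire on a site whose genuine name happens to be written that way (for example, a site that really calls itself "Eportugal").

```php
<?php
// Hypothetical caller-side heuristic, not part of Embed: returns true when the
// provider name exactly matches a host label with only its first letter upper-cased.
function looksLikeDomainFallback(string $providerName, string $url): bool
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));

    foreach (explode('.', $host) as $label) {
        // Strict comparison: "YouTube" !== "Youtube", so genuine mixed-case names pass.
        if ($label !== '' && $providerName === ucfirst($label)) {
            return true;
        }
    }

    return false;
}

$url = 'https://eportugal.gov.pt/';
$providerName = 'Eportugal'; // stands in for $info->providerName

// Store null when the value looks like the domain-based fallback.
$providerName = looksLikeDomainFallback($providerName, $url) ? null : $providerName;
var_dump($providerName); // NULL
```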