pdehaan closed this issue 2 years ago
Spotted https://www.youtube.com/channel/UCXNqkD43iJYHX6hwBXam3jg, which has a seriously messed up <meta name="keywords" ...> tag (unescaped double quotes nested inside the content attribute), but better Open Graph tags. Not sure if there is anything we can do, apart from supporting Open Graph:
<meta name="keywords" content=""double dream hands" "john jacobson" music dance fun "sprint guy"">
...
<meta property="og:video:tag" content="double dream hands">
<meta property="og:video:tag" content="john jacobson">
<meta property="og:video:tag" content="music">
<meta property="og:video:tag" content="dance">
<meta property="og:video:tag" content="fun">
<meta property="og:video:tag" content="sprint guy">
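For illustration, here is a minimal sketch of how the `og:video:tag` values above could be collected into an array. The function name `extractVideoTags` is hypothetical and not part of page-metadata-parser, and the regex scan assumes double-quoted attributes as in the markup above. In a DOM environment, something like `document.querySelectorAll('meta[property="og:video:tag"]')` would likely be the more robust choice.

```javascript
// Hypothetical helper (an assumption for illustration, not the parser's API):
// collect the content of every <meta property="og:video:tag"> in an HTML string.
function extractVideoTags(html) {
  const tags = [];
  // Find each <meta ...> element, then inspect its attributes separately so
  // that attribute order does not matter.
  const metaPattern = /<meta\b[^>]*>/gi;
  for (const [element] of html.matchAll(metaPattern)) {
    const property = /\bproperty\s*=\s*"([^"]*)"/i.exec(element);
    const content = /\bcontent\s*=\s*"([^"]*)"/i.exec(element);
    if (property && content && property[1] === "og:video:tag") {
      tags.push(content[1]);
    }
  }
  return tags;
}

const sample = [
  '<meta property="og:video:tag" content="double dream hands">',
  '<meta property="og:video:tag" content="music">',
].join("\n");
console.log(extractVideoTags(sample)); // [ 'double dream hands', 'music' ]
```

The malformed `<meta name="keywords" ...>` element is skipped by this scan because it has no `property` attribute.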
And via our Metadata parser:
$ http http://localhost:7001/v1/metadata urls:='["https://www.youtube.com/channel/UCXNqkD43iJYHX6hwBXam3jg"]' -j -v
POST /v1/metadata HTTP/1.1
Accept: application/json
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 70
Content-Type: application/json; charset=utf-8
Host: localhost:7001
User-Agent: HTTPie/0.9.1
{
"urls": [
"https://www.youtube.com/channel/UCXNqkD43iJYHX6hwBXam3jg"
]
}
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 636
Content-Type: application/json; charset=utf-8
Date: Tue, 23 Aug 2016 22:25:16 GMT
ETag: W/"27c-8YtcCl1pW4ut1hZuBN3QHw"
{
"request_error": "",
"url_errors": {},
"urls": {
"https://www.youtube.com/channel/UCXNqkD43iJYHX6hwBXam3jg": {
"description": "All video John Jacobson!",
"favicon_url": "https://www.youtube.com/yts/img/favicon_32-vfl8NGn4k.png",
"images": [
{
"entropy": 1,
"height": 500,
"url": "https://yt3.ggpht.com/-m1dhEVB67Kg/AAAAAAAAAAI/AAAAAAAAAAA/nSzR87de3G8/s900-c-k-no-mo-rj-c0xffffff/photo.jpg",
"width": 500
}
],
"keywords": "\"double dream hands\" \"john jacobson\" music dance fun \"sprint guy\"",
"original_url": "https://www.youtube.com/channel/UCXNqkD43iJYHX6hwBXam3jg",
"title": "John Jacobson",
"type": "profile",
"url": "https://www.youtube.com/user/JhnJacobson"
}
}
}
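The `keywords` value in the response above comes back as one string with embedded quotes. A rough sketch of turning such a string into an array might look like the following. `splitKeywords` is a hypothetical name, and the rules it applies (split on commas when any are present, otherwise treat double-quoted runs as phrases) are an assumption about what would be useful here, not existing parser behavior.

```javascript
// Hypothetical helper (an assumption for illustration, not the parser's API):
// split a raw meta keywords string into an array of keywords.
function splitKeywords(raw) {
  if (typeof raw !== "string") return [];

  // Conventional form: comma-separated keywords, possibly multi-word.
  if (raw.includes(",")) {
    return raw
      .split(",")
      .map((part) => part.trim().replace(/^"+|"+$/g, "").trim())
      .filter(Boolean);
  }

  // YouTube-style form: space-separated, with multi-word phrases in quotes.
  const tokens = [];
  const pattern = /"([^"]*)"|([^\s"]+)/g;
  let match;
  while ((match = pattern.exec(raw)) !== null) {
    const token = (match[1] !== undefined ? match[1] : match[2]).trim();
    if (token) tokens.push(token);
  }
  return tokens;
}

console.log(splitKeywords('"double dream hands" music "sprint guy"'));
// [ 'double dream hands', 'music', 'sprint guy' ]
```

Unbalanced quotes are simply dropped by this sketch, so a string like `"foo bar` would yield `foo` and `bar` as separate keywords.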
Sadly, it looks like Embedly fails hard on keywords[] and just returns an empty Array:
$ http https://embedly-proxy.services.mozilla.com/v2/extract urls:='["https://www.youtube.com/channel/UCXNqkD43iJYHX6hwBXam3jg"]' -j -v
...
"keywords": [],
http://embed.ly/docs/explore/extract?url=https%3A%2F%2Fwww.youtube.com%2Fuser%2FJhnJacobson
We added basic support for meta keywords in https://github.com/mozilla/page-metadata-parser/pull/43, but we still need to tweak the page-metadata-service to support keywords as well (which maybe we need to tweak so we get upstream parser results by default instead of having to explicitly opt in, unless that is stupid).
Fixing this appears superficially simple:
Ref: https://github.com/mozilla/page-metadata-parser/issues/47; "Extended support for page keywords"
Ref: https://github.com/mozilla/page-metadata-parser/issues/48; "Return keywords as array instead of string?"