microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License

`author` is not recognized for YouTube #706

eugeny-dementev closed 6 months ago

eugeny-dementev commented 6 months ago

Prerequisites

Subject of the issue

The `author` property is null even though it's present in the page's JSON-LD.

Steps to reproduce

I'm using this set of rules for extracting metadata:

const metascraper = require('metascraper')([
  require('metascraper-author')(),
  require('metascraper-date')(),
  require('metascraper-description')(),
  require('metascraper-publisher')(),
  require('metascraper-title')(),
  require('metascraper-url')()
])

For all the websites I've checked (not that many, though) it works and `author` is extracted. For YouTube in particular, though, it's empty.

If I check the metadata on the page in DevTools, it's present:

JSON.parse(document.querySelectorAll('script[type="application/ld+json"]')[0].innerHTML)

Sometimes it's in the second JSON-LD script instead:

JSON.parse(document.querySelectorAll('script[type="application/ld+json"]')[1].innerHTML)

Either way, the `author` property is always there and holds the video's channel name:

{
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "author": "NEVER TOO SMALL",
    "name": "NEVER TOO SMALL: Movie Director’s Micro Loft Apartment, Philippines 24sqm/258sqft",
    "description": "Grab a copy of our second book at ...",
    "duration": "PT488S",
    "embedUrl": "https://www.youtube.com/embed/ldb4kCDkRY4?start=98"
}
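Since the index of the relevant script varies, a check that doesn't depend on it is handy when debugging. Below is a minimal sketch, not part of metascraper: `findJsonLdAuthor` is a hypothetical helper that scans every JSON-LD block and returns the first `author` it finds. The `BreadcrumbList` entry in the sample input is an assumption, added only to illustrate the "second script" case.

```javascript
// Hypothetical helper (not a metascraper API): given the text of every
// <script type="application/ld+json"> block, return the first author found.
function findJsonLdAuthor (blocks) {
  for (const text of blocks) {
    let data
    try {
      data = JSON.parse(text)
    } catch (err) {
      continue // skip malformed JSON-LD instead of failing
    }
    // A block may hold a single object or an array of objects.
    const items = Array.isArray(data) ? data : [data]
    for (const item of items) {
      if (!item || typeof item !== 'object') continue
      const author = item.author
      // `author` can be a plain string (as on YouTube) or an object with `name`.
      if (typeof author === 'string' && author !== '') return author
      if (author && typeof author === 'object' && typeof author.name === 'string') {
        return author.name
      }
    }
  }
  return null
}

// Sample input: the first block is an assumed placeholder without `author`,
// the second mirrors the VideoObject shown above.
const blocks = [
  JSON.stringify({ '@context': 'https://schema.org', '@type': 'BreadcrumbList' }),
  JSON.stringify({ '@context': 'https://schema.org', '@type': 'VideoObject', author: 'NEVER TOO SMALL' })
]

console.log(findJsonLdAuthor(blocks)) // prints "NEVER TOO SMALL"
```

In DevTools, the input could be collected with `[...document.querySelectorAll('script[type="application/ld+json"]')].map(s => s.textContent)`.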

Expected behaviour

The `author` property is extracted from the JSON-LD schema.

Actual behaviour

The `author` property is null.

eugeny-dementev commented 6 months ago

Figured out that there is a separate rule for YouTube because of its specifics: https://www.npmjs.com/package/metascraper-youtube