laurengarcia / url-metadata

NPM module: Request a url and scrape the metadata from its HTML using Node.js or the browser.
https://www.npmjs.com/package/url-metadata
MIT License

404 returned for existing url after upgrading to version 3.3.1 #69

Closed: thelengyu closed this issue 7 months ago

thelengyu commented 8 months ago
#!/usr/bin/env node

const urlMetadata = require('url-metadata');

(async function () {
  try {
    const url = 'https://www.skynews.com.au/world-news/united-states/joe-biden-backs-defense-secretary-despite-lack-of-transparency-on-hospitalisation/video/442a6796cce06e13ce9b8658a5add27a';
    // `mode` is the fetch API request mode option.
    const metadata = await urlMetadata(url, { mode: 'same-origin' });
    console.log('fetched metadata:', metadata);
  } catch (err) {
    // Errors raised while fetching or parsing the page are logged here.
    console.log('fetch error:', err);
  }
})();

Take the URL in the code above as an example:

https://www.skynews.com.au/world-news/united-states/joe-biden-backs-defense-secretary-despite-lack-of-transparency-on-hospitalisation/video/442a6796cce06e13ce9b8658a5add27a

After upgrading to the latest version (3.3.1), the request always returns 404.
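One way to check whether the 404 comes from the server rather than from url-metadata itself is to request the same URL with Node's built-in `fetch` (Node.js 18+), bypassing the package entirely. This is a minimal sketch assuming that runtime; `rawStatus` and its `fetchImpl` parameter are hypothetical names, the latter added only so the helper can be exercised with a stub instead of a live request.

```javascript
// Hypothetical helper: request `url` directly with the global fetch
// (Node.js 18+), without url-metadata, and report the HTTP status.
// `fetchImpl` is an assumed injection point; it defaults to globalThis.fetch.
async function rawStatus(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  return { status: response.status, ok: response.ok };
}

// Usage (performs a real network request):
// rawStatus(url).then((result) => console.log(result));
```

If this also reports 404, the status is coming from the site, not from a change in the package.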

laurengarcia commented 7 months ago

All of the tests for this package are passing. I believe your request returns 404 because news sites are blocking scrapers such as this package, due to concerns about AI/machine-learning training on this type of content without a license.
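The blocking hypothesis can be probed by requesting the same URL twice with different `User-Agent` headers and comparing the statuses. This is a minimal sketch assuming Node.js 18+ with the global `fetch`; `compareUserAgents`, `fetchImpl`, and both header strings are hypothetical and illustrative, not values any particular site is known to allow or block.

```javascript
// Hypothetical header values for illustration only.
const USER_AGENTS = {
  scraper: 'example-scraper/1.0 (hypothetical)',
  browser: 'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
};

// Request `url` once per User-Agent and record each HTTP status.
// `fetchImpl` is an assumed injection point; it defaults to globalThis.fetch.
async function compareUserAgents(url, fetchImpl = globalThis.fetch) {
  const results = {};
  for (const [label, userAgent] of Object.entries(USER_AGENTS)) {
    const response = await fetchImpl(url, { headers: { 'User-Agent': userAgent } });
    results[label] = response.status;
  }
  // True when the two requests got different statuses, which suggests the
  // server treats clients differently based on the User-Agent header.
  results.uaDependent = results.scraper !== results.browser;
  return results;
}
```

Equal statuses do not rule out blocking: sites can also filter on IP ranges, rate limits, or other request characteristics.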