microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License

How to debug Metascraper #523

Closed: kennethstarkrl closed this issue 2 years ago

kennethstarkrl commented 2 years ago


Subject of the issue

Some links hang and never return a response.

Is there a way to debug metascraper or get verbose output? Some links hang and metascraper never returns a response, while other links work fine. On localhost the failing links do work, so I believe it's a networking issue somewhere in my production environment, and I'm trying to figure out where the hang-up is.

Thank you.

Kikobeats commented 2 years ago

Hello,

I'm confused, because metascraper doesn't perform a network request to get the HTML; instead, you need to provide it as an argument:

const metascraper = require('metascraper')([
  require('metascraper-author')(),
  require('metascraper-date')(),
  require('metascraper-description')(),
  require('metascraper-image')(),
  require('metascraper-logo')(),
  require('metascraper-clearbit')(),
  require('metascraper-publisher')(),
  require('metascraper-title')(),
  require('metascraper-url')()
])

// got v11 or earlier: got v12+ is ESM-only and cannot be loaded with require()
const got = require('got')

const targetUrl = 'http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance'

;(async () => {
  // `url` is the final URL after any redirects
  const { body: html, url } = await got(targetUrl)
  const metadata = await metascraper({ html, url })
  console.log(metadata)
})()
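Because metascraper only parses the HTML it is handed, a hang like the one reported most likely happens while the page is being downloaded. A minimal sketch, assuming Node.js 18+ (built-in `fetch` and `AbortSignal.timeout`) instead of the `got` client used above, puts a hard deadline on that download so a stalled connection fails with a descriptive error rather than hanging; `fetchHtml` and `timeoutMs` are hypothetical names, not part of metascraper.

```javascript
// Hypothetical helper: download a page for metascraper with a hard deadline.
// Assumes Node.js 18+ (global fetch, AbortSignal.timeout).
async function fetchHtml (targetUrl, { timeoutMs = 10000 } = {}) {
  const started = Date.now()
  try {
    const res = await fetch(targetUrl, {
      redirect: 'follow',
      // Aborts the request if it has not completed within timeoutMs
      signal: AbortSignal.timeout(timeoutMs)
    })
    if (!res.ok) throw new Error(`HTTP ${res.status}`)
    const html = await res.text()
    // Same { html, url } shape that metascraper() expects; res.url is the final URL
    return { html, url: res.url }
  } catch (err) {
    // Report how long we waited, so timeouts are distinguishable from fast failures
    const elapsed = Date.now() - started
    throw new Error(
      `fetching ${targetUrl} failed after ${elapsed} ms (limit ${timeoutMs} ms): ${err.name}: ${err.message}`,
      { cause: err }
    )
  }
}
```

The result can be passed straight to metascraper, e.g. `await metascraper(await fetchHtml(targetUrl))`. Running this in production with a short limit shows quickly whether the stalled links are failing at the network level.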

In any case, for debugging, set the environment variable DEBUG=metascraper* to see all the internal details.
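If the debug output still doesn't make clear which step stalls, each stage can be timed separately. The helper below is a hypothetical sketch (`stage` is not part of metascraper or got): it logs how long a stage took to stderr and rejects with a labelled error if the stage exceeds its deadline.

```javascript
// Hypothetical helper: run one pipeline stage, log its duration, and reject
// with a labelled error if it takes longer than `ms`.
// Note: Promise.race does not cancel fn(); the slow work keeps running.
async function stage (label, ms, fn) {
  const started = process.hrtime.bigint()
  let timer
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} did not finish within ${ms} ms`)), ms)
  })
  try {
    return await Promise.race([fn(), deadline])
  } finally {
    clearTimeout(timer)
    const elapsedMs = Number(process.hrtime.bigint() - started) / 1e6
    console.error(`[${label}] ${elapsedMs.toFixed(1)} ms`)
  }
}

// Usage (assuming the got + metascraper setup from the thread):
//   const { body: html, url } = await stage('fetch', 15000, () => got(targetUrl))
//   const metadata = await stage('metascraper', 5000, () => metascraper({ html, url }))
```

Whichever label fails or logs a large duration is where the hang-up is.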

Or just pass the URL to the Microlink API, which runs metascraper under the hood.

e.g., api.microlink.io?url=https://teslahunt.io
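Calling the API from code mostly comes down to percent-encoding the target URL into the `url` query parameter. A minimal sketch, assuming Node.js 18+ (global `fetch`) and assuming a successful response is JSON shaped like `{ status: 'success', data: { ... } }`; the function names are hypothetical.

```javascript
// Hypothetical client sketch for the Microlink API (Node.js 18+).
// Assumption: successful responses look like { status: 'success', data: {...} }.
const MICROLINK_ENDPOINT = 'https://api.microlink.io/'

function microlinkUrl (targetUrl) {
  const endpoint = new URL(MICROLINK_ENDPOINT)
  // URLSearchParams percent-encodes the target, e.g. ':' -> %3A, '/' -> %2F
  endpoint.searchParams.set('url', targetUrl)
  return endpoint.toString()
}

function extractData (payload) {
  // Surface API-level failures instead of returning undefined metadata
  if (payload.status !== 'success') {
    throw new Error(`Microlink returned status "${payload.status}"`)
  }
  return payload.data
}

async function microlink (targetUrl, { timeoutMs = 15000 } = {}) {
  const res = await fetch(microlinkUrl(targetUrl), { signal: AbortSignal.timeout(timeoutMs) })
  return extractData(await res.json())
}
```

For example, `await microlink('https://teslahunt.io')` requests the same URL as the link above, with the target encoded.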