microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
https://metascraper.js.org
MIT License
2.35k stars 168 forks

IMDB links not working #603

Closed. durdevic closed this issue 1 year ago

durdevic commented 2 years ago

Prerequisites

Subject of the issue

IMDB links are not working. I'm getting 403 Forbidden.

Steps to reproduce

Note: You can reproduce the code using interactive Node.js shell by Runkit.

Expected behaviour

The request should not be forbidden.

Actual behaviour

I'm getting this response:

meta {
  author: null,
  date: 2022-12-03T14:06:56.528Z,
  description: undefined,
  image: undefined,
  logo: 'https://www.imdb.com/favicon.ico',
  publisher: undefined,
  title: '403 Forbidden',
  url: 'https://www.imdb.com/title/tt8772296/'
}

Kikobeats commented 1 year ago

Hello,

It looks like a problem with fetching the HTML behind the target URL, not with extracting the content from the markup.

Check how the Microlink API (essentially metascraper as a service) resolves it:

https://api.microlink.io/?url=https://www.imdb.com/title/tt8772296
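
The distinction above, between fetching the HTML and extracting metadata from it, can be made explicit in code. The following is only a minimal sketch, not metascraper's own fetching logic: `fetchHtml` is a hypothetical helper, the `User-Agent` value is an illustrative assumption, and it relies on the global `fetch` available in Node.js 18+. It fails loudly on a non-2xx response, so an error page such as "403 Forbidden" is never handed to the extractor as if it were the real page.

```javascript
// Hypothetical helper (not part of metascraper): fetch the HTML and throw on
// non-2xx responses, so an error page is never parsed as real content.
// `fetchImpl` defaults to the global fetch shipped with Node.js 18+ and can be
// swapped for another implementation.
async function fetchHtml (url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url, {
    headers: {
      // Assumption: some sites reject requests that lack a browser-like
      // User-Agent. This value is illustrative only.
      'user-agent': 'Mozilla/5.0 (compatible; example-bot/1.0)'
    }
  })

  if (!res.ok) {
    // Surface the fetch failure instead of returning the error page's markup.
    throw new Error(`Fetching ${url} failed with HTTP ${res.status}`)
  }

  // Prefer the final URL after redirects when the response exposes it.
  return { html: await res.text(), url: res.url || url }
}
```

The returned `html` and `url` can then be passed to `metascraper({ html, url })`. If the request is still rejected with a 403, the error points at the fetching step rather than at the extraction rules.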

durdevic commented 1 year ago

Hey, thanks a lot for the very fast reply, appreciated!

I understand your message. I have been struggling to find the issue, and it seems that axios is having trouble fetching the data.

Do you have any idea what I could do? It suddenly started throwing this error with no change to the source code, and I'm lost in the loop now 🤷‍♂️

Kikobeats commented 1 year ago

No idea about axios. Try using https://github.com/microlinkhq/html-get

durdevic commented 1 year ago

I'll give it a shot, thanks for the insanely fast reply! 🥇