jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License

Page not found with any user agent #120

Closed vrizo closed 3 years ago

vrizo commented 3 years ago

Hi, @jshemas!

Thank you for the amazing tool! We have used it for several years, and it works perfectly.

We recently ran into a problem with this URL: https://leroymerlin.ru/offer/industrialnyy-stil-v-interere/.

OGS returns the following:

data:  {
  error: true,
  result: {
    success: false,
    requestUrl: 'http://leroymerlin.ru/offer/industrialnyy-stil-v-interere/',
    error: 'Page not found',
    errorDetails: Error: Page not found
        at setOptionsAndReturnOpenGraphResults (…/node_modules/open-graph-scraper/lib/openGraphScraper.js:69:13)
        at processTicksAndRejections (internal/process/task_queues.js:97:5)
        at async …/node_modules/open-graph-scraper/index.js:29:17
  }
}

I’ve tried removing or replacing the user agent as suggested in this issue, but it doesn’t help. Fetching the page with wget and curl works, and Facebook Debugger also gets the page correctly.

Could you please help me debug this URL?

Thanks!

jshemas commented 3 years ago

Hello,

It looks like leroymerlin.ru is blocking the request and returning a 403 (Forbidden) response. Try setting a browser-like user agent:

const ogs = require('open-graph-scraper');

const options = {
  url: 'https://leroymerlin.ru/offer/industrialnyy-stil-v-interere/',
  timeout: 10000,
  headers: {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
  },
};

ogs(options)
  .then((data) => {
    const { error, result } = data;
    console.log('error:', error); // true if there was an error, false otherwise; the error itself is inside the result object
    console.log('result:', result); // This contains all of the Open Graph results
  });

The code above works and returns:

error: false
result: {
  ogTitle: 'Индустриальный стиль в интерьере в Москве и России – выгодные предложения в интернет-магазине Леруа Мерлен',
  ogDescription: 'Предложение от Леруа Мерлен в Москве и России: индустриальный стиль в интерьере – спешите купить по низким ценам в интернет-магазине Москвы и России.',
  ogLocale: 'ru',
  ogLogo: '/etc/designs/elbrus/images/logo.svg',
  ogUrl: 'https://leroymerlin.ru/offer/industrialnyy-stil-v-interere/',
  charset: 'utf8',
  requestUrl: 'https://leroymerlin.ru/offer/industrialnyy-stil-v-interere/',
  success: true
}
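
If several URLs need the same treatment, the user-agent override can be factored into a small helper. This is only a sketch: `withBrowserUserAgent` and `BROWSER_USER_AGENT` are hypothetical names, not part of open-graph-scraper, and it assumes the `headers` option is passed through to the request as in the snippet above.

```javascript
// Hypothetical helper, not part of open-graph-scraper.
// The default user agent string is the one from the working example above.
const BROWSER_USER_AGENT =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36';

// Returns a copy of the options with a browser-like user agent set.
// The input object is not modified.
function withBrowserUserAgent(options, userAgent = BROWSER_USER_AGENT) {
  const headers = {};
  // Header names are case-insensitive, so drop any existing
  // "User-Agent" variant before setting the new one.
  for (const [name, value] of Object.entries(options.headers || {})) {
    if (name.toLowerCase() !== 'user-agent') {
      headers[name] = value;
    }
  }
  headers['user-agent'] = userAgent;
  return { ...options, headers };
}

// Usage sketch (requires the open-graph-scraper package):
// const ogs = require('open-graph-scraper');
// ogs(withBrowserUserAgent({
//   url: 'https://leroymerlin.ru/offer/industrialnyy-stil-v-interere/',
//   timeout: 10000,
// })).then(({ error, result }) => console.log(error, result));
```

Keeping the helper free of network calls makes it easy to check on its own. Whether a given site accepts this particular user agent is something only a real request can confirm.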

vrizo commented 3 years ago

Wow, thanks a lot, @jshemas! It works perfectly with your user agent. Thank you!