jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License

Doesn't seem to handle Amazon partner link redirects? #61

Closed (michaelforrest closed this 6 years ago)

michaelforrest commented 6 years ago

When I scrape this Amazon URL: https://amzn.to/2Is8sCR, I expect to see the same results as the Facebook debugger (here), which follows a redirect to here, but for some reason I always get this:

{
  ogDescription: 'Buy The Anatomy of Story: 22 Steps to Becoming a Master Storyteller Reprint by John Truby (ISBN: 8601200418156) from Amazon\'s Book Store. Everyday low prices and free delivery on eligible orders.',
  ogImage: {
    url: 'https://images-eu.ssl-images-amazon.com/images/G/02/gno/sprites/nav-sprite-global_bluebeacon-V3-1x_optimized._CB516557022_.png'
  },
  ogTitle: 'The Anatomy of Story: 22 Steps to Becoming a Master Storyteller: Amazon.co.uk: John Truby: 8601200418156: Books'
}

Here's what I see

[screenshot: seen]

Here's what I want

[screenshot: wanted]

Thanks, hope somebody can help!

jshemas commented 6 years ago

Hello! Thanks for opening this issue.

In open-graph-scraper@3.2.0 I updated the module to send back an array of all of the images on the page if there aren't any Open Graph images.
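To illustrate the fallback described above, here is a minimal sketch. It is not the module's actual implementation: `collectImages` is a hypothetical helper, and it uses simple regular expressions rather than a real HTML parser, so it only handles straightforward markup.

```javascript
// Hypothetical helper, NOT part of open-graph-scraper.
// Returns og:image values if the page has any; otherwise falls back
// to every <img src="..."> found in the HTML.
function collectImages(html) {
  const ogImages = [];
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const isOgImage = /\bproperty\s*=\s*["']og:image["']/i.test(tag);
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (isOgImage && content) ogImages.push(content[1]);
  }
  if (ogImages.length > 0) return { source: "og", images: ogImages };

  // No Open Graph images: collect all plain <img> sources instead.
  const imgs = [];
  for (const match of html.matchAll(/<img\b[^>]*\bsrc\s*=\s*["']([^"']*)["']/gi)) {
    imgs.push(match[1]);
  }
  return { source: "img", images: imgs };
}
```

The `source` field is only there so a caller can tell which path produced the list.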

michaelforrest commented 6 years ago

@jshemas thanks, I think that’s definitely a good addition. Have you checked it against my example though? I think something more complicated might be happening here: if I view source on the page the scraper sees, it doesn’t have the Open Graph tags yet, so I don’t know what the Facebook scraper is doing differently.

jshemas commented 6 years ago

Hello, I'm not able to use the Facebook debugger since I don't have a Facebook account. Facebook is probably doing a lot more than just scraping Open Graph info. What other information would you like OGS to pull back?

I can probably do a one-off thing to get Amazon reviews, if that is what you need.

michaelforrest commented 6 years ago

Hi @jshemas, again thanks for responding. It's not the reviews bit per se (that just seems to be something Amazon adds into its thumbnails); it's more just figuring out what's weird about how and when Amazon exposes og tags for its products. Even what comes back from curl seems weird.

michaelforrest commented 6 years ago

It might not even be exposing og tags! But I'm sure I saw them on at least one product page!

michaelforrest commented 6 years ago

No, it does seem that Amazon doesn't expose Open Graph tags after all, so Facebook must be making an exception to get this (maybe hitting an Amazon API). This all seems beyond the scope of open-graph-scraper, so I'm happy to close this issue.
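For anyone who wants to check this for themselves, one option is to scan the raw HTML a client receives (for example, saved output from curl) for `og:` meta tags. The sketch below assumes simple, well-formed `<meta>` tags. `listOgTags` is a hypothetical helper, not part of open-graph-scraper, and the sample HTML is made up for illustration.

```javascript
// Hypothetical diagnostic helper, NOT part of open-graph-scraper.
// Returns an object mapping each og:* property found in the HTML to its content.
function listOgTags(html) {
  const tags = {};
  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const property = /\bproperty\s*=\s*["'](og:[^"']+)["']/i.exec(tag);
    // Match double- and single-quoted values separately so that an
    // apostrophe inside a double-quoted value does not cut it short.
    const content = /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(tag);
    if (property && content) {
      tags[property[1]] = content[1] !== undefined ? content[1] : content[2];
    }
  }
  return tags;
}

// Made-up sample page, for illustration only.
const sample = '<head><title>Example</title><meta property="og:title" content="Example"></head>';
console.log(listOgTags(sample));
```

An empty object means the HTML that was actually served contains no `og:` meta tags, whatever a browser or another scraper may show for the same URL.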

TylerAHolden commented 4 years ago

In case anyone else needs this, a solution for Amazon is described here: https://github.com/jhy/jsoup/issues/976. You just need to set the User-Agent header.

t-lochhead commented 3 years ago

If you're a noob like me, here is the full answer, following @TylerAHolden's guidance:

const ogs = require("open-graph-scraper");
const options = {
  url: "https://amzn.to/2Is8sCR",
  headers: {
    "user-agent": "Googlebot/2.1 (+http://www.google.com/bot.html)",
  },
};
ogs(options, (error, results, response) => {
  // `error` is a boolean: true if something went wrong. The error details are inside `results`.
  console.log("error:", error);
  console.log("results:", results); // All of the Open Graph results
  console.log("response:", response); // The HTML of the page
});