jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License

download limit exceeded, is there a way to not request the html response as well? onlyGetOpenGraphInfo not working #150

Closed flatypus closed 1 year ago

flatypus commented 1 year ago

I tried scraping reddit.com but found that it won't run unless I set downloadLimit to something like 10 MB, even though most of what gets downloaded seems to be the HTML response. Is there a way to request only the result data? I tried { onlyGetOpenGraphInfo: true } in the options, but that didn't seem to work.

jshemas commented 1 year ago

Hello. I can add a flag for returning the HTML response.

Note: downloadLimit is an option for the main HTTP call made by Got, and larger pages will exceed the limit. (The idea is to stop users from downloading large files.) Adding the flag for the HTML response won't fix this problem.
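
As a rough illustration of why a page with only a little Open Graph data can still trip the limit, here is a minimal sketch of a byte cap applied while a response body is read. It is not the actual code in open-graph-scraper or Got; readWithLimit and its parameters are hypothetical names, and it assumes the limit is a byte count, with false meaning no check.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper for illustration only; not part of open-graph-scraper or Got.
// Reads a stream into a string and fails once more than `limit` bytes have arrived.
// Passing `false` as the limit disables the check.
async function readWithLimit(stream, limit) {
  const chunks = [];
  let received = 0;
  for await (const chunk of stream) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    received += buf.length;
    // The cap counts every byte of the page, not just the meta tags in <head>.
    if (limit !== false && received > limit) {
      // Throwing out of for-await also destroys a Node Readable.
      throw new Error(`download limit exceeded (${received} > ${limit} bytes)`);
    }
    chunks.push(buf);
  }
  return Buffer.concat(chunks).toString('utf8');
}

// Usage: a small in-memory "page" stands in for a real HTTP response body.
const fakePage = () => Readable.from([
  Buffer.from('<head><meta property="og:title" content="reddit"></head>'),
  Buffer.from('<body>' + 'x'.repeat(1000) + '</body>'),
]);

readWithLimit(fakePage(), 100)
  .catch((err) => console.log('with a limit of 100:', err.message));
readWithLimit(fakePage(), false)
  .then((html) => console.log('with no limit, read', html.length, 'characters'));
```

In this sketch the cap is applied to the whole body as it streams in, so the size of the extracted result does not matter, only the size of the page.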

flatypus commented 1 year ago

Ah. I was just trying to get the graph info data for reddit, but I didn't know why it kept exceeding the limit even when I only needed an image and some text.

jshemas commented 1 year ago

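Can you give me an example page?

const ogs = require('open-graph-scraper');

// Fetch the page and print the parsed Open Graph result
ogs({ url: 'https://www.reddit.com/' })
  .then(function ({ result }) {
    console.log('result:', result);
  });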

Will output:

result: {
  ogSiteName: 'reddit',
  twitterSite: '@reddit',
  twitterCard: 'summary',
  ogTitle: 'reddit',
  twitterTitle: 'reddit',
  ogType: 'website',
  ogUrl: 'https://www.reddit.com/',
  ogImage: {
    url: 'https://www.redditstatic.com/icon.png',
    width: '256',
    height: '256',
    type: 'png'
  },
  twitterImage: {
    url: 'https://www.redditstatic.com/icon.png',
    width: null,
    height: null,
    alt: null
  },
  ogDescription: "Reddit is a network of communities where people can dive into their interests, hobbies and passions. There's a community for whatever you're interested in on Reddit.",
  ogLocale: 'en-US',
  favicon: 'https://www.redditstatic.com/desktop2x/img/favicon/android-icon-192x192.png',
  charset: 'utf8',
  requestUrl: 'https://www.reddit.com/',
  success: true
}

You can also bypass the downloadLimit by setting it to false.

ogs({ url: 'https://www.reddit.com/', downloadLimit: false })

flatypus commented 1 year ago

Oh! I didn't know about the downloadLimit: false setting. Thanks!

Murkrage commented 1 year ago

@jshemas false isn't accepted by the type definitions, since downloadLimit only allows string | undefined. Is this an error in the types, or is a boolean just not a value we can use?

jshemas commented 1 year ago

@Murkrage I've updated the type to allow false in open-graph-scraper@5.0.4.