jshemas / openGraphScraper

Node.js scraper service for Open Graph Info and More!
MIT License

FEAT: Limit the download size of the fetched page #136

Closed · greenbull-tpathier closed this 2 years ago

greenbull-tpathier commented 2 years ago

The request timeout limits the damage if we're passed the URL of a large file, but we still don't want to run out of memory when too much content gets transferred.

Test case:

const ogs = require('open-graph-scraper');
ogs({ url: 'https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso', timeout: 60000 })
  .then((result) => { console.log(result); }, console.log);

Node.js memory use climbs to something like a gigabyte. I think it's reasonable to stop fetching after 1 MB of content.
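For context, here is a minimal sketch of how such a cap could be enforced with got (the HTTP client open-graph-scraper used at the time, if memory serves). This is illustrative only; the limit constant and function name are mine, not part of the patch:

const got = require('got');

// Hypothetical byte cap; 1 MB, as suggested above.
const DOWNLOAD_LIMIT = 1000000;

function fetchWithLimit(url) {
  const request = got(url);
  // got re-emits progress events on the returned promise; cancel the
  // request as soon as more than DOWNLOAD_LIMIT bytes have arrived.
  request.on('downloadProgress', (progress) => {
    if (progress.transferred > DOWNLOAD_LIMIT && progress.percent !== 1) {
      request.cancel(); // rejects the promise with a CancelError
    }
  });
  return request;
}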

I'm not sure about testing. We could spin up a local server that fails the test if too much content is consumed? Something like the sketch below.
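A hypothetical fixture along those lines: a local server that streams data indefinitely, so a client without a download cap would keep buffering forever. All names here are illustrative:

const http = require('http');

// Local test server that streams 64 KB chunks until the client hangs up.
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  const chunk = 'x'.repeat(64 * 1024);
  const timer = setInterval(() => res.write(chunk), 1);
  res.on('close', () => clearInterval(timer)); // client aborted: stop writing
});

server.listen(0, () => {
  const { port } = server.address();
  console.log(`streaming test server on http://localhost:${port}/`);
});

A test could then point the scraper at this server and assert that the request is cancelled once the limit is crossed, rather than the process ballooning in memory.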

jshemas commented 2 years ago

Hello! Thank you for this pull request; I think this is a great idea. I can add some tests for this and publish it later this week.

jshemas commented 2 years ago

This is now live in open-graph-scraper@4.10.0. Thanks again!
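For anyone finding this later, a minimal usage sketch, assuming the released option is the downloadLimit byte cap described in the project README (default 1 MB):

const ogs = require('open-graph-scraper');

ogs({
  url: 'https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso',
  timeout: 60000,
  downloadLimit: 1000000, // assumed option name; caps the fetched bytes at 1 MB
})
  .then((result) => console.log(result))
  .catch((error) => console.log('request aborted or failed:', error));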

TPXP commented 2 years ago

Thanks for your prompt reply! <3