ruipgil / scraperjs

A complete and versatile web scraper.
MIT License
3.71k stars 188 forks

Exceeded maxRedirects. Probably stuck in a redirect loop #15

Closed: farezv closed this issue 10 years ago

farezv commented 10 years ago

So I'm trying to scrape a URL that keeps redirecting and eventually fails with the error below. This is a well-documented issue in the request module. I tried calling scraperPromise.request(options) with an options object that sets followAllRedirects = false, but that just returns a scraperPromise, which prints as [object Object] when I log it to the console.

Here's the relevant stack trace:

Error: Exceeded maxRedirects. Probably stuck in a redirect loop http://ubc.summon.serialssolutions.com/search?s.cmd=addFacetValueFilters%28ContentType%2CNewspaper+Article%3At%29&spellcheck=true&s.q=macbeth
    at Request.onResponse (/Users/.../node_modules/scraperjs/node_modules/request/request.js:901:26)
    at ClientRequest.g (events.js:180:16)
    at ClientRequest.emit (events.js:95:17)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
    at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
    at Socket.socketOnData [as ondata] (http.js:1583:20)
    at TCP.onread (net.js:527:27)

Here's how I'm trying to use the request. I'm not sure how to proceed, since that console.log never executes. Am I doing something wrong? Any help is appreciated!

var scraperPromise = scraperjs.StaticScraper.create();

scraperPromise.request(options, function (error, response) {
    if (error) {
        callback(error, null);
    } else {
        callback(null, response.request.href);
        console.log(response);
    }
});

ruipgil commented 10 years ago

Indeed you are. The request promise receives an object with the options for the request module's request method. You should probably do something along these lines:

scraperjs.StaticScraper.create()
  .request(options)
  // stops the promise chain
  .onError(function(err){
    console.log(err);
  })
  //.then(...
  //.scrape(...
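
For reference, here is a minimal sketch of what the options object passed to .request(options) might look like. The option names (followRedirect, maxRedirects, jar) are ones documented by the request module. The helper buildRequestOptions and the example.com URL are hypothetical and only there for illustration. Whether enabling the cookie jar actually breaks this particular redirect loop is an assumption. A commonly reported cause of such loops is a server that sets a cookie and keeps redirecting until it sees it, but that may not be what is happening here.

```javascript
// Hypothetical helper (not part of scraperjs or request): builds an options
// object using option names documented by the request module.
function buildRequestOptions(url, overrides) {
  var defaults = {
    url: url,
    followRedirect: true, // follow 3xx responses to GET requests
    maxRedirects: 10,     // request's documented default; exceeding it raises the error above
    jar: true             // remember cookies between redirects (assumed to help here)
  };
  // Caller-supplied values win over the defaults.
  return Object.assign(defaults, overrides || {});
}

// Placeholder URL; substitute the page you are actually scraping.
var options = buildRequestOptions('http://example.com/search?q=macbeth');

// Assumed usage, mirroring the chain in the answer above:
// scraperjs.StaticScraper.create()
//   .request(options)
//   .onError(function (err) { console.log(err); });
```

To see where the server is redirecting to instead of following it, passing { followRedirect: false } as the second argument should make request hand back the first 3xx response, whose Location header can then be inspected.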