Closed by touren 7 years ago
You need to set request's jar parameter to true to enable cookies:
const scraper = require('html-metadata')

scraper({
  url: 'http://www.nytimes.com/2017/04/07/world/middleeast/syria-attack-trump.html',
  // jar: true tells request to remember cookies between redirects
  jar: true
}, (error, metadata) => {
  // do something here
})
Did @achingbrain's suggestion work for you?
We let users set their own options object, since some people want a new cookie jar for every request and others want to reuse the same one. See the docs under "options". Basically, we just pass the options object on to the request library: https://github.com/request/request#requestoptions-callback
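To illustrate the two cookie jar strategies mentioned above, here is a minimal sketch. The optionsFactory helper is hypothetical (it is not part of html-metadata or request); it only builds the options object that gets passed along. It assumes the jar comes from request.jar(), which the request docs describe as the way to create a custom cookie jar.

```javascript
// Hypothetical helper, not part of html-metadata or request.
// makeJar: a function that returns a cookie jar, e.g. () => request.jar()
// shared:  true to reuse one jar for every URL, false for a fresh jar per call
function optionsFactory (makeJar, shared) {
  const sharedJar = shared ? makeJar() : null
  return function optionsFor (url) {
    return {
      url: url,
      // reuse the shared jar if there is one, otherwise create a new jar
      jar: sharedJar || makeJar()
    }
  }
}

// Usage sketch, assuming request and html-metadata are installed:
// const request = require('request')
// const scraper = require('html-metadata')
// const optionsFor = optionsFactory(() => request.jar(), true)
// scraper(optionsFor('http://www.nytimes.com/...'), (error, metadata) => {
//   // do something here
// })
```

Passing jar: true instead of a jar object should also work if you do not care which jar is used; as far as I understand the request docs, that just turns cookie handling on.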
Closing, feel free to reopen if you still have issues :).
It works. Thank you guys.
Hi, I'm trying to parse this page: http://www.nytimes.com/2017/04/07/world/middleeast/syria-attack-trump.html
I get the following error:
(node:68496) Warning: Possible EventEmitter memory leak detected. 11 pipe listeners added. Use emitter.setMaxListeners() to increase limit
(node:68496) Warning: Possible EventEmitter memory leak detected. 11 pipe listeners added. Use emitter.setMaxListeners() to increase limit
Unhandled rejection Error: Exceeded maxRedirects. Probably stuck in a redirect loop https://www.nytimes.com/glogin?URI=https%3A%2F%2Fwww.nytimes.com%2F2017%2F04%2F07%2Fworld%2Fmiddleeast%2Fsyria-attack-trump.html%3F_r%3D4
    at Redirect.onResponse (/Users/Tao/Work/Ludlow/www/ludlow-web/node_modules/request/lib/redirect.js:98:27)
    at Request.onRequestResponse (/Users/Tao/Work/Ludlow/www/ludlow-web/node_modules/request/request.js:917:22)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:474:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)
    at TLSSocket.socketOnData (_http_client.js:363:20)
    at emitOne (events.js:96:13)
    at TLSSocket.emit (events.js:188:7)
    at readableAddChunk (_stream_readable.js:176:18)
    at TLSSocket.Readable.push (_stream_readable.js:134:10)
    at TLSWrap.onread (net.js:548:20)
I probably need to set some cookies to break the redirect loop.