rchipka / node-osmosis

Web scraper for NodeJS

URL Encoding issue #133

Open ImanMh opened 7 years ago

ImanMh commented 7 years ago

If the URL contains non-standard characters, an exception is thrown. Shouldn't it be encoded before being set as a header?

(follow) url: /v/خیابان-۲۰۶-تقاطع-گیو-و-کادوس/NeX6Aza_V/
_http_outgoing.js:358
    throw new TypeError('The header content contains invalid characters');
    ^

TypeError: The header content contains invalid characters
    at ClientRequest.OutgoingMessage.setHeader (_http_outgoing.js:358:11)
    at new ClientRequest (_http_client.js:85:14)
    at Object.exports.request (http.js:31:10)
    at Object.exports.request (https.js:199:15)
    at Needle.send_request (/Users/Iman/node_modules/needle/lib/needle.js:411:26)
    at Needle.start (/Users/Iman/node_modules/needle/lib/needle.js:318:15)
    at Object.exports.request (/Users/Iman/node_modules/needle/lib/needle.js:670:56)
    at Request (/Users/Iman/node_modules/osmosis/lib/Request.js:16:19)
    at Osmosis.request (/Users/Iman/node_modules/osmosis/index.js:183:5)
    at Osmosis.queueRequest (/Users/Iman/node_modules/osmosis/index.js:250:14)

jorgerosal commented 7 years ago

Hi Iman, not sure what you mean. Are you trying to .get or to .follow? If .get, I'm not sure about the URL: I tried opening it in my browser and it wouldn't load there either. If you can refine the URL, maybe it'll work. Or try using the website's IP address instead?

If you're trying to follow it, try using a selector instead. Something like this might work, following links whose href contains "NeX6Aza_V", as in your example above: osmosis.follow('[href*="NeX6Aza_V"]')
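
For reference, a minimal sketch of the encoding the original question asks about, assuming the error comes from the raw non-ASCII path ending up in a request header. It uses only the built-in encodeURI; the helper name toAsciiUrl is hypothetical and is not part of osmosis or needle, and it assumes the input has not already been percent-encoded.

```javascript
// Hypothetical helper (not part of osmosis or needle): percent-encode a URL
// or path so that it contains only ASCII characters.
// Assumption: the input is NOT already percent-encoded; an existing "%"
// would be encoded again as "%25".
function toAsciiUrl(url) {
  // encodeURI leaves URL structure characters such as "/", "?", "&" and "#"
  // untouched and encodes non-ASCII characters as UTF-8 percent escapes.
  return encodeURI(url);
}

// The path from the log above.
const rawPath = '/v/خیابان-۲۰۶-تقاطع-گیو-و-کادوس/NeX6Aza_V/';
const encodedPath = toAsciiUrl(rawPath);

console.log(encodedPath);
```

Whether passing the encoded string to .get or .follow avoids the exception depends on where osmosis and needle build their headers, which this thread does not confirm.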