rchipka / node-osmosis

Web scraper for NodeJS
4.12k stars 246 forks

URL encoding (follow method) - ERR_UNESCAPED_CHARACTERS #232

Open andriibieriezhnoi opened 5 years ago

andriibieriezhnoi commented 5 years ago

Hello, I got this error while using the .follow() method.

_http_client.js:115
      throw new ERR_UNESCAPED_CHARACTERS('Request path');
      ^

TypeError [ERR_UNESCAPED_CHARACTERS]: Request path contains unescaped characters
    at new ClientRequest (_http_client.js:115:13)
    at Object.request (http.js:41:10)
    at Needle.send_request (/Users/andrey/Projects/kankakeecountyed/scripts/node_modules/needle/lib/needle.js:465:26)
    at next (/Users/andrey/Projects/kankakeecountyed/scripts/node_modules/needle/lib/needle.js:361:10)
    at Needle.start (/Users/andrey/Projects/kankakeecountyed/scripts/node_modules/needle/lib/needle.js:364:17)
    at Object.exports.request (/Users/andrey/Projects/kankakeecountyed/scripts/node_modules/needle/lib/needle.js:746:56)
    at Request (/Users/andrey/Projects/kankakeecountyed/scripts/node_modules/osmosis/lib/Request.js:15:19)
    at Osmosis.request (/Users/andrey/Projects/kankakeecountyed/scripts/node_modules/osmosis/index.js:187:5)
    at Osmosis.dequeueRequest (/Users/andrey/Projects/kankakeecountyed/scripts/node_modules/osmosis/index.js:269:10)
    at /Users/andrey/Projects/kankakeecountyed/scripts/node_modules/osmosis/index.js:223:22

My config:

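const fs = require('fs');
const osmosis = require('osmosis');

// SOURCE_URL (the listing page to scrape) is defined elsewhere in the script.
const savedNews = [];

osmosis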
  .get(SOURCE_URL)
  .find('.newsSummaryItem')
  .set({
    title: '.newsTitle',
    date: '.newsDate',
    excerpt: '.newsSummary',
  })
  .follow('.readMore@href')
  .set({
    content: '.newsSummary',
  })
  .data(data => savedNews.push(data))
  .log(console.log)
  .error(console.log)
  .debug(console.log)
  .done(() => {
    fs.writeFile('news.json', JSON.stringify(savedNews, null, 4), (err) => {
      if (err) {
        console.log(err);
      } else {
        console.log(`Data saved to news.json file.\nNews count: ${savedNews.length}`);
      }
    });
  });

BitFros7y commented 5 years ago

I think SOURCE_URL contains unescaped characters. Please paste it here.

andriibieriezhnoi commented 5 years ago

@NegativeIQ http://kankakeecountyed.org/about-us/news-and-updates.aspx

It works on Node.js 9.7.1, but I get this error on Node 10.14.* and 10.15.
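
SOURCE_URL itself contains only plain ASCII, so a plausible (unconfirmed) explanation is that one of the hrefs picked up by .follow('.readMore@href') contains a space or a non-Latin-1 character such as a curly quote or an en dash. A minimal sketch for checking a URL string follows. The regex is an assumption modeled on the request-path check that Node 10's http client is understood to apply (rejecting anything outside U+0021 to U+00FF), and findUnescaped is a hypothetical helper name, not part of osmosis or needle.

```javascript
// Assumption: approximates the request-path check in Node 10's http client,
// which is understood to reject any character outside U+0021..U+00FF.
const INVALID_PATH_CHARS = /[^\u0021-\u00ff]/;

// Hypothetical helper: returns the characters in `url` that would trip the check.
function findUnescaped(url) {
  return Array.from(url).filter((ch) => INVALID_PATH_CHARS.test(ch));
}
```

Running each scraped href through findUnescaped before it is followed should show which link triggers the error; an empty array means the string passes this approximate check.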

andriibieriezhnoi commented 5 years ago

@NegativeIQ https://repl.it/@andreyberezhnoy/news-scrapping

repl.it uses Node 9.7.1.

andykov commented 5 years ago

Did you manage to solve the problem? I ran into the same issue.
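
No fix is confirmed in this thread. One possible workaround, assuming the failing link contains spaces or non-ASCII characters, is to percent-encode each href before it is requested. The sketch below uses Node's built-in WHATWG URL class, which resolves a relative href against the page URL and encodes the path without double-encoding sequences that are already escaped. escapeHref is a hypothetical helper, and how to hook it into an osmosis chain is left open here.

```javascript
const { URL } = require('url');

// Hypothetical helper: resolve `href` against the page it was found on and
// return the absolute URL as serialized (percent-encoded) by the URL parser.
function escapeHref(href, baseUrl) {
  return new URL(href, baseUrl).href;
}
```

Parsing with URL rather than calling encodeURI directly avoids turning an existing %20 into %2520.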