tomas / needle

Nimble, streamable HTTP client for Node.js. With proxy, iconv, cookie, deflate & multipart support.
https://www.npmjs.com/package/needle
MIT License
1.62k stars 235 forks source link

`follow_if_*` options not working (always false) #252

Open elmigranto opened 6 years ago

elmigranto commented 6 years ago

Needle.prototype.should_follow expects to recive original url as parsed value, but actually receives it as a string. This makes follow_if_same_host and follow_if_same_protocol always return false, since they are comparing on the string.

To reproduce, try setting follow_if_same_host to true and query some website that redirects to same host. I had problems with this one:

Code to reproduce ``` js 'use strict'; const needle = require('needle'); const _makeRequest = async () => { return needle('get', `http://www.parkflyer.ru/`, { compressed: true, accept: 'text/html', open_timeout: 10000, response_timeout: 10000, read_timeout: 10000, follow_max: 5, rejectUnauthorized: false, follow_if_same_host: true, }); } _makeRequest() .then(({statusCode, body}) => console.log({statusCode})) .catch((err) => console.log({err})); ```
Output of the above ``` { statusCode: 302 } ```
Things I printed from within `function matches(property)` inside `send_request` ``` { property: 'host', original: 'http://www.parkflyer.ru/', location: 'http://www.parkflyer.ru/ru/', originalProperty: undefined } ```
Same site, but with `curl` following redirects ``` $ curl -LI http://www.parkflyer.ru/ HTTP/1.1 302 FOUND Server: nginx Date: Sat, 11 Aug 2018 08:58:05 GMT Content-Type: text/html; charset=utf-8 Connection: keep-alive Keep-Alive: timeout=20 Vary: Cookie Location: http://www.parkflyer.ru/ru/ Set-Cookie: sessionid=lsp455s7plixmol4yarfi1spu6j477va; expires=Sat, 25-Aug-2018 08:58:05 GMT; httponly; Max-Age=1209600; Path=/ HTTP/1.1 200 OK Server: nginx Date: Sat, 11 Aug 2018 08:58:05 GMT Content-Type: text/html; charset=utf-8 Connection: keep-alive Keep-Alive: timeout=20 Vary: Cookie Content-Language: ru Set-Cookie: vtime="2018-08-11T11:58:05.718622"; Path=/ Set-Cookie: country_from_ip=RU; Path=/ Set-Cookie: referrer=; expires=Tue, 08-Aug-2028 08:58:05 GMT; Max-Age=315360000; Path=/ Set-Cookie: landing="http://www.parkflyer.ru/ru/"; Path=/ Set-Cookie: nv_event_id=None; expires=Tue, 08-Aug-2028 08:58:05 GMT; Max-Age=315360000; Path=/ Set-Cookie: currency=RUR; Path=/ Set-Cookie: sessionid=9b4vys6u4vfdlz5swrpr8jssxkz9npvv; expires=Sat, 25-Aug-2018 08:58:05 GMT; httponly; Max-Age=1209600; Path=/ Set-Cookie: refurl=; Path=/ Set-Cookie: csrftoken=Tnjfvk345BORT3zM1yCpKWzTDQg2LyGS; expires=Sat, 10-Aug-2019 08:58:05 GMT; Max-Age=31449600; Path=/ Set-Cookie: news=%23articles_all; Path=/ Set-Cookie: stuff_ive_seen=%7B%22product%22%3A%203127985%2C%20%22blog_entry_article%22%3A%2015834%2C%20%22blog_entry_vio%22%3A%2015830%2C%20%22comment_product%22%3A%2026790371%2C%20%22blog_entry_review%22%3A%200%2C%20%22blog_entry_news%22%3A%2015531%7D; expires=Sun, 11-Aug-2019 08:58:05 GMT; Max-Age=31536000; Path=/ ```

This comparse as if original was parsed output of, say, url.parse, which has host and protocol properties (see line 409):

https://github.com/tomas/needle/blob/9374dcc03e4a2499ae08de77aff289a43918e70b/lib/needle.js#L404-L411


Here's an invokation which passes uri as is:

https://github.com/tomas/needle/blob/9374dcc03e4a2499ae08de77aff289a43918e70b/lib/needle.js#L494

uri is an argument to Needle.prototype.send_request:

https://github.com/tomas/needle/blob/9374dcc03e4a2499ae08de77aff289a43918e70b/lib/needle.js#L426

… which seems to always be a string

elmigranto commented 6 years ago

I am running needle@2.2.2 and here're my process.versions if that helps or needed:

{ http_parser: '2.8.0',
  node: '8.11.3',
  v8: '6.2.414.54',
  uv: '1.19.1',
  zlib: '1.2.11',
  ares: '1.10.1-DEV',
  modules: '57',
  nghttp2: '1.32.0',
  napi: '3',
  openssl: '1.0.2o',
  icu: '60.1',
  unicode: '10.0',
  cldr: '32.0',
  tz: '2017c' }
tomas commented 6 years ago

Thanks for the very detailed bug report! This seems to be partially related to #251.

Are you up for submitting a PR?

elmigranto commented 6 years ago

Yeah, no problem, thanks for the neat lib, Tomás ❤️

251 is about redirects as well, but asks to add some functionality and this one is an actual bug. Guess we can tackle them together, up to you.

Though, not sure if or when I'll be having time for a proper PR, I've ultimately decied to write my own wrapper around http module, since the needs on the project are a bit more involved — we need a bunch of async hooks for different request stages and stuff like byte limits, etc. But here's my diff with how I initially fixed this (tldr: should_follow does url.parse on its location and original arguments — you can hoist these in the scope, rename function args and use parsed versions intstead):

diff --git a/lib/needle.js b/lib/needle.js
index 24dea06..6c1e404 100644
--- a/lib/needle.js
+++ b/lib/needle.js
@@ -412,8 +412,17 @@ Needle.prototype.should_follow = function(location, config, original) {
   if (location === original)
     return false;

-  if (config.follow_if_same_host && !matches('host'))
-    return false; // host does not match, so not following
+  // host does not match, so not following
+  if (config.follow_if_same_host) {
+    const originalHostname = typeof original === 'string'
+      ? url.parse(original).hostname
+      : original['host'];
+
+    const locationHostname = url.parse(location).hostname;
+
+    if (locationHostname !== originalHostname)
+      return false;
+  }

   if (config.follow_if_same_protocol && !matches('protocol'))
     return false; // procotol does not match, so not following

@tomas